Clojure and me » 2011

Valued functions

unsorted — cgrand, 17 October 2011 @ 14 h 55 min

Functions are black boxes and determining when two functions are equivalent is undecidable. That’s why:(= (fn [x] x) (fn [x] x)) is false (a clever compiler could sove this naive specific case but not all).

However from time to time I’d like functions constructed the same way be considered equals. For example (= (comp inc inc) (comp inc inc)) could be true and it wouldn’t necessitate a smart enough compiler or memoization.

It’s not without precedent: keywords, symbols, maps, sets and vectors are functions and also have value semantics: (= (hash-set 1) (hash-set 1)) is true!

In indexed-set I use maps whose values are functions, sometimes functions returned by higher-order functions and the user having no reference to the function returned by the hof can’t look it up in the map — if she calls the hof agin with the same arguments she gets another function which is not equal to the first one. This means that I should always let the user call the hof by herself and keep the result around. It may be ok for a low-level API but not for more user-friendly functions.

Helpfully, records have value semantics, can implement Java interfaces and do not already implement clojure.lang.IFn and that’s what I have done:

(defrecord Op-By [key f init]
  clojure.lang.IFn
  (invoke [this m v]
    (let [k (key v)]
      (assoc m k (f (m k init) v)))))

(defn- op-by [key f init] (Op-By. key f init))

instead of

(defn- op-by [key f init] 
  (fn [m v])
    (let [k (key v)]
      (assoc m k (f (m k init) v))))

I’m still not happy with that I’d like something more fluid (closer to fn, maybe built on reify but ideally a metadata flag on fn e.g. (^:valued fn [m v] ...)) ; I don’t think the type name is valueable, I’m on the fence concerning features I get for free from defrecord (namely conj, assoc and dissoc).

Comments (4)

A world in a ref

unsorted — cgrand, 6 October 2011 @ 13 h 00 min

At times I struggle deciding on the granularity I should give to my refs. If I put a big map in a single ref (what I call a megaref), updates to unrelated parts of the map may conflict and cause transactions to retry, but it’s dead easy to snapshot the whole state. If I use a ref for every entry of the map, concurrency is excellent again but snapshotting the whole state may be tricky (you need to tune the ref to have a history long enough) when there is a large number of rapidly changing refs or when the system is slow/loaded.

What I’d like is an alter-in function, a mix of alter and update-in, with the following guarantee: two alter-ins conflict when their paths are either equal or prefix from one another.

This is not something new: this problem bugs me since 2008 I think. I had several (failed/unfinished and private) attempts to patch the STM to accommodate for a new reference type.

Some weeks ago I realized that I didn’t need to hack the STM to create such a new reference type.

The basic idea behind all my attempts was to use lock-striping. What I didn’t realize is that I can leverage the existing STM by using ref-striping!

Let’s say the whole map is stored in a single ref, the root ref and you have N guard refs (whose value is nil). When I want to alter the value for a given key, I compute its hash modulo N which gives me an index into the guards vector. I ref-set the corresponding guard ref (to nil, the actual value doesn’t matter) thus claiming exclusive commit rights for this key. Once that done I simply commute on the root ref being sure that the operation will only commute with non-conflicting other ops.

NB: An alternative design is to simply have N roots and no guards and to merge the N maps on deref — efficient merging would need a didcated map implementation. However it doesn’t help much with nested maps but it helps a lot with concurrency since in the other design everything is serialized on the single root. I think an hybrid solution (N roots, each roots having M guards) woud bring an interesting trade-off between concurrency, number of retries and ease of snapshotting.

To support nested maps, instead of a single key, one has to deal with a path, eg [:a :b :c]. Here the idea is to ensure guards for path prefixes ([:a] and [:a :b]) and to ref-set the guard for the full-path as before.

;; except ensure-in and alter-in, the others are here because
;; they can be optimized in a mega ref wth several roots
(defprotocol AssociativeRef
  (-alter-in [aref ks f args])
  (-commute-in [aref ks f args])
  (ref-set-in [aref ks v])
  (ensure-in [aref ks])
  (deref-in [aref ks]))
 
(defn alter-in [aref ks f & args]
  (-alter-in aref ks f args))

(defn commute-in [aref ks f & args]
  (-commute-in aref ks f args))

(defn- ensure-path [guard-for path]
  (loop [path path]
    (when-not (= [] path)
      (ensure (guard-for path))
      (recur (pop path)))))

(defn- guards-fn [guards]
  (let [n (count guards)]
    #(nth guards (mod (hash %) n))))

(deftype MegaRef [r guards]
  AssociativeRef
  (-alter-in [this ks f args]
    (if (seq ks)
      (let [guard-for (guards-fn guards)]
        (ensure-path guard-for (pop (vec ks)))
        (ref-set (guard-for ks) nil)
        (let [v (apply f (get-in @r ks) args)]
          ; v is precomputed to not evaluate it twice because of commute
          (commute r assoc-in ks v)))
      (throw (IllegalArgumentException. "ks must not be empty."))))
  (-commute-in [this ks f args]
    (apply commute r update-in ks f args))
  (ref-set-in [this ks v]
    (-alter-in this ks (constantly v) nil))
  (ensure-in [this ks]
    (ensure-path  (vec ks) (guards-fn guards))
    (deref-in this ks))
  (deref-in [this ks]
    (get-in @r ks))
  clojure.lang.IDeref
  (deref [this]
    @r))

(defn megaref [entries & options]
  (let [guards (:guards options 16)
        root-options (select-keys options [:validator :min-history :max-history])]
    (MegaRef. (apply ref (into {} entries) root-options)
              (vec (repeatedly guards #(ref nil))))))

NB: Write skew anomalies can now occur at the “sub ref” level: when you deref-in a path unrelated to the ones you update; ensure-in is the answer.

So, some questions:

Is this of interest to anyone except me? If yes, once mature (eg multiroot) should it belong in contrib?
Is my implementation sound? Do I rely on implementation details?
Should the STM expose a “token” facility so that I don’t have to use refs as guards?

Comments (85)