At times I struggle deciding on the granularity I should give to my refs. If I put a big map in a single ref (what I call a megaref), updates to unrelated parts of the map may conflict and cause transactions to retry, but it’s dead easy to snapshot the whole state. If I use a ref for every entry of the map, concurrency is excellent again but snapshotting the whole state may be tricky (you need to tune the ref to have a history long enough) when there is a large number of rapidly changing refs or when the system is slow/loaded.
What I’d like is an alter-in
function, a mix of alter
and update-in
, with the following guarantee: two alter-in
s conflict when their paths are either equal or prefix from one another.
This is not something new: this problem bugs me since 2008 I think. I had several (failed/unfinished and private) attempts to patch the STM to accommodate for a new reference type.
Some weeks ago I realized that I didn’t need to hack the STM to create such a new reference type.
The basic idea behind all my attempts was to use lock-striping. What I didn’t realize is that I can leverage the existing STM by using ref-striping!
Let’s say the whole map is stored in a single ref, the root ref and you have N guard refs (whose value is nil
). When I want to alter the value for a given key, I compute its hash modulo N which gives me an index into the guards vector. I ref-set
the corresponding guard ref (to nil
, the actual value doesn’t matter) thus claiming exclusive commit rights for this key. Once that done I simply commute
on the root ref being sure that the operation will only commute with non-conflicting other ops.
NB: An alternative design is to simply have N roots and no guards and to merge the N maps on deref
— efficient merging would need a didcated map implementation. However it doesn’t help much with nested maps but it helps a lot with concurrency since in the other design everything is serialized on the single root. I think an hybrid solution (N roots, each roots having M guards) woud bring an interesting trade-off between concurrency, number of retries and ease of snapshotting.
To support nested maps, instead of a single key, one has to deal with a path, eg [:a :b :c]
. Here the idea is to ensure
guards for path prefixes ([:a]
and [:a :b]
) and to ref-set
the guard for the full-path as before.
;; except ensure-in and alter-in, the others are here because
;; they can be optimized in a mega ref wth several roots
(defprotocol AssociativeRef
(-alter-in [aref ks f args])
(-commute-in [aref ks f args])
(ref-set-in [aref ks v])
(ensure-in [aref ks])
(deref-in [aref ks]))
(defn alter-in [aref ks f & args]
(-alter-in aref ks f args))
(defn commute-in [aref ks f & args]
(-commute-in aref ks f args))
(defn- ensure-path [guard-for path]
(loop [path path]
(when-not (= [] path)
(ensure (guard-for path))
(recur (pop path)))))
(defn- guards-fn [guards]
(let [n (count guards)]
#(nth guards (mod (hash %) n))))
(deftype MegaRef [r guards]
AssociativeRef
(-alter-in [this ks f args]
(if (seq ks)
(let [guard-for (guards-fn guards)]
(ensure-path guard-for (pop (vec ks)))
(ref-set (guard-for ks) nil)
(let [v (apply f (get-in @r ks) args)]
; v is precomputed to not evaluate it twice because of commute
(commute r assoc-in ks v)))
(throw (IllegalArgumentException. "ks must not be empty."))))
(-commute-in [this ks f args]
(apply commute r update-in ks f args))
(ref-set-in [this ks v]
(-alter-in this ks (constantly v) nil))
(ensure-in [this ks]
(ensure-path (vec ks) (guards-fn guards))
(deref-in this ks))
(deref-in [this ks]
(get-in @r ks))
clojure.lang.IDeref
(deref [this]
@r))
(defn megaref [entries & options]
(let [guards (:guards options 16)
root-options (select-keys options [:validator :min-history :max-history])]
(MegaRef. (apply ref (into {} entries) root-options)
(vec (repeatedly guards #(ref nil))))))
NB: Write skew anomalies can now occur at the “sub ref” level: when you deref-in
a path unrelated to the ones you update; ensure-in
is the answer.
So, some questions:
- Is this of interest to anyone except me? If yes, once mature (eg multiroot) should it belong in contrib?
- Is my implementation sound? Do I rely on implementation details?
- Should the STM expose a “token” facility so that I don’t have to use refs as guards?