Valued functions

unsorted — cgrand, 17 October 2011 @ 14 h 55 min

Functions are black boxes and determining when two functions are equivalent is undecidable. That’s why:(= (fn [x] x) (fn [x] x)) is false (a clever compiler could sove this naive specific case but not all).

However from time to time I’d like functions constructed the same way be considered equals. For example (= (comp inc inc) (comp inc inc)) could be true and it wouldn’t necessitate a smart enough compiler or memoization.

It’s not without precedent: keywords, symbols, maps, sets and vectors are functions and also have value semantics: (= (hash-set 1) (hash-set 1)) is true!

In indexed-set I use maps whose values are functions, sometimes functions returned by higher-order functions and the user having no reference to the function returned by the hof can’t look it up in the map — if she calls the hof agin with the same arguments she gets another function which is not equal to the first one. This means that I should always let the user call the hof by herself and keep the result around. It may be ok for a low-level API but not for more user-friendly functions.

Helpfully, records have value semantics, can implement Java interfaces and do not already implement clojure.lang.IFn and that’s what I have done:

(defrecord Op-By [key f init]
  clojure.lang.IFn
  (invoke [this m v]
    (let [k (key v)]
      (assoc m k (f (m k init) v)))))

(defn- op-by [key f init] (Op-By. key f init))

instead of

(defn- op-by [key f init]
  (fn [m v])
    (let [k (key v)]
      (assoc m k (f (m k init) v))))

I’m still not happy with that I’d like something more fluid (closer to fn, maybe built on reify but ideally a metadata flag on fn e.g. (^:valued fn [m v] ...)) ; I don’t think the type name is valueable, I’m on the fence concerning features I get for free from defrecord (namely conj, assoc and dissoc).

A world in a ref

unsorted — cgrand, 6 October 2011 @ 13 h 00 min

At times I struggle deciding on the granularity I should give to my refs. If I put a big map in a single ref (what I call a megaref), updates to unrelated parts of the map may conflict and cause transactions to retry, but it’s dead easy to snapshot the whole state. If I use a ref for every entry of the map, concurrency is excellent again but snapshotting the whole state may be tricky (you need to tune the ref to have a history long enough) when there is a large number of rapidly changing refs or when the system is slow/loaded.

What I’d like is an alter-in function, a mix of alter and update-in, with the following guarantee: two alter-ins conflict when their paths are either equal or prefix from one another.

This is not something new: this problem bugs me since 2008 I think. I had several (failed/unfinished and private) attempts to patch the STM to accommodate for a new reference type.

Some weeks ago I realized that I didn’t need to hack the STM to create such a new reference type.

The basic idea behind all my attempts was to use lock-striping. What I didn’t realize is that I can leverage the existing STM by using ref-striping!

Let’s say the whole map is stored in a single ref, the root ref and you have N guard refs (whose value is nil). When I want to alter the value for a given key, I compute its hash modulo N which gives me an index into the guards vector. I ref-set the corresponding guard ref (to nil, the actual value doesn’t matter) thus claiming exclusive commit rights for this key. Once that done I simply commute on the root ref being sure that the operation will only commute with non-conflicting other ops.

NB: An alternative design is to simply have N roots and no guards and to merge the N maps on deref — efficient merging would need a didcated map implementation. However it doesn’t help much with nested maps but it helps a lot with concurrency since in the other design everything is serialized on the single root. I think an hybrid solution (N roots, each roots having M guards) woud bring an interesting trade-off between concurrency, number of retries and ease of snapshotting.

To support nested maps, instead of a single key, one has to deal with a path, eg [:a :b :c]. Here the idea is to ensure guards for path prefixes ([:a] and [:a :b]) and to ref-set the guard for the full-path as before.

;; except ensure-in and alter-in, the others are here because
;; they can be optimized in a mega ref wth several roots
(defprotocol AssociativeRef
  (-alter-in [aref ks f args])
  (-commute-in [aref ks f args])
  (ref-set-in [aref ks v])
  (ensure-in [aref ks])
  (deref-in [aref ks]))

(defn alter-in [aref ks f & args]
  (-alter-in aref ks f args))

(defn commute-in [aref ks f & args]
  (-commute-in aref ks f args))

(defn- ensure-path [guard-for path]
  (loop [path path]
    (when-not (= [] path)
      (ensure (guard-for path))
      (recur (pop path)))))

(defn- guards-fn [guards]
  (let [n (count guards)]
    #(nth guards (mod (hash %) n))))

(deftype MegaRef [r guards]
  AssociativeRef
  (-alter-in [this ks f args]
    (if (seq ks)
      (let [guard-for (guards-fn guards)]
        (ensure-path guard-for (pop (vec ks)))
        (ref-set (guard-for ks) nil)
        (let [v (apply f (get-in @r ks) args)]
          ; v is precomputed to not evaluate it twice because of commute
          (commute r assoc-in ks v)))
      (throw (IllegalArgumentException. "ks must not be empty."))))
  (-commute-in [this ks f args]
    (apply commute r update-in ks f args))
  (ref-set-in [this ks v]
    (-alter-in this ks (constantly v) nil))
  (ensure-in [this ks]
    (ensure-path  (vec ks) (guards-fn guards))
    (deref-in this ks))
  (deref-in [this ks]
    (get-in @r ks))
  clojure.lang.IDeref
  (deref [this]
    @r))

(defn megaref [entries & options]
  (let [guards (:guards options 16)
        root-options (select-keys options [:validator :min-history :max-history])]
    (MegaRef. (apply ref (into {} entries) root-options)
              (vec (repeatedly guards #(ref nil))))))

NB: Write skew anomalies can now occur at the “sub ref” level: when you deref-in a path unrelated to the ones you update; ensure-in is the answer.

So, some questions:

  • Is this of interest to anyone except me? If yes, once mature (eg multiroot) should it belong in contrib?
  • Is my implementation sound? Do I rely on implementation details?
  • Should the STM expose a “token” facility so that I don’t have to use refs as guards?

Conway’s Game of Life

unsorted — cgrand, 19 August 2011 @ 20 h 01 min

APL is famous for having a 1-liner for Conway’s game of life.
Being very efficient at implementing a matrix-based solution of Conway’s game of life should come to no suprise from an array-oriented language.

The way you model data determines your code. Clojure encourages what I call relational-oriented programming. That is modeling with sets, natural identifiers (thanks to composite values) and maps-as-indexes.

If you pick the right representation for the state of the board, you end up with a succinct implementation:

(defn neighbours [[x y]]
  (for [dx [-1 0 1] dy (if (zero? dx) [-1 1] [-1 0 1])]
    [(+ dx x) (+ dy y)]))

(defn step [cells]
  (set (for [[loc n] (frequencies (mapcat neighbours cells))
             :when (or (= n 3) (and (= n 2) (cells loc)))]
         loc)))

Let’s see how it behaves with the “blinker” configuration:

(def board #{[1 0] [1 1] [1 2]})
; #'user/board
(take 5 (iterate step board))
; (#{[1 0] [1 1] [1 2]} #{[2 1] [1 1] [0 1]} #{[1 0] [1 1] [1 2]} #{[2 1] [1 1] [0 1]} #{[1 0] [1 1] [1 2]})

Great, it oscillates as expected!

From this step can be distilled a generic topology-agnostic life-like automatons stepper factory (phew!) but this is a subject for another post or — shameless plug — a book.

A flatter cond

utilities — cgrand, 17 June 2011 @ 14 h 44 min

Long time, no post. I’ve had two hiatus (one of this kind and one of that kind) and I’m still going through the pile of accumulated work (including the bird book — it’s a painted snipe).

To warm up, a little macro I find quite handy:

(ns flatter.cond
  (:refer-clojure :exclude [cond]))

(defmacro cond 
  "A variation on cond which sports let bindings:
     (cond 
       (odd? a) 1
       :let [a (quot a 2)]
       (odd? a) 2
       :else 3)"
  [& clauses]
  (when-let [[test expr & clauses] (seq clauses)]
    (if (= :let test)
      `(let ~expr (cond ~@clauses))
      `(if ~test ~expr (cond ~@clauses)))))

A* in Clojure

unsorted — cgrand, 4 September 2010 @ 4 h 22 min

Thanks to Mark Engelberg there’s now a nice priority map in contrib. With such a facility, it becomes really easy to write a nice implementation of the A* algorithm.

(ns net.cgrand.clj-me.a-star
  (:use [clojure.contrib.priority-map :only [priority-map]]))

(defn A*
 "Finds a path between start and goal inside the graph described by edges
  (a map of edge to distance); estimate is an heuristic for the actual
  distance. Accepts a named option: :monotonic (default to true).
  Returns the path if found or nil."
 [edges estimate start goal & {mono :monotonic :or {mono true}}]
  (let [f (memoize #(estimate % goal)) ; unsure the memoization is worthy
        neighbours (reduce (fn [m [a b]] (assoc m a (conj (m a #{}) b)))
                      {} (keys edges))]
    (loop [q (priority-map start (f start))
           preds {}
           shortest {start 0}
           done #{}]
      (when-let [[x hx] (peek q)]
        (if (= goal x)
          (reverse (take-while identity (iterate preds goal)))
          (let [dx (- hx (f x))
                bn (for [n (remove done (neighbours x))
                         :let [hn (+ dx (edges [x n]) (f n))
                               sn (shortest n Double/POSITIVE_INFINITY)]
                         :when (< hn sn)]
                     [n hn])]
            (recur (into (pop q) bn)
              (into preds (for [[n] bn] [n x]))
              (into shortest bn)
              (if mono (conj done x) done))))))))

Example:

(defn euclidian-distance [a b] ; multidimensional
  (Math/sqrt (reduce + (map #(let [c (- %1 %2)] (* c c)) a b))))

;; generate a grid graph whose outlying edges are one-way
(defn grid [x y w h]
  (into {}
    (for [i (range w) j (range h)
          :let [x0 (+ x i) y0 (+ y j) x1 (inc x0) y1 (inc y0)]]
      {[[x0 y0] [x1 y0]] 1
       [[x1 y0] [x1 y1]] 1
       [[x1 y1] [x0 y1]] 1
       [[x0 y1] [x0 y0]] 1})))

(def g (apply dissoc (grid 0 0 4 4) (keys (grid 1 1 2 2))))

(comment 
user=> (A* g euclidian-distance [0 3] [4 2])
([0 3] [1 3] [2 3] [3 3] [3 2] [4 2])
)

Shameless plug: if you are in Europe and want to learn Clojure, don’t forget to register for the course Lau and me will be giving in Frankfurt, October 26-28.

Quick update

unsorted — cgrand, 26 August 2010 @ 0 h 05 min

Long time no post. No code in this one though.
However I’m pleased to announce that Lau and me are going to give another Clojure course: October 26-28 in Frankfurt, beginners welcome. So if you want to learn Clojure, register!

Primitive types support for fns coming to a Clojure branch near you

unsorted — cgrand, 10 June 2010 @ 8 h 57 min

Rich Hickey is working on primitive support in the prim branch. As of now, one can write:

(defn ^:static fib ^long [^long n]
  (if (>= (long 1) n)
    (long 1)
    (+ (fib (dec n)) (fib (- n (long 2))))))

and computes (fib 38) on my laptop in 650ms where my (or even Rich’s) best attempt took about 2s! If I tweak the above code to use unchecked ops (regular arithmetic ops on primitive types in Clojure throw an exception on overflow, unchecked ones don’t), the performance comes real real close (< 5%) to the equivalent Java code.

What’s this ^:static thing?

Primitive (double and long only: they subsume any other type) args and return values are only allowed on statics (note that the return type hint is put on the arglist so as to allow different hints for different arities). Statics are still fns in vars but they are backed by a static method and, when called by name, a direct static method call is emitted rather than going through the var and the IFn interface — as such static calls replace direct binding.

(my-static 42) ; direct call
(apply my-static 42 nil) ; regular call through the var

About the syntax: ^:keyword is a new reader shorthand for ^{:keyword true} (and keep in mind that in Clojure 1.2 ^ is the new #^), and metadata are merged: ^:static ^long ^{:doc "docstring"} x is now equivalent to ^{:static true :tag long :doc "docstring"} x.

Interesting times... as usual in the Clojure community. Thanks Rich!

Primitive types support for fns

unsorted — cgrand, 4 June 2010 @ 10 h 25 min

Follow up See here.


Update: I was wrong and Rich Hickey set me right: I didn’t measure the gains I’m expecting because the inline-expanded form still go through the var lookup. See here (or the comments) for real gains (around 800ms on my laptop).


This is a quick and dirty hack to emulate primitive types support for globally-bound fns:

(defmacro defhintedfn [name args & body]
  (let [iface (gensym "HintedFn")
        hname (vary-meta name assoc :tag iface)
        vname (vary-meta name assoc
                :inline (fn [& args] `(.invoke ~hname ~@args)))]
    `(do
       (definterface ~iface
         (~(with-meta 'invoke {:tag (:tag name Object)}) ~args))
       (def ~vname nil) ; workaround for a soon-to-be-fixed bug
       (def ~vname
          (reify
            ~iface
              (invoke [this# ~@args]
                ~@body)
            clojure.lang.IFn
              (^Object invoke [this# ~@(map #(vary-meta % dissoc :tag) args)]
                 ~@body))))))

As you can see the trick is to generate a dedicated interface and to inline calls to the defined fn to go through the specific interface rather than through IFn.

Let’s define a dumb numeric fn generating tons of calls: the fibonacci function! Below it comes in three flavors: hinted as int, hinted as long and plain.

(defhintedfn ^int hfib [^int n] (if (>= (int 1) n) (int 1) (+ (hfib (dec n)) (hfib (- n (int 2))))))
(defhintedfn ^long lfib [^long n] (if (>= (long 1) n) (long 1) (+ (lfib (dec n)) (lfib (- n (long 2))))))
(defn fib [n] (if (>= 1 n) 1 (+ (fib (dec n)) (fib (- n 2)))))

And here are the timings:

user=> (time (hfib 38))
"Elapsed time: 2016.128098 msecs"
63245986
user=> (time (lfib 38))
"Elapsed time: 2896.46198 msecs"
63245986
user=> (time (fib 38))
"Elapsed time: 4704.449867 msecs"
63245986

Please note that hinted fns are still regular fns:

user=> (map hfib (range 10))
(1 1 2 3 5 8 13 21 34 55)

Destructuring, records, protocols and named arguments

optimization,pondering — cgrand, 8 May 2010 @ 12 h 19 min

Warning: this post is full of microbenchmarks so take it with a pinch of salt.

Destructuring a record

Field access through keyword-lookups on records is fast:

user=> (defrecord Foo [a b])
user.Foo
user=> (let [x (Foo. 1 2)] (dotimes [_ 5] (time (dotimes [_ 1e7] (let [a (:a x) b (:b x)] [a b])))))
"Elapsed time: 114.787424 msecs"
"Elapsed time: 102.568273 msecs"
"Elapsed time: 71.150593 msecs"
"Elapsed time: 72.217418 msecs"
"Elapsed time: 70.127489 msecs"
nil

But as far as I know destructure still emits (get x :k), let check:

user=> (let [x (Foo. 1 2)] (dotimes [_ 5] (time (dotimes [_ 1e7] (let [{:keys [a b]} x] [a b])))))
"Elapsed time: 968.616612 msecs"
"Elapsed time: 945.704133 msecs"
"Elapsed time: 911.290751 msecs"
"Elapsed time: 927.658125 msecs"
"Elapsed time: 916.796408 msecs"
nil

Actually it’s slightly slower than lookup on small maps:

user=> (let [x {:a 1 :b 2}] (dotimes [_ 5] (time (dotimes [_ 1e7] (let [{:keys [a b]} x] [a b])))))
"Elapsed time: 866.942377 msecs"
"Elapsed time: 746.273968 msecs"
"Elapsed time: 734.366239 msecs"
"Elapsed time: 729.346188 msecs"
"Elapsed time: 746.96393 msecs"
nil

One patch later

I patched destructure to emit keyword-lookups:

user=> (let [x (Foo. 1 2)] (dotimes [_ 5] (time (dotimes [_ 1e7] (let [{:keys [a b]} x] [a b])))))
"Elapsed time: 479.911064 msecs"
"Elapsed time: 488.895167 msecs"
"Elapsed time: 464.617431 msecs"
"Elapsed time: 480.944575 msecs"
"Elapsed time: 464.969779 msecs"
nil

It’s better but still slower than without destructuring. Let see what slows us:

user=> (macroexpand-1 '(let [{:keys [a b]} x] [a b]))
(let* [map__3838 x map__3838 (if (clojure.core/seq? map__3838) (clojure.core/apply clojure.core/hash-map map__3838) map__3838) b (:b map__3838) a (:a map__3838)] [a b])

My bet is on the if+seq? so I remove it:

user=> (let [x (Foo. 1 2)] (dotimes [_ 5] (time (dotimes [_ 1e7] (let* [map__3838 x map__3838 map__3838 b (:b map__3838) a (:a map__3838)] [a b])))))
"Elapsed time: 125.188397 msecs"
"Elapsed time: 103.041099 msecs"
"Elapsed time: 70.061558 msecs"
"Elapsed time: 70.793984 msecs"
"Elapsed time: 69.759146 msecs"
nil

This if+seq? allows for named arguments. I wonder if this behaviour should be an option of map-destructuring (eg {:keys [a b] :named-args true}). Anyway I had in mind that such a dispatch could be optimized with protocols.

Optimizing dispatch with protocols

user=> (defprotocol Seq (my-seq [this]) (my-seq? [this]))
Seq
user=> (extend-protocol Seq
clojure.lang.ISeq
(my-seq [this] (.seq this))
(my-seq? [this] true)
Object
(my-seq [this] (clojure.lang.RT/seq this))
(my-seq? [this] false))
nil

Let see how my-seq? compares to seq?

user=> (let [x (Foo. 1 2)] (dotimes [_ 5] (time (dotimes [_ 1e7] (let* [map__3838 x map__3838 (if (my-seq? map__3838) (clojure.core/apply clojure.core/hash-map map__3838) map__3838) b (:b map__3838) a (:a map__3838)] [a b])))))
"Elapsed time: 179.282982 msecs"
"Elapsed time: 161.781526 msecs"
"Elapsed time: 154.307042 msecs"
"Elapsed time: 155.567677 msecs"
"Elapsed time: 153.716604 msecs"
nil

Hence my-seq? is 3x faster than seq? which means that protocols are indeed speedy: yet another incredible piece of work by Rich Hickey!

I’m still exploring how low-level protocols fns behave in a concurrent setting, stay tuned!

Meanwhile don’t forget to register to the next European Clojure training session taught by Lau and me.

European Clojure Training, Brussels, late June

unsorted — cgrand, 14 April 2010 @ 12 h 29 min

Following a recent tweet, I’m setting up a Clojure training session with Lau B. Jensen and we’d like to know more about our potential attendees. Please fill this short survey:


(click here if you don’t see the above form.)

Next Page »
(c) 2012 Clojure and me | powered by WordPress with Barecity