These aren’t the reducing functions you are looking for

unsorted — cgrand, 8 October 2014 @ 1 h 12 min

Transducers are powerful and easy to grasp when they claim they transform reducing functions. However once you scratch their surface you quickly realize that’s not their true nature: they transform stateful processes.

In a previous post, I explained why seeded transduce forces transducers to return stateful reducing functions. However this can be fixed. The current implementation of transduce reads:

(defn transduce
  "reduce with a transformation of f (xf). If init is not
  supplied, (f) will be called to produce it. f should be a reducing
  step function that accepts both 1 and 2 arguments, if it accepts
  only 2 you can add the arity-1 with 'completing'. Returns the result
  of applying (the transformed) xf to init and the first item in coll,
  then applying xf to that result and the 2nd item, etc. If coll
  contains no items, returns init and f is not called. Note that
  certain transforms may inject or skip items."  {:added "1.7"}
  ([xform f coll] (transduce xform f (f) coll))
  ([xform f init coll]
     (let [f (xform f)
           ret (clojure.core.protocols/coll-reduce coll f init)]
       (f ret))))

To fix it you have to make the seeded case the special case and not the normal case:

(defn fseed [f init]
    ([] init)
    ([x] (f x))
    ([x y] (f x y))))

(defn transduce
  "reduce with a transformation of f (xf). If init is not
  supplied, (f) will be called to produce it. f should be a reducing
  step function that accepts both 1 and 2 arguments, if it accepts
  only 2 you can add the arity-1 with 'completing'. Returns the result
  of applying (the transformed) xf to init and the first item in coll,
  then applying xf to that result and the 2nd item, etc. If coll
  contains no items, returns init and f is not called. Note that
  certain transforms may inject or skip items."  {:added "1.7"}
  ([xform f init coll] (transduce xform (fseed f init) coll))
  ([xform f coll]
     (let [f (xform f)
           ret (clojure.core.protocols/coll-reduce coll f (f))]
       (f ret))))

By making the seeded reduce the special case, the init value can be wrapped in a composite init value by transducer-returned reducing functions. Now writing, for example, a pure take is possible.

This modification fixes the seeded reduce case but it requires allocating intermediate objects at each step which goes against the promise that transducers alleviate allocative pressure (promise inherited from reducers). So it’s fixable but for performance reasons transducer-returned reducing functions remain stateful.

Once you go stateful…

Once you assume transducer-returned reducing functions are stateful, you realize they have some cruft from their functional origins (this was discussed at the last Paris Clojure Meetup (fr))):

  • 0-arity always delegate to the 0-arity of the downstream reducing function,
  • in any arity, the accumulator must be used in a linear manner with the only allowed operation being the 2-arity of the downstream function.
  • don’t forget to check for reduced return values!

So the accumulator values are passed around but never used except by the most downstream reducing function: the one that was passed to transduce by the user! Why not, then, encapsulates the accumulator state in a process?

A process in this model has only two operations: process one input and complete which could be respectively mapped to 1-arity and 0-arity of a function:

  ([] ...) ; completes the process, return value is unspecified
  ([x] ...)) ; processes x, returns true when no more input should be fed in.

So now transduce could be written as:

(defn transduce [xform f init coll]
  (let [vacc (volatile! init)
        p (fn 
            ([x] (reduced? (vswap! vacc f x))))
        p (xform p)]
    (feed! p coll) ; transducers are now process -> process
    (f (let [acc @vacc] (if (reduced? acc) @acc acc)))))

Where feed! is the iteration primitive – for exposition it can be implemented using reduce but it’s backwards: reduce should be implemented on top of feed!.

(defn feed!
 "Returns true if the process can't process more input."
 [p coll]
  (reduce #(and (p %2) (reduced true)) false coll))

Now we can try to reimplement some transducers as process-transforming functions:

(defn map [f]
  (fn [p1]
      ([] (p1))
      ([x] (p1 (f x))))))

(defn filter [pred]
  (fn [p1]
      ([] (p1))
      ([x] (and (pred x) (p1 x))))))

(defn take [n]
  (fn [p1]
    (let [vn (volatile! (dec n))]   
        ([] (p1))
        ([x] (or (neg? @vn) (p1 x) (neg? (vswap! vn dec))))))))

(defn partition-by [f]
  (fn [p]
    (let [a (java.util.ArrayList.)
          pv (volatile! ::none)]
          (when-not (.isEmpty a)
            (let [v (vec (.toArray a))]
              ;;clear first!
              (.clear a)
              (p v)))
          (let [pval @pv
                val (f input)]
            (vreset! pv val)
            (if (or (identical? pval ::none)
                  (= val pval))
              (do (.add a input) false) ; .add returns true
              (let [v (vec (.toArray a))]
                (.clear a)
                (or (p v)
                  (do (.add a input) false))))))))))


To me the main advantage of transducers as process transformers is that their interface is a bit less complected; especially by not having to deal with reduced wrappers – because of that it may be easier to have them support primitive types.

The process model also seems to be less of a mismatch when reasoning about transducers in other contexts (e.g. sequences, channels).

A better name

There has been much discussion on how to best type them but not that much about how to name them in a more patterned way. What about BuilderDecoratorFactories?

The Rules of Transducer Club

unsorted — cgrand, 11 September 2014 @ 15 h 46 min

First rule: you don’t call a transducer.
Second rule: only composition is allowed.
Third rule: better stateful than pure.

Here are the rationale behind each rule:

First rule: you don’t call a transducer.

Transducers may be so-called “stateful” that is they create stateful reducing functions. As a consequence you should not hold onto them for too long (otherwise state goes sour…). And there’s no betetr way to not hold them for too long that to never get a hold on them at all!

That’s why transduce, sequence, iteration and chan all take a transducer to avoid you the perils of having to call it.

Second rule: only composition is allowed.

It’s a direct consequence of the first rule: you should only compose transducers.

Third rule: better stateful than pure.

This one is a bit more subtle and caused exclusively by transduce.

A reducing fn has now threes arities: [] -> acc, [acc x] -> acc and [acc] -> result.

When one wants to pass state from one step to the next there are two options: be pure and pass it in the accumulator (and you’ll use the 1-arity to clean up) or be stateful.

It turns out that the pure option is a bad one, for two reasons. First reason, it’s going to increase object churn. Second reason, it’s going to change the type of the accumulator and this is a problem because:

  • in transduce an init value may be specified and this init value must be of the type of the accumulator,
  • we don’t have a way to map from the result domain to the accumulator domain.

So it means the user has to be aware of an implementation detail of your transducer (the way you smuggle state in the accumulator) to craft a proper init value. It’s an abstraction leak, it’s bad. Be stateful.

Optimal justification of text with Dynamic Programming

unsorted — cgrand, 29 August 2014 @ 11 h 46 min

This is an exercise in dynamic programming I found on /ftp.

The goal is to tightly arrange a given sequence of n words within page margins, maximizing overall neatness. To be more precise, we wish to minimize the sum, over all lines except the last, of the cubes of the number of blank characters at the end of each line. See the comments in the code for more details.

The algorithm has O(n^2) time and space complexities.

Below is my take on it and I believe the complexity to be O(n*width). I achieve linearity in the words count by laying out the text back to front: the key insight is that the optimal layout of some words only depends on the amount of space left on the current line, it does not depend on the layout of the words before them.

(defn layout [words width]
  (let [cat (fn [word {u :ugliness [l & ls] :lines}] ; adds the word to the start of the first line
              {:ugliness u :lines (cons (cons word l) ls)})
        br (fn [rem {u :ugliness ls :lines}] ; adds a break (creates a new line)
              {:ugliness (+ u (* rem rem rem)) :lines (cons () ls)})
        layout ; layout words, knowing that the first line shouldn't have more than rem characters
          (fn [layout words rem]
            (if-let [[word & ws] (seq words)]
                (= rem width) ; blank line ahead
                  (cat word (layout ws (- rem (count word))))
                (< (count word) rem) ; enough room for a space and the current word
                  (cat word
                    (min-key :ugliness
                      (layout ws (- rem (count word) 1))
                      (br rem (layout ws width))))
                :else (br rem (layout words width)))
              {:ugliness 0 :lines (list ())}))
        mlayout (memoize layout)
        layout (fn self [words rem] (mlayout self words rem))]
    (:lines (layout words width))))

This is an exercise I prepared for a Lambda Next workshop but that we never used.

Macros, closures and unexpected object retention

unsorted — cgrand, 11 September 2013 @ 12 h 37 min

The common advice about macros is that they should emit as little code as possible and delegate to ancillary functions as soon as possible. Here is an example from

(defmacro transaction
  [& body]
  `(transaction* (fn [] ~@body)))

I still think this is good advice but it has unintended consequences. The problem with this piece of code is that all closed-over objects in body are going to be retained longer than expected, longer than they would have been retained if the macro had emitted all the logic implemented in transaction* instead of delegating to it. (See this discussion as an example of issues created by such code.)

The closure object has references to all closed-over objects and since a closure can be called many times, it can’t get rid of them. So the only time where they are going to be garbage collectible is once the closure itself becomes collectible… and a closure can’t be collected while it’s executing.

Helpfully there’s a low-level feature for that:

(defmacro transaction
  [& body]
  `(transaction* (^:once fn* [] ~@body)))

It instructs the compiler that the closure is one-shot and that closed-over references should be cleared, thus allowing referenced objects to be garbage collected before the closure returns.

This problem is not specific to macros but can easily be solved in most cases: the closure is an implementation detail and the macro writer knows enough about its life-cycle to fix it. However any regular closure (fn or reify) is going to prevent closed-overs to be garbage-collected while one of its (java) methods is running because the closure is referenced by the stack.

During the last LambdaNext workshop a delegate stumbled on such a case while experimenting with reducers (and incidentally it made me understand a memory issue I worked around last year):

=> (time (reduce + 0 (map identity (range 1e8))))
"Elapsed time: 5729.579 msecs"
=> (time (reduce + 0 (r/map identity (range 1e8))))
;; Interrupting...
Expression was interrupted: null

(Depending on your memory settings, you may have to tweak the length of the range to exhibit the problem; more details here)

If one modifies reducers to not use (java) methods but external extensions:

(in-ns 'clojure.core.reducers)

(defrecord Folder [coll xf])

(defn folder
  "Given a foldable collection, and a transformation function xf,
  returns a foldable collection, where any supplied reducing
  fn will be transformed by xf. xf is a function of reducing fn to
  reducing fn."
  {:added "1.5"}
  ([coll xf]
     (Folder. coll xf)))

(extend-type Folder
      (coll-reduce [fldr f1]
                   (clojure.core.protocols/coll-reduce (:coll fldr) ((:xf fldr) f1) (f1)))
      (coll-reduce [fldr f1 init]
                   (clojure.core.protocols/coll-reduce (:coll fldr) ((:xf fldr) f1) init))

      (coll-fold [fldr n combinef reducef]
                 (coll-fold (:coll fldr) n combinef ((:xf fldr) reducef))))

Then the problem disappears:

(in-ns 'user)
=> (time (reduce + 0 (r/map identity (range 1e8))))
"Elapsed time: 4437.012 msecs"

This is because the protocol methods is not a java method of the reducer object anymore and thus it can be reclaimed while the (protocol) method is executing.

So next time you have a memory issue, look for closures tying the life-cycle of their closed-overs to theirs!

Tarjan’s strongly connected components algorithm

unsorted — cgrand, 18 March 2013 @ 19 h 48 min

I dislike algorithms that are full of indices and mutations. Not because they are bad but because I always have the feeling that the core idea is buried. As such, Tarjan’s SCC algorithm irked me.

So I took the traditional algorithm, implemented it in Clojure with explicit environment passing, then I replaced indices by explicit stacks (thanks to persistence!) and after some tweaks, I realized that I’ve gone full circle and could switch to stacks lengths instead of stacks themselves and get rid of the loop. However the whole process made the code cleaner to my eye. You can look at the whole history.

Here is the resulting code:

(defn tarjan 
  "Returns the strongly connected components of a graph specified by its nodes
   and a successor function succs from node to nodes.
   The used algorithm is Tarjan's one."
  [nodes succs]
  (letfn [(sc [env node]
            ; env is a map from nodes to stack length or nil,
            ; nil means the node is known to belong to another SCC
            ; there are two special keys: ::stack for the current stack 
            ; and ::sccs for the current set of SCCs
            (if (contains? env node)
              (let [stack (::stack env)
                    n (count stack)
                    env (assoc env node n ::stack (conj stack node))
                    env (reduce (fn [env succ]
                                  (let [env (sc env succ)]
                                    (assoc env node (min (or (env succ) n) (env node)))))
                          env (succs node))]
                (if (= n (env node)) ; no link below us in the stack, call it a SCC
                  (let [nodes (::stack env)
                        scc (set (take (- (count nodes) n) nodes))
                        ; clear all stack lengths for these nodes since this SCC is done
                        env (reduce #(assoc %1 %2 nil) env scc)]
                    (assoc env ::stack stack ::sccs (conj (::sccs env) scc)))
    (::sccs (reduce sc {::stack () ::sccs #{}} nodes))))

As always, if you need some short-term help with Clojure (code review, consulting, training etc.), contact me!

From lazy seqs to reducers and back

unsorted — cgrand, 11 February 2013 @ 17 h 27 min

Sometimes in a seq pipeline, you know that some intermediate results are, well, intermediate and as such don’t need to be persistent but, on the whole, you still need the laziness.

You can’t always opt for reducers because while being non-persistent (and thus faster), they imply getting rid of laziness.

So you can’t mix and match transformations on sequences and on reducers when you care for laziness. Or, wait!, maybe you can.

At core, reducers are just a clever way to compose big functions while giving the user the illusion of manipulating collections. So we should be able to apply a reducer pipeline if we get hold of the composed function. This is exactly what the below seq-seq function does:

(defn reverse-conses 
  ([s tail] 
    (if (identical? (rest s) tail)
      (reverse-conses s tail tail)))
  ([s from-tail to-tail]
    (loop [f s b to-tail]
      (if (identical? f from-tail)
        (recur (rest f) (cons (first f) b))))))

(defn seq-seq [f s]
  (let [f1 (reduce #(cons %2 %1) nil 
              (f (reify clojure.core.protocols.CollReduce
                   (coll-reduce [this f1 init]
    ((fn this [s]
         (when-let [s (seq s)]
           (let [more (this (rest s)) 
                 x (f1 more (first s))]
             (if (reduced? x)
               (reverse-conses @x more nil)
               (reverse-conses x more)))))) s)))

(defmacro seq->> [s & forms]
  `(seq-seq (fn [n#] (->> n# ~@forms)) ~s))

Note that the captured function (f1) may be impure, so don’t share it!

Now one can specifies non-persistent lazy seq pipelines:

=> (seq->> (range) (r/map str) (r/take 25) (r/drop 5))
("5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" 
 "17" "18" "19" "20" "21" "22" "23" "24")

To prove laziness is at play here:

=> (take 2 (seq->> (range) (r/map #(str (doto % prn))) (r/take 25) (r/drop 5)))
"5" "6")

Of course seq-seq could be made to support chunked sequences and, if you try to play with it, beware of r/mapcat.


I added reverse-conses to account for the fact that when several items are added during a single call to f1 (eg during a r/mapcat “step”) they were added in the wrong order – if only everything was more set-like.

Clojure unconference, somewhere in France, early Spring.

unsorted — cgrand, 21 December 2012 @ 14 h 25 min

Some people are willing to set up a Clojure unconference in France in early Spring. Follow the clojure-fr mailing list for further discussion.

Extensible macros (A flatter cond follow-up)

unsorted — cgrand, 21 September 2012 @ 8 h 51 min

I pushed Utils out. It features reduce-by but also variations on cond (the ones introduced in the comments of A Flatter Cond) and if-let.

cond supports the following operators: :let, :when, :when-let and vectors in test position are treated as the binding form to a if-let (thus setting the local bindings) for the “then” expression not for the rest of the cond.

if-let supports :let to introduce new bindings without testing their expression for truth.

One notable twist to these macros is that thanks to net.cgrand.xmacros they are extensible: new operators can safely be participated to the macros (proper namespacing is enforced). See the cond implementation, abbreviated below:

(defmacro cond
 "An extensible variation on cond. Trailing else is supported.
keywords in test position are treated as operators.

Operators impls signature is [rhs else] where rhs is the \"then\"
expression next to the operator."
  [& clauses]
  (when-let [[op rhs & more-clauses] (seq clauses)]
    (if (next clauses)
      (x/expand-op op rhs `(cond ~@more-clauses))

(x/defdefault-op cond [test-expr then else]
  (if (vector? test-expr)
    `(if-let ~test-expr ~then ~else)
    `(if ~test-expr ~then ~else)))

(x/defop cond :let
  "Introduces local bindings."
  [bindings cont]
  `(let ~bindings ~cont))

(x/defop cond :when
  "Short-circuits the rest of the cond if false"
  [test-expr cont]
  `(when ~test-expr ~cont))

(x/defop cond :when-let
 "Short-circuits the rest of the cond if false and introduces local
  [bindings cont]
  `(when-let ~bindings ~cont))

Follow-up: A world in a ref

unsorted — cgrand, @ 8 h 35 min

I recently published Megaref a refined (and working) version of the idea described in A World in a Ref.

New and notesworthy:

  • actual working code,
  • the concurrency (number of guards) can be tuned at runtime,
  • tuning happens through an uniform interface (get-options, set-option!),
  • subrefs (and backward compatible replacements for usual STM functions) allow to easily migrate “polyref” code to megaref (see the conversion of Rich Hickey’s ant demo — mind the typo in the commit message).

Feedback welcome!

Fair conjunction: status report

unsorted — cgrand, 6 April 2012 @ 15 h 54 min
Shameless plugs:
Clojure Programming is out! Available now as an ebook and in a few days in paper!
Come to my Clojure training in Geneva it will be in a mixed language (French/English) setting — for an English-only training stay tuned!

Making conjunction fair is the reason why I messed up with core.logic, and this post document the approach I explore in the fair-conj2 branch.

In The Reasoned Scheduler I said that conjunction is easy: you wait for the first stream to provide a answer that you pass to the next goal. It looks easy but it suffers from being ordered: when the first goal diverges (produces an infinite stream of results or never halts) and the next goal is known to fail, the conjuction diverges too! This is unfair conjunction.

Changing the order of the goals would fix this particular example but like noted by William Byrd in Relational Programming in miniKanren: Techniques, Applications, and Implementations (page 53 and 54):

However, reordering goals has its disadvantages. For many programs, no ordering of goals will result in finite failure (see the remaining example in this chapter). Also, by committing to a certain goal ordering we are giving up on the declarative nature of relational programming: we are specifying how the program computes, rather than only what it computes. For these reasons, we should consider alternative solutions.

[T]he goal ordering we picked might diverge when we ask for a second answer, while the other ordering may fail finitely after producing a single answer.

So, to make conjunction fair (commutative), one has to not commit to an ordering of goal, which means that both goals must be executed in parallel (interleaved execution or even true parallelism). As soon as one of the goal produces an answer, this substitution has to be communicated to the other threadgoal so as to narrow its search space.

As a consequence of the disappearance of ordering, the distinction between a goal and a stream disappears too. I introduced a Search abstraction. A Search may yield a substitution (or nil)
and progress with step. It sounds awfully familiar with first and next.

Let’s name plus, join and narrow respectively the disjunction, conjunction and narrowing operators. Conjunction is distributive over disjunction as usual and narrowing is distributive over both.

Here is how the key expression is derived:

(join a b)
(join (plus (yield a) (step a)) b) ; decompose a 
(plus (join (yield a) b) (join (step a) b)) ; distribute join
(plus (narrow b (yield a)) (join (step a) b)) ; joining with a substitution (or nil) is narrowing
(plus (narrow b (yield a)) (join b (step a))) ; commute the operands of join for interleaving

This expression can be found nearly as-is in the code.

Narrowing is performed lazily: when one steps a suspended narrowing (a Narrow instance), the narrowing is propagated (thanks to its distributivity) to the children searches of its argument.

To make narrowing efficient, I introduced the min-yield function which, given a Search, returns the minimal substitution yielded by this search. That is: if this Search ever yield results they will be extensions of the min-yield substitution. This allows to discard a Search in its entirety when the narrowing substitution and the min-yield substitutions are not compatible!

To compute min-yield substitutions, the narrowing process leaves nodes (Scope instances) in the computation tree labeled with the narrowing substitution. By default min-yield returns empty-s the empty substitution.

This is the gist of my approach to fair conjunction. In a subsequent post, I’ll cover heuristics I use to prioritize execution of Searches.

Practically, what does it mean? It means that logic programs are easier to write, it also means that, at the moment, a program written with unfair conjunction in mind run slower with fair conjunction. Interestingly, under fair conjunction, naive pluso (p. 63) seems refutationally complete.

I intend to explore how constraints can be added to this branch and how it will impact (positively) performance.

If you feel like dabbling with this branch, I’ll be more than happy to hear your feedback. Thanks!

Next Page »
(c) 2018 Clojure and me | powered by WordPress with Barecity