Transducers are powerful and easy to grasp when they claim they transform reducing functions. However once you scratch their surface you quickly realize that’s not their true nature: they transform stateful processes.
In a previous post, I explained why seeded transduce forces transducers to return stateful reducing functions. However this can be fixed. The current implementation of transduce
reads:
(defn transduce
"reduce with a transformation of f (xf). If init is not
supplied, (f) will be called to produce it. f should be a reducing
step function that accepts both 1 and 2 arguments, if it accepts
only 2 you can add the arity-1 with 'completing'. Returns the result
of applying (the transformed) xf to init and the first item in coll,
then applying xf to that result and the 2nd item, etc. If coll
contains no items, returns init and f is not called. Note that
certain transforms may inject or skip items." {:added "1.7"}
([xform f coll] (transduce xform f (f) coll))
([xform f init coll]
(let [f (xform f)
ret (clojure.core.protocols/coll-reduce coll f init)]
(f ret))))
To fix it you have to make the seeded case the special case and not the normal case:
(defn fseed [f init]
(fn
([] init)
([x] (f x))
([x y] (f x y))))
(defn transduce
"reduce with a transformation of f (xf). If init is not
supplied, (f) will be called to produce it. f should be a reducing
step function that accepts both 1 and 2 arguments, if it accepts
only 2 you can add the arity-1 with 'completing'. Returns the result
of applying (the transformed) xf to init and the first item in coll,
then applying xf to that result and the 2nd item, etc. If coll
contains no items, returns init and f is not called. Note that
certain transforms may inject or skip items." {:added "1.7"}
([xform f init coll] (transduce xform (fseed f init) coll))
([xform f coll]
(let [f (xform f)
ret (clojure.core.protocols/coll-reduce coll f (f))]
(f ret))))
By making the seeded reduce the special case, the init value can be wrapped in a composite init value by transducer-returned reducing functions. Now writing, for example, a pure take
is possible.
This modification fixes the seeded reduce case but it requires allocating intermediate objects at each step which goes against the promise that transducers alleviate allocative pressure (promise inherited from reducers). So it’s fixable but for performance reasons transducer-returned reducing functions remain stateful.
Once you go stateful…
Once you assume transducer-returned reducing functions are stateful, you realize they have some cruft from their functional origins (this was discussed at the last Paris Clojure Meetup (fr))):
- 0-arity always delegate to the 0-arity of the downstream reducing function,
- in any arity, the accumulator must be used in a linear manner with the only allowed operation being the 2-arity of the downstream function.
- don’t forget to check for
reduced
return values!
So the accumulator values are passed around but never used except by the most downstream reducing function: the one that was passed to transduce
by the user! Why not, then, encapsulates the accumulator state in a process?
A process in this model has only two operations: process one input and complete which could be respectively mapped to 1-arity and 0-arity of a function:
(fn
([] ...) ; completes the process, return value is unspecified
([x] ...)) ; processes x, returns true when no more input should be fed in.
So now transduce
could be written as:
(defn transduce [xform f init coll]
(let [vacc (volatile! init)
p (fn
([])
([x] (reduced? (vswap! vacc f x))))
p (xform p)]
(feed! p coll) ; transducers are now process -> process
(p)
(f (let [acc @vacc] (if (reduced? acc) @acc acc)))))
Where feed!
is the iteration primitive – for exposition it can be implemented using reduce
but it’s backwards: reduce should be implemented on top of feed!
.
(defn feed!
"Returns true if the process can't process more input."
[p coll]
(reduce #(and (p %2) (reduced true)) false coll))
Now we can try to reimplement some transducers as process-transforming functions:
(defn map [f]
(fn [p1]
(fn
([] (p1))
([x] (p1 (f x))))))
(defn filter [pred]
(fn [p1]
(fn
([] (p1))
([x] (and (pred x) (p1 x))))))
(defn take [n]
(fn [p1]
(let [vn (volatile! (dec n))]
(fn
([] (p1))
([x] (or (neg? @vn) (p1 x) (neg? (vswap! vn dec))))))))
(defn partition-by [f]
(fn [p]
(let [a (java.util.ArrayList.)
pv (volatile! ::none)]
(fn
([]
(when-not (.isEmpty a)
(let [v (vec (.toArray a))]
;;clear first!
(.clear a)
(p v)))
(p))
([input]
(let [pval @pv
val (f input)]
(vreset! pv val)
(if (or (identical? pval ::none)
(= val pval))
(do (.add a input) false) ; .add returns true
(let [v (vec (.toArray a))]
(.clear a)
(or (p v)
(do (.add a input) false))))))))))
Conclusion
To me the main advantage of transducers as process transformers is that their interface is a bit less complected; especially by not having to deal with reduced wrappers – because of that it may be easier to have them support primitive types.
The process model also seems to be less of a mismatch when reasoning about transducers in other contexts (e.g. sequences, channels).
A better name
There has been much discussion on how to best type them but not that much about how to name them in a more patterned way. What about BuilderDecoratorFactories?