These aren’t the reducing functions you are looking for

unsorted — cgrand, 8 October 2014 @ 1 h 12 min

Transducers are powerful and easy to grasp when they claim they transform reducing functions. However once you scratch their surface you quickly realize that’s not their true nature: they transform stateful processes.

In a previous post, I explained why seeded transduce forces transducers to return stateful reducing functions. However this can be fixed. The current implementation of transduce reads:

(defn transduce
  "reduce with a transformation of f (xf). If init is not
  supplied, (f) will be called to produce it. f should be a reducing
  step function that accepts both 1 and 2 arguments, if it accepts
  only 2 you can add the arity-1 with 'completing'. Returns the result
  of applying (the transformed) xf to init and the first item in coll,
  then applying xf to that result and the 2nd item, etc. If coll
  contains no items, returns init and f is not called. Note that
  certain transforms may inject or skip items."  {:added "1.7"}
  ([xform f coll] (transduce xform f (f) coll))
  ([xform f init coll]
     (let [f (xform f)
           ret (clojure.core.protocols/coll-reduce coll f init)]
       (f ret))))

To fix it you have to make the seeded case the special case and not the normal case:

(defn fseed [f init]
  (fn
    ([] init)
    ([x] (f x))
    ([x y] (f x y))))

(defn transduce
  "reduce with a transformation of f (xf). If init is not
  supplied, (f) will be called to produce it. f should be a reducing
  step function that accepts both 1 and 2 arguments, if it accepts
  only 2 you can add the arity-1 with 'completing'. Returns the result
  of applying (the transformed) xf to init and the first item in coll,
  then applying xf to that result and the 2nd item, etc. If coll
  contains no items, returns init and f is not called. Note that
  certain transforms may inject or skip items."  {:added "1.7"}
  ([xform f init coll] (transduce xform (fseed f init) coll))
  ([xform f coll]
     (let [f (xform f)
           ret (clojure.core.protocols/coll-reduce coll f (f))]
       (f ret))))

By making the seeded reduce the special case, the init value can be wrapped in a composite init value by transducer-returned reducing functions. Now writing, for example, a pure take is possible.

This modification fixes the seeded reduce case but it requires allocating intermediate objects at each step which goes against the promise that transducers alleviate allocative pressure (promise inherited from reducers). So it’s fixable but for performance reasons transducer-returned reducing functions remain stateful.

Once you go stateful…

Once you assume transducer-returned reducing functions are stateful, you realize they have some cruft from their functional origins (this was discussed at the last Paris Clojure Meetup (fr))):

  • 0-arity always delegate to the 0-arity of the downstream reducing function,
  • in any arity, the accumulator must be used in a linear manner with the only allowed operation being the 2-arity of the downstream function.
  • don’t forget to check for reduced return values!

So the accumulator values are passed around but never used except by the most downstream reducing function: the one that was passed to transduce by the user! Why not, then, encapsulates the accumulator state in a process?

A process in this model has only two operations: process one input and complete which could be respectively mapped to 1-arity and 0-arity of a function:

(fn
  ([] ...) ; completes the process, return value is unspecified
  ([x] ...)) ; processes x, returns true when no more input should be fed in.

So now transduce could be written as:

(defn transduce [xform f init coll]
  (let [vacc (volatile! init)
        p (fn 
            ([])
            ([x] (reduced? (vswap! vacc f x))))
        p (xform p)]
    (feed! p coll) ; transducers are now process -> process
    (p)
    (f (let [acc @vacc] (if (reduced? acc) @acc acc)))))

Where feed! is the iteration primitive – for exposition it can be implemented using reduce but it’s backwards: reduce should be implemented on top of feed!.

(defn feed!
 "Returns true if the process can't process more input."
 [p coll]
  (reduce #(and (p %2) (reduced true)) false coll))

Now we can try to reimplement some transducers as process-transforming functions:

(defn map [f]
  (fn [p1]
    (fn
      ([] (p1))
      ([x] (p1 (f x))))))

(defn filter [pred]
  (fn [p1]
    (fn
      ([] (p1))
      ([x] (and (pred x) (p1 x))))))

(defn take [n]
  (fn [p1]
    (let [vn (volatile! (dec n))]   
      (fn
        ([] (p1))
        ([x] (or (neg? @vn) (p1 x) (neg? (vswap! vn dec))))))))

(defn partition-by [f]
  (fn [p]
    (let [a (java.util.ArrayList.)
          pv (volatile! ::none)]
      (fn
        ([]
          (when-not (.isEmpty a)
            (let [v (vec (.toArray a))]
              ;;clear first!
              (.clear a)
              (p v)))
          (p))
        ([input]
          (let [pval @pv
                val (f input)]
            (vreset! pv val)
            (if (or (identical? pval ::none)
                  (= val pval))
              (do (.add a input) false) ; .add returns true
              (let [v (vec (.toArray a))]
                (.clear a)
                (or (p v)
                  (do (.add a input) false))))))))))

Conclusion

To me the main advantage of transducers as process transformers is that their interface is a bit less complected; especially by not having to deal with reduced wrappers – because of that it may be easier to have them support primitive types.

The process model also seems to be less of a mismatch when reasoning about transducers in other contexts (e.g. sequences, channels).

A better name

There has been much discussion on how to best type them but not that much about how to name them in a more patterned way. What about BuilderDecoratorFactories?

(c) 2024 Clojure and me | powered by WordPress with Barecity