Decaying lists: log scale for lists

pondering — cgrand, 12 February 2013 @ 13 h 44 min

The concept of exponentially decaying lists by David Barbour piqued my interest and I implemented it. It can be summed up as: log scale for lists!

As for log scale, decaying lists allow to manage large range of values and thus to get a better understanding of the data.

So a decaying list grows logarithmically with the number of conjed items. It follows that some items are dropped when others are inserted. Let’s see:

=> (seq (into (decaying-list) (range 100)))
(99 98 97 96 95 94 92 91 90 89 88 87 86 84 83 80 70 65 61 59 57 49 44 30 29 27 25 23 22 18 12 4 0)

So most of the recent items are there but the older elements have a greater probability to have been dropped.

A decaying list is parametrized by its half-life. The default half-life is 20, it means that an item in the list has one chance out of two to “survive” the insertion of 20 other items.

=> (seq (into (decaying-list 5) (range 100))) ; a half-life of 5 conjs
(99 98 97 96 95 94 93 91 83 81 79 78 68 61 57 54 52 31 15)

You can know where the next decay (if any: nil is returned when no decay) is going to happen:

=> (def dl (into (decaying-list 5) (range 100)))
#'net.cgrand.decay/dl
=> (decay-loc dl)
[(93 95 97 98 99) (92 91 89 87 85 81 77 72 71 63 54 52 46 31 20 16)]

So here the decay is going to happen between 93 and 92 (the first items of the “left” and “right” sequences). By default the most recent one is kept.

=> (seq (conj dl 100))
(100 99 98 97 95 93 91 89 87 85 81 77 72 71 63 54 52 46 31 20 16)

Indeed 92 has been dropped. (Repeated calls to decay-loc always yield the same result.)

However we may elect to synthetize a new value rather than just keep one of the two candidates to decay. For example we can compute their average:

=> (seq (into (decaying-list 5 
                :collapse (fn [dl]
                            (let [[[l] [r]] (decay-loc dl)] 
                              (/ (+ l r) 2.0))))
          (range 100)))
(99 98 97 95.5 93.5 91.5 90 89 87.125 82.75 79.5 72.875 64.8125 60.875 58 50.732421875 27.0 17.75 11.25 2.8125)

Or something a bit more elaborate (and precise):

=> (defn stats [x]
     (if (map? x)
       x
       {:min x :max x :avg x :n 1}))
#'net.cgrand.decay/stats
=> (defn merge-stats [{mina :min maxa :max avga :avg na :n}
                      {minb :min maxb :max avgb :avg nb :n}]
     {:min (min mina minb)
      :max (max maxa maxb)
      :avg (/ (+ (* na avga) (* nb avgb)) (double (+ na nb)))
      :n (+ na nb)})
#'net.cgrand.decay/merge-stats
=> (seq (into (decaying-list 5 
                :collapse (fn [dl]
                            (let [[[l] [r]] (decay-loc dl)] 
                              (merge-stats (stats l) (stats r)))))
          (range 100)))
(99 98 97 {:min 92, :max 96, :avg 94.0, :n 5} 91 {:min 87, :max 90, :avg 88.5, :n 4} 86 85 84 83 {:min 80, :max 82, :avg 81.0, :n 3} {:min 78, :max 79, :avg 78.5, :n 2} 77 {:min 70, :max 76, :avg 73.0, :n 7} 69 68 {:min 49, :max 67, :avg 58.0, :n 19} {:min 41, :max 48, :avg 44.5, :n 8} {:min 36, :max 40, :avg 38.0, :n 5} 35 {:min 32, :max 34, :avg 33.0, :n 3} {:min 10, :max 31, :avg 20.5, :n 22} {:min 3, :max 9, :avg 6.0, :n 7} {:min 0, :max 2, :avg 1.0, :n 3})

Decaying lists can also maintain a state – that is updated after each conj or decay. Below the implementation, there is an example of state management.

You can also cap the length of the decaying list by using the :capacity option. A quick note on capacity: increasing the capacity by the half-life value (eg going from 1000 to 1050 when half-life is 50), doubles the range the decaying list remembers.

From lazy seqs to reducers and back

unsorted — cgrand, 11 February 2013 @ 17 h 27 min

Sometimes in a seq pipeline, you know that some intermediate results are, well, intermediate and as such don’t need to be persistent but, on the whole, you still need the laziness.

You can’t always opt for reducers because while being non-persistent (and thus faster), they imply getting rid of laziness.

So you can’t mix and match transformations on sequences and on reducers when you care for laziness. Or, wait!, maybe you can.

At core, reducers are just a clever way to compose big functions while giving the user the illusion of manipulating collections. So we should be able to apply a reducer pipeline if we get hold of the composed function. This is exactly what the below seq-seq function does:

(defn reverse-conses 
  ([s tail] 
    (if (identical? (rest s) tail)
      s
      (reverse-conses s tail tail)))
  ([s from-tail to-tail]
    (loop [f s b to-tail]
      (if (identical? f from-tail)
        b
        (recur (rest f) (cons (first f) b))))))

(defn seq-seq [f s]
  (let [f1 (reduce #(cons %2 %1) nil 
              (f (reify clojure.core.protocols.CollReduce
                   (coll-reduce [this f1 init]
                     f1))))]
    ((fn this [s]
       (lazy-seq 
         (when-let [s (seq s)]
           (let [more (this (rest s)) 
                 x (f1 more (first s))]
             (if (reduced? x)
               (reverse-conses @x more nil)
               (reverse-conses x more)))))) s)))

(defmacro seq->> [s & forms]
  `(seq-seq (fn [n#] (->> n# ~@forms)) ~s))

Note that the captured function (f1) may be impure, so don’t share it!

Now one can specifies non-persistent lazy seq pipelines:

=> (seq->> (range) (r/map str) (r/take 25) (r/drop 5))
("5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" 
 "17" "18" "19" "20" "21" "22" "23" "24")

To prove laziness is at play here:

=> (take 2 (seq->> (range) (r/map #(str (doto % prn))) (r/take 25) (r/drop 5)))
(0
1
2
3
4
5
6
"5" "6")

Of course seq-seq could be made to support chunked sequences and, if you try to play with it, beware of r/mapcat.

Update:

I added reverse-conses to account for the fact that when several items are added during a single call to f1 (eg during a r/mapcat “step”) they were added in the wrong order – if only everything was more set-like.

(c) 2018 Clojure and me | powered by WordPress with Barecity