Decaying lists: log scale for lists

pondering — cgrand, 12 February 2013 @ 13 h 44 min

The concept of exponentially decaying lists by David Barbour piqued my interest and I implemented it. It can be summed up as: log scale for lists!

As for log scale, decaying lists allow to manage large range of values and thus to get a better understanding of the data.

So a decaying list grows logarithmically with the number of conjed items. It follows that some items are dropped when others are inserted. Let’s see:

=> (seq (into (decaying-list) (range 100)))
(99 98 97 96 95 94 92 91 90 89 88 87 86 84 83 80 70 65 61 59 57 49 44 30 29 27 25 23 22 18 12 4 0)

So most of the recent items are there but the older elements have a greater probability to have been dropped.

A decaying list is parametrized by its half-life. The default half-life is 20, it means that an item in the list has one chance out of two to “survive” the insertion of 20 other items.

=> (seq (into (decaying-list 5) (range 100))) ; a half-life of 5 conjs
(99 98 97 96 95 94 93 91 83 81 79 78 68 61 57 54 52 31 15)

You can know where the next decay (if any: nil is returned when no decay) is going to happen:

=> (def dl (into (decaying-list 5) (range 100)))
#'net.cgrand.decay/dl
=> (decay-loc dl)
[(93 95 97 98 99) (92 91 89 87 85 81 77 72 71 63 54 52 46 31 20 16)]

So here the decay is going to happen between 93 and 92 (the first items of the “left” and “right” sequences). By default the most recent one is kept.

=> (seq (conj dl 100))
(100 99 98 97 95 93 91 89 87 85 81 77 72 71 63 54 52 46 31 20 16)

Indeed 92 has been dropped. (Repeated calls to decay-loc always yield the same result.)

However we may elect to synthetize a new value rather than just keep one of the two candidates to decay. For example we can compute their average:

=> (seq (into (decaying-list 5 
                :collapse (fn [dl]
                            (let [[[l] [r]] (decay-loc dl)] 
                              (/ (+ l r) 2.0))))
          (range 100)))
(99 98 97 95.5 93.5 91.5 90 89 87.125 82.75 79.5 72.875 64.8125 60.875 58 50.732421875 27.0 17.75 11.25 2.8125)

Or something a bit more elaborate (and precise):

=> (defn stats [x]
     (if (map? x)
       x
       {:min x :max x :avg x :n 1}))
#'net.cgrand.decay/stats
=> (defn merge-stats [{mina :min maxa :max avga :avg na :n}
                      {minb :min maxb :max avgb :avg nb :n}]
     {:min (min mina minb)
      :max (max maxa maxb)
      :avg (/ (+ (* na avga) (* nb avgb)) (double (+ na nb)))
      :n (+ na nb)})
#'net.cgrand.decay/merge-stats
=> (seq (into (decaying-list 5 
                :collapse (fn [dl]
                            (let [[[l] [r]] (decay-loc dl)] 
                              (merge-stats (stats l) (stats r)))))
          (range 100)))
(99 98 97 {:min 92, :max 96, :avg 94.0, :n 5} 91 {:min 87, :max 90, :avg 88.5, :n 4} 86 85 84 83 {:min 80, :max 82, :avg 81.0, :n 3} {:min 78, :max 79, :avg 78.5, :n 2} 77 {:min 70, :max 76, :avg 73.0, :n 7} 69 68 {:min 49, :max 67, :avg 58.0, :n 19} {:min 41, :max 48, :avg 44.5, :n 8} {:min 36, :max 40, :avg 38.0, :n 5} 35 {:min 32, :max 34, :avg 33.0, :n 3} {:min 10, :max 31, :avg 20.5, :n 22} {:min 3, :max 9, :avg 6.0, :n 7} {:min 0, :max 2, :avg 1.0, :n 3})

Decaying lists can also maintain a state – that is updated after each conj or decay. Below the implementation, there is an example of state management.

You can also cap the length of the decaying list by using the :capacity option. A quick note on capacity: increasing the capacity by the half-life value (eg going from 1000 to 1050 when half-life is 50), doubles the range the decaying list remembers.

7 Comments »

  1. I assume there is a typo in the response from evaluating (decay-loc dl) ? ’94’ is suddenly missing…

    Comment by Anders — 14 February 2013 @ 16 h 11 min
  2. @Anders, no it’s not a typo: multiple evaluations of (into (decaying-list 5) (range 100)) yield multiple results and this one happened to already have dropped 94 while the previous one didn’t. Each decaying-list has its own persistent PRNG which is initialized when the empty decaying list is created.

    Comment by cgrand — 14 February 2013 @ 16 h 16 min
  3. Nevermind, misread on my behalf.

    Comment by Anders — 14 February 2013 @ 16 h 18 min
  4. @cgrand, thanks. Skimmed a bit fast and thought ‘5’ was denoting threshold for how many elements in the front of the list were shielded from decay.

    Anyways, awesome stuff.

    Comment by Anders — 14 February 2013 @ 16 h 24 min
  5. Awesome stuff!

    Since I have only just quickly read through your post, this may be obvios.

    I was wondering if this decaying list could be used to create a decaying priority queue (dropping elements with low priority), or simular a decaying sorted map?

    (when searching huge state spaces such devices could be useful, especially if the “decaying” is cheap)

    Comment by Peter JC — 14 February 2013 @ 17 h 59 min
  6. @Peter conjing (including decaying) on a decaying list is O(1) on average BUT the constant is proportional to the half-life (so a smarter DS than a plain list should be used).
    That said, if you look at the example at the bottom of the gist you’ll see a decaying-memoize where the “priority” of a memoized entry decays over time: so an entry which used to be “hot” will slowly “cool down” and be expunged.

    Comment by cgrand — 14 February 2013 @ 18 h 17 min
  7. It was wonderful to see this!

    As you note, a better data structure than a plain list is appropriate. I would suggest a sequence implemented as a finger-tree (cf. Haskell’s Data.Sequence).

    For a couple applications, I’ve found a useful variation is to have the last K elements (K>=0) guaranteed to survive. This is trivially achieved by adding K to the delete index. By doing so, you get a very nice hybrid of a round-robin and a logarithmic decay.

    Comment by David Barbour — 10 March 2013 @ 4 h 53 min

RSS feed for comments on this post. TrackBack URI

Leave a comment

(c) 2014 Clojure and me | powered by WordPress with Barecity