Everybody loves the Sieve of Eratosthenes

unsorted — cgrand, 30 July 2009 @ 11 h 27 min

If I judge by traffic logs for this blog, many newcomers want to compute prime numbers in Clojure.

A recent thread on the mailing list prompted me to test an idea I had for a while: to use a map to implement the Sieve of Eratosthenes.

The first implementation was this one:

(defn primes [max]
  (let [enqueue (fn [sieve n factor]
                  (let [m (+ n factor)]
                    (assoc sieve m
                      (conj (sieve m) factor))))
        next-sieve (fn [sieve candidate]
                     (if-let [factors (sieve candidate)]
                       (reduce #(enqueue %1 candidate %2)
                         (dissoc sieve candidate)
                         factors)
                       (enqueue sieve candidate candidate)))]
    (apply concat (vals (reduce next-sieve {} (range 2 max))))))

where the sieve is a map from the next non-prime numbers to their factors. It’s so naive that even numbers are tested for primality but it doesn’t perform that bad: on my box, it takes 3s to compute all primes below 1,000,000 (it’s roughly as fast as clojure.contrib.lazy-seqs/primes when the seq isn’t yet memoized).

I wasn’t happy with the way I handled the case where a non-prime was already checked off (ie was a key of the map): I was conjing onto the list of prime factors for this number. If instead I tried to check off n+p (where n is the already known non-prime and p the current prime) or n+2p or n+3p… until I found a yet unknown non-prime I wouldn’t need to maintain a list of factors, nor to conj or reduce over it. And I was hoping that less allocations would yield better perfs.

Here is the second iteration:

(defn primes2 [max]
  (let [enqueue (fn [sieve n factor]
                  (let [m (+ n factor)]
                    (if (sieve m)
                      (recur sieve m factor)
                      (assoc sieve m factor))))
        next-sieve (fn [sieve candidate]
                     (if-let [factor (sieve candidate)]
                       (-> sieve
                         (dissoc candidate)
                         (enqueue candidate factor))
                       (enqueue sieve candidate candidate)))]
    (vals (reduce next-sieve {} (range 2 max)))))

and it computes all the primes below 1,000,000 in 1.8s instead of 3s but it still tests even numbers for primality.

primes3 is primes2 modified to only test odd numbers:

(defn primes3 [max]
  (let [enqueue (fn [sieve n factor]
                  (let [m (+ n (+ factor factor))]
                    (if (sieve m)
                      (recur sieve m factor)
                      (assoc sieve m factor))))
        next-sieve (fn [sieve candidate]
                     (if-let [factor (sieve candidate)]
                       (-> sieve
                         (dissoc candidate)
                         (enqueue candidate factor))
                       (enqueue sieve candidate candidate)))]
    (cons 2 (vals (reduce next-sieve {} (range 3 max 2))))))

and it computes the same list of primes in 1.5s.

Out of curiosity, I wrote a lazy version of primes3:

(defn lazy-primes3 []
  (letfn [(enqueue [sieve n step]
            (let [m (+ n step)]
              (if (sieve m)
                (recur sieve m step)
                (assoc sieve m step))))
          (next-sieve [sieve candidate]
            (if-let [step (sieve candidate)]
              (-> sieve
                (dissoc candidate)
                (enqueue candidate step))
              (enqueue sieve candidate (+ candidate candidate))))
          (next-primes [sieve candidate]
            (if (sieve candidate)
              (recur (next-sieve sieve candidate) (+ candidate 2))
              (cons candidate 
                (lazy-seq (next-primes (next-sieve sieve candidate) 
                            (+ candidate 2))))))]
    (cons 2 (lazy-seq (next-primes {} 3)))))

and, surprisingly (better locality?), it computes the primes below 1,000,000 in 1s.

Counting without numbers

unsorted — cgrand, 4 May 2009 @ 20 h 09 min

I was writing something along these lines:

(loop [state init, n (count some-seq)]
  (if (pos? n)
    (recur value (dec n))
    (ends here)))

when it struck me that seqs are numerals too!

(loop [state init, n some-seq]
  (if (seq n)
    (recur value (rest n))
    (ends here)))

or:

(loop [state init, n (seq some-seq)]
  (if n
    (recur value (next n))
    (ends here)))

Counting occurences (solution to the exercise)

unsorted — cgrand, 28 April 2009 @ 18 h 00 min
What does (merge-with + {:a 12} {:b 4} {:a 3 :b 7}) return?
{:b 11, :a 15} when there are several values for a key, these values are merged (two at a time) using the specified function — here they are summed.
Can you count occurrences of each value in a collection using merge-with?
(apply merge-with + (map (fn [x] {x 1}) coll)) or, using for: (apply merge-with + (for [x coll] {x 1})).

Counting occurences

unsorted — cgrand, 27 April 2009 @ 17 h 44 min

One asked me how to count occurrences of each value in a collection. I answered (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} coll). Since it can take some time to get accustomed to the functional way of thought, here is how one can work such an expression out:

How to count all occurrences of 42?
(count (filter #{42} coll))
How to express count using reduce?
(defn my-count [coll] (reduce (fn [n _] (inc n)) 0 coll))
How to count all occurrences of 42 using reduce?
(reduce (fn [n _] (inc n)) 0 (filter #{42} coll))
Can you get rid of the filter?
(reduce (fn [n x] (if (= 42 x) (inc n) n)) 0 coll)
I’d like the result to be {42 occurences-count}.
(reduce (fn [m x] (if (= 42 x) (assoc m 42 (inc (m 42))) m)) {42 0} coll)
42 is hardcoded in four places, it’s bad!
(reduce (fn [m x] (if (= 42 x) (assoc m x (inc (m x))) m)) {42 0} coll)
Can’t you replace {42 0} with {}?
No (inc (m x)) would fail because (m x) would return nil.
How does one provide a default value when the key is not found ?
(a-map a-key default-value)
Can’t you replace {42 0} with {}?
(reduce (fn [m x] (if (= 42 x) (assoc m x (inc (m x 0))) m)) {} coll)
I changed my mind I’d like you to count occurrences of each value.
Easy! (reduce (fn [m x] (assoc m x (inc (m x 0)))) {} coll) or, terser, (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} coll)

Exercise:

What does (merge-with + {:a 12} {:b 4} {:a 3 :b 7}) return?
Can you count occurrences of each value in a collection using merge-with?

Simple variations on state machines in Clojure

unsorted — cgrand, 3 March 2009 @ 15 h 00 min

Given a transition function that takes the current state and an input value as arguments then (reduce transition-fn initial-state input) returns the final state.

If you are interested in intermediate states, you can use clojure.contrib.seq-utils/reductions instead of reduce.

If you want an asynchronous state machine, you can use an agent:

(def agt (agent initial-state))
;then each time you get an input-value:
(send agt transition-fn input-value)

If you add a watch to the agent, you can react to state transitions.

rest vs drop

unsorted — cgrand, 23 February 2009 @ 18 h 20 min

Now that fully lazy sequences are in the trunk, (rest a-seq) and (drop 1 a-seq) aren’t equivalent anymore:

user=> (def s (map #(do (println %) %) (range 10)))
#'user/s
user=> (def d (drop 1 s))
#'user/d
user=> (def r (rest s))
0
#'user/r

As one can see rest needs to realize the first element of s while drop doesn’t. The corollary is that (drop n s) holds on the whole seq (including the nth first elements) while rest computes the first element and then discards it.

Shadowing global names

unsorted — cgrand, 31 January 2009 @ 10 h 45 min

A nice article on Clojure that gives one wrong tip: to use #'var-name to access a shadowed global.
Don’t do that! You’d better use namespace-where-defined-or-alias/var-name.
There are several reasons not to use #'var-name:

  • it doesn’t work if the value of your var is not invokable,
  • it evaluates to something different: it returns the var an not its value. It happens that vars proxy function calls to their values but a var can be rebound:
    (def a (map str (range 10)))
    (def b (map #'str (range 10)))
    (take 5 a)
    <i>("0" "1" "2" "3" "4")</i>
    (take 5 b)
    <i>("0" "1" "2" "3" "4")</i>
    (binding [str identity] (doall a))
    <i>("0" "1" "2" "3" "4" "5" "6" "7" "8" "9")</i>
    (binding [str identity] (doall b))
    <i>("0" "1" "2" "3" "4" <strong>5 6 7 8 9</strong>)</i>

NB: @#'var-name is better than #'var-name but, really, just use the namespaced symbol.

Enlive: yet another HTML templating library

unsorted — cgrand, 19 January 2009 @ 16 h 04 min
[UPDATE] I have rewritten Enlive, this posts doesn’t work with the actual Enlive.

Enlive is a selector based templating library.
Its main design goal is to decouple html and presentation code, that’s why Enlive templates are plain old html files (it will ease roundtripping with designers).
In code, you have to declare where (using CSS-like selectors) and how (using clojure code) to alter the html template.
The resulting templating functions are compiled (the html tree isn’t transformed at runtime) and yields a seq of strings.

Here is an example:

(deftemplate microblog-template "net/cgrand/enlive_html/example.html" [title posts]
  [:title] title
  [:h1] title
  [:div.no-msg] (when-not (seq posts) ~(html/show))
  [:div.post] (for [{:keys [title body]} posts]
           ~(at
              [:h2] title
              [:p] body)))

;; at the repl:
net.cgrand.enlive-html.examples=> (apply str (microblog-template "Hello user!"
             [{:title "post #1"
               :body "hello with dangerous chars: <>&"}
              {:title "post #2"
               :body "dolor ipsum"}]))

<em>"&lt;html>&lt;head>&lt;title>Hello user!&lt;/title>&lt;/head>
&lt;body>&lt;h1>Hello user!&lt;/h1>
&lt;div class=\"post\">&lt;h2>post #1&lt;/h2>
&lt;p>hello with dangerous chars: &amp;lt;&amp;gt;&amp;amp;&lt;/p>&lt;/div>
&lt;div class=\"post\">&lt;h2>post #2&lt;/h2>
&lt;p>dolor ipsum&lt;/p>&lt;/div>&lt;/body>&lt;/html>"</em>

(NB: manually edited to add linebreaks.)

(Disclaimer: original idea by Ozzilee)

Surprising

unsorted — cgrand, 19 November 2008 @ 21 h 46 min
user=> (time (dotimes [i 100000000] [i]))
"Elapsed time: 5928.406543 msecs"
nil
user=> (time (dotimes [i 100000000] #(i)))
"Elapsed time: 1774.025749 msecs"
nil

So, it seems that creating a closure is faster than creating a vector. Cool.

Clojure Golf: fib-seq

unsorted — cgrand, 30 October 2008 @ 0 h 45 min

Revisiting a classic:

(def fib-seq
  (lazy-cat [0 1] (map + fib-seq (rest fib-seq))))
« Previous PageNext Page »
(c) 2024 Clojure and me | powered by WordPress with Barecity