Screenscraping with Enlive

enlive — cgrand, 27 April 2009 @ 19 h 02 min

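The following expression returns a seq of link nodes: every a element that has an href attribute and sits inside the element whose id is main.

    (select (html-resource (java.net.URL. "http://clojure-log.n01se.net/")) [:#main [:a (attr? :href)]])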
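To show what those link nodes look like and how to pull the URLs out of them, here is a minimal sketch that runs the same selector against an inline page instead of the live site. It assumes Enlive is on the classpath under the namespace net.cgrand.enlive-html and that html-resource accepts a java.io.Reader; sample-page, link-nodes and hrefs are made-up names for illustration only.

```clojure
;; Assumption: Enlive is available as net.cgrand.enlive-html.
(require '[net.cgrand.enlive-html :as html])

;; A made-up page, so the example does not depend on the network.
(def sample-page
  (str "<html><body>"
       "<div id=\"main\">"
       "<a href=\"/2009-04-27.html\">log</a>"
       "<a name=\"top\">anchor without href</a>"
       "<p><a href=\"/2009-04-26.html\">older log</a></p>"
       "</div>"
       "<a href=\"/elsewhere.html\">outside main</a>"
       "</body></html>"))

;; Same selector as in the post: a elements with an href attribute,
;; anywhere below the element whose id is main.
(def link-nodes
  (html/select (html/html-resource (java.io.StringReader. sample-page))
               [:#main [:a (html/attr? :href)]]))

;; Each node is a map with :tag, :attrs and :content keys,
;; so the URL lives under [:attrs :href].
(def hrefs (map #(get-in % [:attrs :href]) link-nodes))
```

With this sample page, hrefs should hold only the two links inside #main; the anchor without an href and the link outside #main are not selected.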

5 Comments

  1. Christophe,

    What’s the best way to get the flattened list of matching nodes? Using a zipper on the result of select?

    Thanks!
    David

    Comment by dnolen — 29 April 2009 @ 0 h 51 min
  2. David,

    select already returns a list of nodes, so I’m unsure about what you want to flatten. Can you be more precise?

    Comment by Christophe Grand — 29 April 2009 @ 7 h 19 min
  3. Oops, sorry for the slow reply. I noticed, for example, that when I extract all divs from http://nytimes.com, I only get two divs. That is because all the other divs are nested in those two top-level ones. I was just asking for guidance about the best way to traverse just the divs I’m interested in. Hopefully I’m making sense here.

    Comment by dnolen — 29 April 2009 @ 17 h 55 min
  4. I fixed this bug this morning (CEST)

    Comment by Christophe Grand — 29 April 2009 @ 17 h 58 min
  5. [...] found a couple of examples showing how to get started but they both seemed to rely on the web page being at a HTTP URI rather [...]
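On David's question about getting a flat list of nested matches: the fix mentioned in comment 4 addresses this on the select side, but for walking a node tree by hand, one possible approach is clojure.core's tree-seq rather than a zipper. The sketch below assumes Enlive's node representation (maps with :tag, :attrs and :content keys, with text as plain strings); sample-nodes, all-nodes and flat-divs are hypothetical names, and the tree is made up for illustration.

```clojure
;; A made-up tree in the shape Enlive uses for element nodes:
;; maps with :tag, :attrs and :content; text nodes are strings.
(def sample-nodes
  [{:tag :div :attrs {:id "outer"}
    :content ["intro "
              {:tag :div :attrs {:id "inner-1"} :content ["one"]}
              {:tag :p :attrs nil
               :content [{:tag :div :attrs {:id "inner-2"}
                          :content ["two"]}]}]}])

(defn all-nodes
  "Depth-first seq of `node` and everything below it."
  [node]
  ;; Maps are branches whose children are under :content;
  ;; strings are leaves.
  (tree-seq map? :content node))

(defn flat-divs
  "Flat seq of every div at or below the given top-level nodes."
  [nodes]
  (filter #(and (map? %) (= :div (:tag %)))
          (mapcat all-nodes nodes)))

;; Ids of the divs found, in document order.
(def div-ids (map #(get-in % [:attrs :id]) (flat-divs sample-nodes)))
;; => ("outer" "inner-1" "inner-2")
```

Because tree-seq walks depth first, the divs come back in document order, with each parent listed before the divs nested inside it.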
