Clojure and me » Screenscraping with Enlive

enlive — cgrand, 27 April 2009 @ 19 h 02 min

(select (html-resource (java.net.URL. "http://clojure-log.n01se.net/")) [:#main [:a (attr? :href)]]) returns a seq of link nodes.
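
A minimal sketch of what can be done with those nodes, assuming Enlive is on the classpath and required under the alias html. The alias, the scrape-sketch namespace, and the inline sample-page are illustrative and not from the original post; the page is parsed from a string so the sketch runs without network access. Each node returned by select is a plain map with :tag, :attrs and :content keys, so the link targets can be read with get-in:

```clojure
(ns scrape-sketch
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical stand-in for the real page, so no HTTP request is needed.
(def sample-page
  "<html><body>
     <div id='main'>
       <a href='/date/2009-04-27.html'>27 April</a>
       <a name='no-href'>anchor without an href</a>
     </div>
     <a href='/outside'>link outside #main</a>
   </body></html>")

;; Same selector as above: <a> elements that carry an href attribute
;; and sit somewhere inside the element with id "main".
(def link-nodes
  (html/select (html/html-resource (java.io.StringReader. sample-page))
               [:#main [:a (html/attr? :href)]]))

;; Pull the href out of each node's attribute map.
(def hrefs
  (map #(get-in % [:attrs :href]) link-nodes))
```

Swapping the StringReader for (java.net.URL. "http://clojure-log.n01se.net/") should give the same shape of result against the live page.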

5 Comments »

Christophe,

What’s the best way to get the flattened list of matching nodes? Using a zipper on the result of select?

Thanks!
David

Comment by dnolen — 29 April 2009 @ 0 h 51 min

David,

select already returns a list of nodes, so I’m unsure about what you want to flatten. Can you be more precise?

Comment by Christophe Grand — 29 April 2009 @ 7 h 19 min

Oops, sorry for the slow reply. I notice, for example, that when I extract all divs from http://nytimes.com, I only get two divs. That is because all the other divs are nested in those two top-level ones. I was just asking for guidance about the best way to traverse just the divs I’m interested in. Hopefully I’m making sense here.

Comment by dnolen — 29 April 2009 @ 17 h 55 min

I fixed this bug this morning (CEST).

Comment by Christophe Grand — 29 April 2009 @ 17 h 58 min

[...] found a couple of examples showing how to get started but they both seemed to rely on the web page being at a HTTP URI rather [...]

Pingback by Clojure/Enlive: Screen scraping a HTML file from disk at Mark Needham — 26 August 2013 @ 20 h 01 min