Screenscraping with Enlive
(select (html-resource (java.net.URL. "http://clojure-log.n01se.net/")) [:#main [:a (attr? :href)]])
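returns a seq of link nodes: every a element that has an href attribute and sits inside the element whose id is main. (This assumes the net.cgrand.enlive-html namespace has been referred, e.g. with (use 'net.cgrand.enlive-html).)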
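Each node in that seq is a plain Clojure map with :tag, :attrs and :content keys, so ordinary sequence functions can finish the job. Below is a minimal sketch of pulling the href out of each link. It assumes the hand-written link-nodes value stands in for the result of the select call above; the sample data and the function name hrefs are hypothetical, chosen so the example runs without fetching a page.

```clojure
(ns example.hrefs)

;; Hypothetical stand-in for the seq returned by
;; (select (html-resource (java.net.URL. "...")) [:#main [:a (attr? :href)]]).
;; Enlive represents each element as a map with :tag, :attrs and :content;
;; attribute names are keywords.
(def link-nodes
  [{:tag :a, :attrs {:href "/a.html"}, :content ["First link"]}
   {:tag :a, :attrs {:href "/b.html"}, :content ["Second link"]}])

(defn hrefs
  "Return the href attribute of every link node, in the order given."
  [nodes]
  (map #(get-in % [:attrs :href]) nodes))

(hrefs link-nodes)
;; => ("/a.html" "/b.html")
```

Because the selector already filtered on (attr? :href), every node reaching hrefs is expected to carry that attribute.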
Christophe,
What’s the best way to get the flattened list of matching nodes? Using a zipper on the result of select?
Thanks!
David
David,
select already returns a list of nodes, so I'm not sure what you want to flatten. Could you be more precise?
Oops, sorry for the slow reply. For example, when I extract all divs from http://nytimes.com, I only get two: all the other divs are nested inside those two top-level ones. I was just asking for guidance on the best way to traverse only the divs I'm interested in; hopefully that makes sense.
I fixed this bug this morning (CEST).
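In the thread above, the symptom was that only the outermost divs came back while nested ones were missed. For reference, here is a pure-Clojure sketch of the traversal David asked about. It walks every node of an Enlive-style tree with clojure.core/tree-seq and keeps those with a given tag, nested ones included. The sample page and the function name nodes-with-tag are hypothetical. With Enlive itself, (select nodes [:div]) is the intended way to get all divs.

```clojure
(ns example.nested-divs)

;; Hypothetical Enlive-style tree: elements are maps with :tag, :attrs and
;; :content; text nodes are plain strings.
(def page
  [{:tag :div, :attrs {:id "outer"}
    :content [{:tag :div, :attrs {:id "inner"}
               :content [{:tag :p, :attrs nil, :content ["text"]}
                         {:tag :div, :attrs {:id "innermost"}, :content []}]}]}
   {:tag :div, :attrs {:id "sibling"}, :content ["more text"]}])

(defn nodes-with-tag
  "Return every element with the given tag, nested ones included,
  in depth-first, pre-order (document) order."
  [tag nodes]
  (->> nodes
       ;; tree-seq yields each top-level node followed by all its descendants;
       ;; only maps have children, strings are leaves.
       (mapcat #(tree-seq map? :content %))
       ;; (:tag "some string") is nil, so text nodes are dropped here.
       (filter #(= tag (:tag %)))))

(map #(get-in % [:attrs :id]) (nodes-with-tag :div page))
;; => ("outer" "inner" "innermost" "sibling")
```

A zipper (clojure.zip) would also work, but tree-seq is enough when you only need to read the matches rather than edit the tree in place.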
[...] found a couple of examples showing how to get started but they both seemed to rely on the web page being at a HTTP URI rather [...]