Create a new Thresher. A Thresher controls a scraping operation: it renders a page using the chosen rendering engine, passes the HTML of the rendered page back to the Node context, re-renders it in the local Node jsdom, and runs scraperJSON-defined scrapers on the rendered DOM.

Thresher emits events during the scraping process:
- 'error': if an error occurs
- 'element': for each extracted element
- 'result': the final result of a single scraping operation
- 'rendered': when the HTML of the rendered DOM is returned from PhantomJS
Thresher inherits from EventEmitter
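As a rough illustration of the inheritance and event pattern described here, the sketch below builds a stand-in constructor on Node's `EventEmitter`. The names `SketchThresher` and `fakeScrape` are hypothetical and are not part of Thresher itself; only the event names 'element' and 'result' come from the description above.

```javascript
const EventEmitter = require('events');
const util = require('util');

// Hypothetical stand-in for the Thresher constructor (not the real implementation).
function SketchThresher() {
  EventEmitter.call(this);
}
util.inherits(SketchThresher, EventEmitter);

// Hypothetical method: emits one 'element' event per item, then a single 'result'.
SketchThresher.prototype.fakeScrape = function (elements) {
  const self = this;
  elements.forEach(function (el) {
    self.emit('element', el);
  });
  self.emit('result', { count: elements.length });
};

// Usage: collect everything the emitter reports.
const seen = [];
const sketch = new SketchThresher();
sketch.on('element', function (el) { seen.push(el); });
sketch.on('result', function (res) { seen.push(res.count); });
sketch.fakeScrape(['title', 'author']);
```

Because listeners are attached before `fakeScrape` runs and `emit` is synchronous, the listeners see both elements before the final result.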
Bubble SpookyJS errors up to our interface, providing a clear context message and the SpookyJS message as detail.
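One way such error bubbling could look is sketched below. The helper name `bubbleError`, the `detail` property, and the example messages are assumptions made for illustration; the actual signature is not shown in this documentation.

```javascript
const EventEmitter = require('events');

// Hypothetical helper: wraps a lower-level error in a new Error carrying a
// context message, keeps the original message as `detail`, and emits 'error'.
function bubbleError(emitter, context, err) {
  const wrapped = new Error(context);
  wrapped.detail = err && err.message ? err.message : String(err);
  emitter.emit('error', wrapped);
}

// Usage: a listener must be attached, otherwise Node throws on 'error'.
const errorEmitter = new EventEmitter();
const caught = [];
errorEmitter.on('error', function (e) { caught.push(e); });
bubbleError(errorEmitter, 'Failed to initialize SpookyJS', new Error('spawn ENOENT'));
```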
Parameters:
Generate SpookyJS settings.
Parameters:
Returns an Object
(the settings)
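A settings object of this kind might be shaped roughly as follows. The `child` and `casper` keys follow the configuration example in the SpookyJS README; the function name `sketchSettings`, its `logLevel` parameter, and the default value are assumptions, since the real parameter list is not given here.

```javascript
// Hypothetical settings generator; the parameter and default are assumed.
function sketchSettings(logLevel) {
  const level = logLevel || 'info';
  return {
    // How the Node process talks to the child CasperJS process.
    child: { transport: 'http' },
    // Options handed on to CasperJS.
    casper: { logLevel: level, verbose: level === 'debug' }
  };
}

const debugSettings = sketchSettings('debug');
const defaultSettings = sketchSettings();
```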
Scrape a URL using a scraperJSON-defined scraper.
Parameters:
scrapeUrl must be a String.
(the URL to scrape)
definition must be an Object.
(a dictionary defining the scraper)
validate arguments
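The argument checks implied by the parameter descriptions above could be sketched like this. The function name `validateScrapeArgs`, the error type, and the decision to reject arrays and `null` are assumptions; the source only states that `scrapeUrl` must be a String and `definition` must be an Object.

```javascript
// Hypothetical validator mirroring the documented parameter types.
function validateScrapeArgs(scrapeUrl, definition) {
  if (typeof scrapeUrl !== 'string') {
    throw new TypeError('scrapeUrl must be a String');
  }
  // Assumption: a plain dictionary is expected, so null and arrays are rejected.
  if (definition === null || typeof definition !== 'object' || Array.isArray(definition)) {
    throw new TypeError('definition must be an Object');
  }
  return true;
}

const validArgs = validateScrapeArgs('http://example.org/article', { url: '.*', elements: {} });
```

Failing early with a clear message keeps malformed input from reaching the rendering engine, where errors are harder to trace.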
let's get our scrape on
in SpookyJS scope
in rendered page scope
SpookyJS provides our bridge to CasperJS and PhantomJS