Scraper inherits from EventEmitter
validate a scraperJSON definition
url key must exist
elements key must exist
there must be at least 1 element
each element must have a selector
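The four checks above can be sketched as a single function. This is a minimal illustration, not the module's actual code: the name validateDefinition, the choice to collect problems in an array rather than throw, and the message wording are all assumptions.

```javascript
// Hypothetical helper: returns a list of problems found in a
// scraperJSON definition (an empty list means the definition passed).
function validateDefinition(definition) {
  if (!definition || typeof definition !== 'object') {
    return ['definition must be an object'];
  }
  const problems = [];
  if (!definition.url) {
    problems.push('url key must exist');
  }
  if (!definition.elements) {
    problems.push('elements key must exist');
    return problems; // nothing further to check without elements
  }
  const names = Object.keys(definition.elements);
  if (names.length === 0) {
    problems.push('there must be at least 1 element');
  }
  for (const name of names) {
    const element = definition.elements[name];
    if (!element || !element.selector) {
      problems.push('element ' + name + ' must have a selector');
    }
  }
  return problems;
}
```

Collecting every problem before reporting lets a caller show all of them at once; throwing on the first failure would be an equally reasonable design.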
check if this scraper applies to a given URL
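One plausible way to implement this check, assuming the definition's url key holds a regular expression pattern as a string. That assumption and the function name matchesUrl are illustrative, not taken from the source.

```javascript
// Hypothetical helper: treats definition.url as a regex pattern
// (an assumption) and tests the candidate URL against it.
function matchesUrl(definition, url) {
  return new RegExp(definition.url).test(url);
}
```

For example, a definition with the url pattern 'example\\.org' would apply to 'http://example.org/article/1' but not to 'http://other.com/'.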
Annotate any elements that are depended on as follow-ons by other elements by setting their 'followme' property to true
TODO: maybe a better approach is to have a function that handles an object, checking if it's an element, and recursing if it has child elements
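The recursive approach suggested in the TODO might look like the sketch below. It assumes that an element names its follow-on target in a follow key, that the target lives in the same dictionary, and that nested elements sit under a children key. All three are assumptions about the definition layout, and markFollowTargets is a hypothetical name.

```javascript
// Hypothetical helper: sets followme = true on every element that
// another element in the same dictionary names in its `follow` key,
// then recurses into child elements.
function markFollowTargets(elements) {
  for (const name of Object.keys(elements)) {
    const element = elements[name];
    const target = element.follow ? elements[element.follow] : undefined;
    if (target) {
      target.followme = true;
    }
    if (element.children) {
      markFollowTargets(element.children);
    }
  }
  return elements;
}
```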
Load elements from a dictionary of nested objects to a dictionary of nested scrapers, also storing all elements in a flat array for rapid iteration
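The flat array can be built with a depth-first walk. The sketch below only illustrates the flattening step, not the conversion to nested scrapers; the children key and the name flattenElements are assumptions.

```javascript
// Hypothetical helper: walks a dictionary of nested elements
// depth-first and collects { name, element } pairs in a flat array.
function flattenElements(elements, flat = []) {
  for (const [name, element] of Object.entries(elements)) {
    flat.push({ name, element });
    if (element.children) {
      flattenElements(element.children, flat);
    }
  }
  return flat;
}

// Example input: one element with a child, one without.
const nested = {
  authors: {
    selector: '//div[@class="authors"]',
    children: { name: { selector: './/span' } }
  },
  title: { selector: '//h1' }
};
const flat = flattenElements(nested);
```

The flat array holds references to the same element objects as the nested dictionary, so a change made while iterating the array is visible in the nested structure too.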
Scrape the provided doc with this scraper and return the results object
Scrape a specific element
extract element
run regex if applicable
save the result
process downloads
Download the resource specified by an element
rename downloaded file?
set download running
add it to the task ticker
Run regular expression on a captured element
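A minimal sketch of this step, assuming the element supplies its pattern as a string and that the first capture group, when present, is the value of interest. Both are assumptions, and runRegex is a hypothetical name.

```javascript
// Hypothetical helper: applies a pattern globally to captured text
// and returns the first capture group of each match, or the whole
// match when the pattern has no group (or the group did not match).
function runRegex(text, pattern) {
  const re = new RegExp(pattern, 'g');
  const results = [];
  for (const match of text.matchAll(re)) {
    results.push(match[1] !== undefined ? match[1] : match[0]);
  }
  return results;
}
```

Returning an array even for a single match keeps the result shape the same whether the pattern matches once, many times, or not at all.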
Create a new Scraper with this URL and the elements provided, and return the new scraper
Scraper.js
> license: MIT
Description
Scrapers can scrape DOMs. They are created from ScraperJSON definitions, and return scraped data as structured JSON.

Scrapers emit the following events:

* error: on any error. If not intercepted, these events will throw.
* elementCaptured (data): when an element is successfully captured.
* elementCaptureFailed (element): when element capture fails.
* downloadComplete
* downloadError
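Listening for these events follows the usual Node.js EventEmitter pattern. The sketch below uses a stand-in class (StandInScraper, a hypothetical name) rather than the real Scraper so that it runs on its own, and the payloads it emits are made up for illustration.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a Scraper: only the EventEmitter behaviour is modelled.
class StandInScraper extends EventEmitter {}

const scraper = new StandInScraper();
const captured = [];
const failed = [];

scraper.on('elementCaptured', function (data) {
  captured.push(data);
});
scraper.on('elementCaptureFailed', function (element) {
  failed.push(element);
});
// Without this listener, emitting 'error' would throw.
scraper.on('error', function (err) {
  console.error('scraper error:', err.message);
});

// Simulated events with illustrative payloads.
scraper.emit('elementCaptured', { title: 'An example title' });
scraper.emit('elementCaptureFailed', { selector: '//h2' });
```

Registering the error listener before scraping starts matters: an EventEmitter with no error listener throws when an error event is emitted.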
Usage
A Scraper is created from a ScraperJSON definition:

    var scraper = new Scraper(definition);

The scraper is then executed on DOMs:

    scraper.scrapeDoc(doc);