The WebScraper type exposes the following members.
If not empty, the hostname of every requested URL must match at least one of the AllowedDomains patterns. Patterns may be added as glob wildcard strings or as Regex patterns.
If not empty, every requested URL must match at least one of the AllowedUrls patterns. Patterns may be added as glob wildcard strings or as Regex patterns.
If not empty, the hostname of a requested URL may not match any of the BannedDomains patterns. Patterns may be added as glob wildcard strings or as Regex patterns.
If not empty, a requested URL may not match any of the BannedUrls patterns. Patterns may be added as glob wildcard strings or as Regex patterns.
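The way the four pattern lists combine can be sketched as follows. This is a minimal Python illustration of the rules described above, not the WebScraper engine's own code. The function name `is_url_permitted` and its parameters are hypothetical. Matching globs case-sensitively against the whole string, and matching regular expressions anywhere in the string, are assumptions.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def _matches(value, pattern):
    """Return True if value matches a glob string or a compiled regex."""
    if isinstance(pattern, re.Pattern):
        # Assumption: a regex may match anywhere in the value.
        return pattern.search(value) is not None
    # Assumption: a glob must match the whole value, case-sensitively.
    return fnmatchcase(value, pattern)


def is_url_permitted(url, allowed_domains=(), allowed_urls=(),
                     banned_domains=(), banned_urls=()):
    """Hypothetical helper: apply the allow and ban lists to one URL."""
    host = urlsplit(url).hostname or ""
    # A non-empty allow list requires at least one matching pattern.
    if allowed_domains and not any(_matches(host, p) for p in allowed_domains):
        return False
    if allowed_urls and not any(_matches(url, p) for p in allowed_urls):
        return False
    # A ban list rejects the URL as soon as any pattern matches.
    if any(_matches(host, p) for p in banned_domains):
        return False
    if any(_matches(url, p) for p in banned_urls):
        return False
    return True
```

With every list empty, any URL is permitted. A URL that satisfies an allow list is still rejected if it also matches a ban list.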
A unique string used to identify a crawl job.
The total number of files downloaded successfully with the DownloadImage and DownloadFile methods.
A list of HTTP identities to be used to fetch web resources.
Each Identity may have a different proxy IP address, user agent, set of HTTP headers, persistent cookies, username and password.
Best practice is to create Identities in your WebScraper.Init method and add them to this WebScraper.Identities list.
The level of detail that the WebScraper engine logs to the console.
LogLevel.Critical is normally the most useful setting, allowing developers to write their own meaningful, application-relevant messages inside Parse methods.
LogLevel.ScrapedData is useful when coding and testing a new WebScraper.
Causes the WebScraper to always obey /robots.txt directives, including URL and path restrictions and crawl rates.
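As an illustration of what obeying /robots.txt involves, the sketch below uses Python's standard `urllib.robotparser` module to check path restrictions and read a crawl delay. It does not show how the WebScraper engine implements this setting. The robots.txt content, the `MyScraper` user agent and the `build_robots_policy` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_robots_policy(robots_text):
    """Parse robots.txt text into a queryable policy object."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


policy = build_robots_policy(ROBOTS_TXT)

# Path restriction: /private/ is disallowed for every user agent.
allowed = policy.can_fetch("MyScraper", "https://example.com/public/page")
blocked = policy.can_fetch("MyScraper", "https://example.com/private/page")

# Crawl rate: seconds to wait between requests, or None if not specified.
delay = policy.crawl_delay("MyScraper")
```

A crawler that honors these directives would skip any URL for which `can_fetch` returns False and wait at least `delay` seconds between requests to the same host.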
Path to a local directory where scraped data and state information will be saved.