
WebScraper Fields

The WebScraper type exposes the following members.

Fields
Public field AllowedDomains
If not empty, the hostname of every requested URL must match at least one of the AllowedDomains patterns. Patterns may be added as glob wildcard strings or as Regex objects.
Public field AllowedUrls
If not empty, every requested URL must match at least one of the AllowedUrls patterns. Patterns may be added as glob wildcard strings or as Regex objects.
Public field BannedDomains
If not empty, no requested URL's hostname may match any of the BannedDomains patterns. Patterns may be added as glob wildcard strings or as Regex objects.
Public field BannedUrls
If not empty, no requested URL may match any of the BannedUrls patterns. Patterns may be added as glob wildcard strings or as Regex objects.
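The four pattern fields above can be combined to fence in a crawl. A minimal sketch, assuming the pattern collections accept glob strings and Regex objects via Add as the descriptions state (the domain names and patterns are hypothetical):

```csharp
using System.Text.RegularExpressions;
using IronWebScraper;

class BlogScraper : WebScraper
{
    public override void Init()
    {
        // Only crawl hostnames matching these patterns.
        AllowedDomains.Add("*.example.com");                        // glob wildcard string
        AllowedDomains.Add(new Regex(@"^blog\.example\.(com|org)$")); // Regex pattern

        // Never request URLs under these paths, even on allowed domains.
        BannedUrls.Add("*/admin/*");
        BannedUrls.Add("*/login/*");

        Request("https://blog.example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Application-specific parsing goes here.
    }
}
```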
Public field CrawlId
A unique string used to identify a crawl job.
Public field FilesDownloaded
The total number of files downloaded successfully with the DownloadImage and DownloadFile methods.
Public field Identities
A list of HTTP identities to be used when fetching web resources.

Each identity may have a different proxy IP address, user agent, HTTP headers, persistent cookies, and username and password.

Best practice is to create identities in your WebScraper.Init method and add them to this WebScraper.Identities list.
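Following that best practice, an Init method might populate the list like this. This is a sketch: the HttpIdentity property names shown (NetworkUsername, NetworkPassword, UserAgent) follow IronWebScraper's published examples, and the credentials and URL are hypothetical:

```csharp
using IronWebScraper;

class AuthenticatedScraper : WebScraper
{
    public override void Init()
    {
        // Hypothetical identity; property names assumed from library examples.
        var identity = new HttpIdentity
        {
            NetworkUsername = "scraperUser",
            NetworkPassword = "secret",
            UserAgent = "Mozilla/5.0 (compatible; MyScraper/1.0)"
        };
        Identities.Add(identity); // the engine draws from this list when fetching

        Request("https://www.example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Application-specific parsing goes here.
    }
}
```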

Public field LoggingLevel
The level of logging written by the WebScraper engine to the console.

LogLevel.Critical is normally the most useful setting, allowing the developer to write their own meaningful, application-relevant messages inside Parse methods.

LogLevel.ScrapedData is useful when coding and testing a new WebScraper.
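Switching between those two settings might look as follows; a sketch assuming the LogLevel enum is nested under WebScraper as the member names above suggest:

```csharp
using IronWebScraper;

class VerboseScraper : WebScraper
{
    public override void Init()
    {
        // Verbose engine output while developing and testing:
        LoggingLevel = WebScraper.LogLevel.ScrapedData;

        // Once the scraper is stable, quiet the engine and rely on
        // your own messages written inside Parse methods:
        // LoggingLevel = WebScraper.LogLevel.Critical;

        Request("https://www.example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Application-specific parsing goes here.
    }
}
```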

Public field ObeyRobotsDotTxt
Causes the WebScraper to always obey /robots.txt directives, including URL and path restrictions and crawl rates.
Public field WorkingDirectory
Path to a local directory where scraped data and state information will be saved.
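These two fields are also typically set in Init. A sketch with a hypothetical local path:

```csharp
using IronWebScraper;

class PoliteScraper : WebScraper
{
    public override void Init()
    {
        ObeyRobotsDotTxt = true;                  // honor robots.txt restrictions and crawl rates
        WorkingDirectory = @"C:\Scrapes\MySite";  // hypothetical path for state and scraped data

        Request("https://www.example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Application-specific parsing goes here.
    }
}
```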