IronWebScraper Tutorial: Advanced Webscraping Features
Darrius Serrant — Updated: June 9, 2025

HttpIdentity Feature

Some website systems require the user to be logged in before content can be viewed; in that case, we can use an HttpIdentity. Here is how to set it up:

```csharp
// Create a new instance of HttpIdentity
HttpIdentity id = new HttpIdentity();

// Set the network username and password for authentication
id.NetworkUsername = "username";
id.NetworkPassword = "pwd";

// Add the identity to the collection of identities
Identities.Add(id);
```

```vb
' Create a new instance of HttpIdentity
Dim id As New HttpIdentity()

' Set the network username and password for authentication
id.NetworkUsername = "username"
id.NetworkPassword = "pwd"

' Add the identity to the collection of identities
Identities.Add(id)
```

One of the most impressive and powerful features of IronWebScraper is the ability to use thousands of unique user credentials and/or browser engines to spoof or scrape a website using multiple login sessions.

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";
    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;
    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Define an array of proxies
    var proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(',');

    // Iterate over common Chrome desktop user agents
    foreach (var UA in IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents)
    {
        // Iterate over the proxies
        foreach (var proxy in proxies)
        {
            // Add a new HTTP identity with a specific user agent and proxy
            Identities.Add(new HttpIdentity()
            {
                UserAgent = UA,
                UseCookies = true,
                Proxy = proxy
            });
        }
    }

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}
```

```vb
Public Overrides Sub Init()
    ' Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey"
    ' Set the logging level to capture all logs
    Me.LoggingLevel = WebScraper.LogLevel.All
    ' Assign the working directory for the output files
    Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

    ' Define an array of proxies
    Dim proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(","c)

    ' Iterate over common Chrome desktop user agents
    For Each UA In IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents
        ' Iterate over the proxies
        For Each proxy In proxies
            ' Add a new HTTP identity with a specific user agent and proxy
            Identities.Add(New HttpIdentity() With {
                .UserAgent = UA,
                .UseCookies = True,
                .Proxy = proxy
            })
        Next proxy
    Next UA

    ' Make an initial request to the website with a parse method
    Me.Request("http://www.Website.com", Parse)
End Sub
```
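The identity pool built in Init() is the Cartesian product of user agents and proxies, and an identity can then be drawn at random for each request. Here is a minimal, library-agnostic sketch of that rotation idea in Python; the user-agent strings are placeholders for illustration, not IronWebScraper values:

```python
import itertools
import random

# Placeholder user agents and proxies (assumptions for illustration)
user_agents = ["Chrome/120-desktop", "Chrome/121-desktop", "Chrome/122-desktop"]
proxies = "IP-Proxy1:8080,IP-Proxy2:8081".split(",")

# Build one identity per (user agent, proxy) pair, as the Init() loop above does
identities = [
    {"user_agent": ua, "proxy": proxy, "use_cookies": True}
    for ua, proxy in itertools.product(user_agents, proxies)
]

def pick_identity():
    """Choose a random identity for the next request."""
    return random.choice(identities)

print(len(identities))  # 6 — three user agents times two proxies
```

Because every request can present a different (user agent, proxy) combination, traffic looks like many independent visitors rather than one scraper.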
You have several properties available that produce different behaviors and help prevent websites from blocking you. They include:

- NetworkDomain: the network domain used for user authentication. Supports Windows, NTLM, Kerberos, Linux, BSD, and Mac OS X networks. Must be used together with NetworkUsername and NetworkPassword.
- NetworkUsername: the network/HTTP username used for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
- NetworkPassword: the network/HTTP password used for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
- Proxy: sets the proxy settings.
- UserAgent: sets the browser engine (for example, Chrome desktop, Chrome mobile, Chrome tablet, IE, and Firefox).
- HttpRequestHeaders: custom header values to use with this identity; accepts a Dictionary<string, string>.
- UseCookies: enables or disables the use of cookies.

IronWebScraper runs the scraper using random identities. If we need to specify that a particular identity is used to parse a page, we can do so:

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";
    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;
    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Create a new instance of HttpIdentity
    HttpIdentity identity = new HttpIdentity();
    // Set the network username and password for authentication
    identity.NetworkUsername = "username";
    identity.NetworkPassword = "pwd";
    // Add the identity to the collection of identities
    Identities.Add(identity);

    // Make a request to the website with the specified identity
    this.Request("http://www.Website.com", Parse, identity);
}
```

```vb
Public Overrides Sub Init()
    ' Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey"
    ' Set the logging level to capture all logs
    Me.LoggingLevel = WebScraper.LogLevel.All
    ' Assign the working directory for the output files
    Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

    ' Create a new instance of HttpIdentity
    Dim identity As New HttpIdentity()
    ' Set the network username and password for authentication
    identity.NetworkUsername = "username"
    identity.NetworkPassword = "pwd"
    ' Add the identity to the collection of identities
    Identities.Add(identity)

    ' Make a request to the website with the specified identity
    Me.Request("http://www.Website.com", Parse, identity)
End Sub
```

Enable the Web Cache Feature

This feature is used to cache requested pages. It is commonly used during the development and testing phases, letting developers cache the pages they need so they can be reused after a code update. This lets you run your code against the cached pages after restarting your web scraper, without connecting to the live website each time (action replay). You can use it in the Init() method:

```csharp
// Enable web cache without an expiration time
EnableWebCache();
// OR enable web cache with a specified expiration time
EnableWebCache(new TimeSpan(1, 30, 30));
```

```vb
' Enable web cache without an expiration time
EnableWebCache()
' OR enable web cache with a specified expiration time
EnableWebCache(New TimeSpan(1, 30, 30))
```

It saves your cached data to the WebCache folder under the working directory.
```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";
    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;
    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
    EnableWebCache(new TimeSpan(1, 30, 30));

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}
```

```vb
Public Overrides Sub Init()
    ' Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey"
    ' Set the logging level to capture all logs
    Me.LoggingLevel = WebScraper.LogLevel.All
    ' Assign the working directory for the output files
    Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

    ' Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
    EnableWebCache(New TimeSpan(1, 30, 30))

    ' Make an initial request to the website with a parse method
    Me.Request("http://www.Website.com", Parse)
End Sub
```

IronWebScraper also lets you name the engine's crawl process using Start(CrawlID), so the engine can continue scraping after your code is restarted.

```csharp
static void Main(string[] args)
{
    // Create an object from the Scraper class
    EngineScraper scrape = new EngineScraper();
    // Start the scraping process with the specified crawl ID
    scrape.Start("enginestate");
}
```

```vb
Shared Sub Main(ByVal args() As String)
    ' Create an object from the Scraper class
    Dim scrape As New EngineScraper()
    ' Start the scraping process with the specified crawl ID
    scrape.Start("enginestate")
End Sub
```
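The resume-on-restart idea can be sketched generically: pending work is persisted under a crawl ID, and a restarted process reloads it instead of starting over. A minimal Python sketch of the pattern — the file layout and JSON format here are illustrative assumptions, not IronWebScraper's actual SavedState format:

```python
import json
import os

def save_state(crawl_id, pending_urls, state_dir="SavedState"):
    """Persist the queue of pending URLs under the crawl ID."""
    os.makedirs(state_dir, exist_ok=True)
    with open(os.path.join(state_dir, crawl_id + ".json"), "w") as f:
        json.dump({"pending": pending_urls}, f)

def load_state(crawl_id, state_dir="SavedState"):
    """Reload a previous crawl's queue, or start fresh if none exists."""
    path = os.path.join(state_dir, crawl_id + ".json")
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)["pending"]

save_state("enginestate", ["http://www.Website.com/page2"])
print(load_state("enginestate"))  # ['http://www.Website.com/page2']
```

A scraper following this pattern saves its queue as it runs, so a crash or restart resumes from the last saved point rather than re-crawling everything.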
The execution requests and responses are saved to the SavedState folder inside the working directory.

Throttling

We can control the minimum and maximum number of connections, and the connection speed, per domain.

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";
    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;
    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Set the total number of allowed open HTTP requests (threads)
    this.MaxHttpConnectionLimit = 80;
    // Set minimum polite delay (pause) between requests to a given domain or IP address
    this.RateLimitPerHost = TimeSpan.FromMilliseconds(50);
    // Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
    this.OpenConnectionLimitPerHost = 25;
    // Do not obey the robots.txt files
    this.ObeyRobotsDotTxt = false;
    // Makes the WebScraper intelligently throttle requests not only by hostname,
    // but also by host servers' IP addresses
    this.ThrottleMode = Throttle.ByDomainHostName;

    // Make an initial request to the website with a parse method
    this.Request("https://www.Website.com", Parse);
}
```
```vb
Public Overrides Sub Init()
    ' Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey"
    ' Set the logging level to capture all logs
    Me.LoggingLevel = WebScraper.LogLevel.All
    ' Assign the working directory for the output files
    Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

    ' Set the total number of allowed open HTTP requests (threads)
    Me.MaxHttpConnectionLimit = 80
    ' Set minimum polite delay (pause) between requests to a given domain or IP address
    Me.RateLimitPerHost = TimeSpan.FromMilliseconds(50)
    ' Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
    Me.OpenConnectionLimitPerHost = 25
    ' Do not obey the robots.txt files
    Me.ObeyRobotsDotTxt = False
    ' Makes the WebScraper intelligently throttle requests not only by hostname,
    ' but also by host servers' IP addresses
    Me.ThrottleMode = Throttle.ByDomainHostName

    ' Make an initial request to the website with a parse method
    Me.Request("https://www.Website.com", Parse)
End Sub
```

Throttling properties:

- MaxHttpConnectionLimit: the total number of allowed open HTTP requests (threads).
- RateLimitPerHost: the minimum polite delay or pause (in milliseconds) between requests to a given domain or IP address.
- OpenConnectionLimitPerHost: the number of concurrent HTTP requests (threads) allowed per hostname.
- ThrottleMode: makes the WebScraper throttle requests intelligently, not only by hostname but also by the host server's IP address. This is polite in case multiple scraped domains are hosted on the same machine.

Get Started with IronWebScraper

Start using IronWebScraper in your project today with a free trial. First step: start for free.

Frequently Asked Questions

How do I authenticate a user on a website that requires a login in C#?
You can use the HttpIdentity feature in IronWebScraper to authenticate users by setting properties such as NetworkDomain, NetworkUsername, and NetworkPassword.

What are the benefits of using the web cache during development?
The web cache feature lets you cache requested pages for reuse, saving time and resources by avoiding repeated connections to the live website, especially during the development and testing phases.

How do I manage multiple login sessions when scraping?
IronWebScraper can simulate multiple login sessions using thousands of unique user credentials and browser engines, which helps prevent websites from detecting and blocking the scraper.

What advanced throttling options are available?
IronWebScraper provides the ThrottleMode setting, which intelligently throttles requests by both hostname and IP address, ensuring polite interaction with shared hosting environments.

How do I use proxies in IronWebScraper?
Define an array of proxies and associate them with HttpIdentity instances, allowing requests to be routed through different IP addresses for anonymity and access control.

How does IronWebScraper handle request delays to avoid overloading servers?
The RateLimitPerHost setting specifies the minimum delay between requests to a given domain or IP address, helping prevent server overload by spacing out requests.

Can scraping resume after an interruption?
Yes. IronWebScraper can resume scraping via the Start(CrawlID) method, which saves the execution state and continues from the last saved point.

How do I control the number of concurrent HTTP connections in my scraper?
You can set the MaxHttpConnectionLimit property to control the total number of open HTTP requests allowed, helping manage server load and resources.

What options are available for logging scraping activity?
You can set the logging level with the LoggingLevel property, which provides comprehensive logs for detailed analysis and troubleshooting during scraping operations.

Darrius Serrant
Full Stack Software Engineer (WebOps)
Darrius Serrant holds a Bachelor's degree in Computer Science from the University of Miami and works as a Full Stack WebOps Marketing Engineer at Iron Software. Drawn to coding from a young age, he sees computing as both mysterious and approachable, making it a perfect medium for creativity and problem solving. At Iron Software, Darrius enjoys creating new things and simplifying complex concepts so they are easier to understand. As one of our resident developers, he also volunteers to teach students, sharing his expertise with the next generation. For Darrius, the work is satisfying because it is valued and has real impact.

NuGet downloads: 122,916 | Version: 2025.11