Advanced Webscraping Features in C#

Darrius Serrant | Updated: June 9, 2025

HttpIdentity Feature

Some website systems require the user to be logged in before content can be viewed. In that case, we can use an HttpIdentity. Here is how to set one up:

```csharp
// Create a new instance of HttpIdentity
HttpIdentity id = new HttpIdentity();

// Set the network username and password for authentication
id.NetworkUsername = "username";
id.NetworkPassword = "pwd";

// Add the identity to the collection of identities
Identities.Add(id);
```

One of the most impressive and powerful features of IronWebScraper is the ability to use thousands of unique user credentials and/or browser engines to spoof or scrape a website using multiple login sessions.

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Define an array of proxies
    var proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(',');

    // Iterate over common Chrome desktop user agents
    foreach (var UA in IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents)
    {
        // Iterate over the proxies
        foreach (var proxy in proxies)
        {
            // Add a new HTTP identity with a specific user agent and proxy
            Identities.Add(new HttpIdentity()
            {
                UserAgent = UA,
                UseCookies = true,
                Proxy = proxy
            });
        }
    }

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}
```
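The nested loops above produce one identity per (user agent, proxy) combination, and the engine rotates through the resulting pool. The underlying idea can be sketched in plain Python; the names and dictionary shape here are illustrative only, not part of IronWebScraper's API:

```python
import itertools

def build_identity_pool(user_agents, proxies):
    # One identity per (user agent, proxy) pair, mirroring the nested loops above
    return [{"user_agent": ua, "proxy": p, "use_cookies": True}
            for ua, p in itertools.product(user_agents, proxies)]

# Two user agents x two proxies -> four distinct identities
pool = build_identity_pool(["UA-Chrome-1", "UA-Chrome-2"],
                           ["IP-Proxy1:8080", "IP-Proxy2:8081"])

# Hand out a different identity for each request, wrapping around at the end
rotation = itertools.cycle(pool)
```

Each request drawn from `rotation` presents a different user agent/proxy pair, which is what makes the traffic look like many independent sessions rather than one scraper.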
You have a range of properties at your disposal that give you different behaviors, helping to prevent websites from blocking you. These include:

- NetworkDomain: the network domain used for user authentication. Supports Windows, NTLM, Kerberos, Linux, BSD, and Mac OS X networks. Must be used together with NetworkUsername and NetworkPassword.
- NetworkUsername: the network/HTTP username used for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
- NetworkPassword: the network/HTTP password used for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
- Proxy: sets the proxy settings.
- UserAgent: sets the browser engine (e.g., Chrome desktop, Chrome mobile, Chrome tablet, IE, Firefox, etc.).
- HttpRequestHeaders: specifies custom header values to use with this identity; it accepts a Dictionary<string, string> object.
- UseCookies: enables/disables the use of cookies.

IronWebScraper runs the scraper using random identities. If we need to specify that a particular identity be used to parse a page, we can do so:

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Create a new instance of HttpIdentity
    HttpIdentity identity = new HttpIdentity();

    // Set the network username and password for authentication
    identity.NetworkUsername = "username";
    identity.NetworkPassword = "pwd";

    // Add the identity to the collection of identities
    Identities.Add(identity);

    // Make a request to the website with the specified identity
    this.Request("http://www.Website.com", Parse, identity);
}
```

Enabling the Web Cache Feature

This feature is used to cache requested pages. It is commonly used during the development and testing phases, letting developers cache the pages they need so they can be reused after code updates. This way, you can run your code against the cached pages even after restarting the web scraper, without having to connect to the live website every time (action replay).

You can use it in the Init() method:

```csharp
// Enable web cache without an expiration time
EnableWebCache();

// OR enable web cache with a specified expiration time
EnableWebCache(new TimeSpan(1, 30, 30));
```

It saves the cached data to the WebCache folder under the working directory.

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
    EnableWebCache(new TimeSpan(1, 30, 30));

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}
```
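To make the caching behavior concrete, here is a minimal sketch of a page cache with an optional expiry, analogous to `EnableWebCache(new TimeSpan(1, 30, 30))`. This is an illustration of the concept only, not IronWebScraper's actual implementation:

```python
import time

class WebCache:
    """Toy page cache with an optional time-to-live, keyed by URL."""

    def __init__(self, expiry_seconds=None):
        self.expiry = expiry_seconds          # None means entries never expire
        self._store = {}                      # url -> (timestamp, html)

    def put(self, url, html):
        self._store[url] = (time.time(), html)

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None                       # cache miss: fetch from the live site
        saved_at, html = entry
        if self.expiry is not None and time.time() - saved_at > self.expiry:
            del self._store[url]              # stale: evict and force a live fetch
            return None
        return html

# A TimeSpan of 1 hour, 30 minutes, 30 seconds is 5430 seconds
cache = WebCache(expiry_seconds=5430)
cache.put("http://www.Website.com", "<html>cached page</html>")
```

During development, a hit on `get` replays the saved page instead of touching the live site; once the entry outlives its expiry, the next request goes to the network again.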
IronWebScraper also has features that let the engine continue scraping after a code restart, by naming the engine start process using Start(CrawlID).

```csharp
static void Main(string[] args)
{
    // Create an object from the Scraper class
    EngineScraper scrape = new EngineScraper();

    // Start the scraping process with the specified crawl ID
    scrape.Start("enginestate");
}
```

The execution requests and responses are saved in the SavedState folder under the working directory.

Throttling

We can control the minimum and maximum number of connections per domain, as well as the connection speed.

```csharp
public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Set the total number of allowed open HTTP requests (threads)
    this.MaxHttpConnectionLimit = 80;

    // Set the minimum polite delay (pause) between requests to a given domain or IP address
    this.RateLimitPerHost = TimeSpan.FromMilliseconds(50);

    // Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
    this.OpenConnectionLimitPerHost = 25;

    // Do not obey the robots.txt files
    this.ObeyRobotsDotTxt = false;

    // Make the WebScraper intelligently throttle requests not only by hostname,
    // but also by the host servers' IP addresses
    this.ThrottleMode = Throttle.ByDomainHostName;

    // Make an initial request to the website with a parse method
    this.Request("https://www.Website.com", Parse);
}
```
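RateLimitPerHost amounts to enforcing a minimum gap between consecutive requests to the same host. A minimal per-host limiter can be sketched as follows (illustrative Python, not IronWebScraper's internals):

```python
class HostRateLimiter:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self._last_request = {}               # host -> time of the last request

    def wait_time(self, host, now):
        """Seconds to pause before the next request to `host` (0.0 if none)."""
        last = self._last_request.get(host)
        self._last_request[host] = now
        if last is None:
            return 0.0                        # first request to this host
        return max(0.0, self.min_delay - (now - last))

# Mirrors the 50 ms RateLimitPerHost in the snippet above
limiter = HostRateLimiter(min_delay_seconds=0.05)
```

The ThrottleMode setting extends the same bookkeeping beyond hostnames to the servers' IP addresses, so that scraped domains hosted on one machine share a single request budget.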
Throttling Properties

- MaxHttpConnectionLimit: the total number of allowed open HTTP requests (threads).
- RateLimitPerHost: the minimum polite delay or pause (in milliseconds) between requests to a given domain or IP address.
- OpenConnectionLimitPerHost: the allowed number of concurrent HTTP requests (threads) per hostname or IP address.
- ThrottleMode: makes the WebScraper intelligently throttle requests not only by hostname, but also by the host servers' IP addresses. This is polite in case multiple scraped domains are hosted on the same machine.

Get Started with IronWebScraper

Start using IronWebScraper in your project today, with a free trial available.

Frequently Asked Questions

How do I authenticate users on websites that require a login in C#?
You can use the HttpIdentity feature in IronWebScraper to authenticate users by setting properties such as NetworkDomain, NetworkUsername, and NetworkPassword.

What are the benefits of using the web cache during development?
The web cache feature lets you cache requested pages for reuse, avoiding repeated connections to the live website. This saves time and resources and is especially useful in the development and testing phases.

How do I manage multiple login sessions in web scraping?
IronWebScraper can simulate multiple login sessions using thousands of unique user credentials and browser engines, which helps prevent websites from detecting and blocking the scraper.

What advanced throttling options are available for web scraping?
IronWebScraper provides the ThrottleMode setting, which intelligently manages request throttling by hostname and IP address, ensuring polite interaction with shared hosting environments.

How do I use proxies in IronWebScraper?
To use proxies, define an array of proxies and associate them with HttpIdentity instances, allowing requests to be routed through different IP addresses for anonymity and access control.

How does IronWebScraper handle request delays to prevent server overload?
The RateLimitPerHost setting specifies the minimum delay between requests to a given domain or IP address, helping prevent server overload by spacing out requests.

Can a web scrape be resumed after an interruption?
Yes. IronWebScraper can resume scraping after an interruption using the Start(CrawlID) method, which saves the execution state and resumes from the last saved point.

How do I control the number of concurrent HTTP connections in the web scraper?
You can set the MaxHttpConnectionLimit property to control the total number of allowed open HTTP requests, helping manage server load and resources.

What options are available for logging web scraping activity?
IronWebScraper lets you set the logging level with the LoggingLevel property to enable comprehensive logging for detailed analysis and troubleshooting during scraping jobs.

About the Author

Darrius Serrant, Full Stack Software Engineer (WebOps). Darrius holds a Bachelor's degree in Computer Science from the University of Miami and works as a Full Stack WebOps Marketing Engineer at Iron Software. Drawn to coding from a young age, he sees computing as both mysterious and accessible, the perfect medium for creativity and problem solving. At Iron Software, Darrius enjoys building new things and simplifying complex concepts to make them easier to understand. As one of the resident developers, he also volunteers to teach students, sharing his expertise with the next generation. For Darrius, the work is fulfilling because it is valued and has real impact.