Advanced Webscraping Features in C#

Darrius Serrant

Updated: June 28, 2026

HttpIdentity Feature

Some websites require the user to be logged in to view content; in that case, we can use HttpIdentity. Here is how to configure it:

// Create a new instance of HttpIdentity
HttpIdentity id = new HttpIdentity();

// Set the network username and password for authentication
id.NetworkUsername = "username";
id.NetworkPassword = "pwd";

// Add the identity to the collection of identities
Identities.Add(id);

' Create a new instance of HttpIdentity
Dim id As New HttpIdentity()

' Set the network username and password for authentication
id.NetworkUsername = "username"
id.NetworkPassword = "pwd"

' Add the identity to the collection of identities
Identities.Add(id)

One of the most impressive and powerful features in IronWebScraper is the ability to use thousands of unique user credentials and/or browser engines to spoof or scrape websites using multiple login sessions.

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Define an array of proxies
    var proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(',');

    // Iterate over common Chrome desktop user agents
    foreach (var UA in IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents)
    {
        // Iterate over the proxies
        foreach (var proxy in proxies)
        {
            // Add a new HTTP identity with specific user agent and proxy
            Identities.Add(new HttpIdentity()
            {
                UserAgent = UA,
                UseCookies = true,
                Proxy = proxy
            });
        }
    }

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Define an array of proxies
	Dim proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(","c)

	' Iterate over common Chrome desktop user agents
	For Each UA In IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents
		' Iterate over the proxies
		For Each proxy In proxies
			' Add a new HTTP identity with specific user agent and proxy
			Identities.Add(New HttpIdentity() With {
				.UserAgent = UA,
				.UseCookies = True,
				.Proxy = proxy
			})
		Next proxy
	Next UA

	' Make an initial request to the website with a parse method
	Me.Request("http://www.Website.com", AddressOf Parse)
End Sub
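
To make the nested loops above concrete, here is a minimal, library-independent sketch of how many identities they produce: one per user agent and proxy pair. `IdentitySketch` and `IdentityPool` are hypothetical stand-ins invented for this illustration, not IronWebScraper types, and the random pick only models the general idea of rotating identities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an identity; not an IronWebScraper type.
public sealed record IdentitySketch(string UserAgent, string Proxy, bool UseCookies);

public static class IdentityPool
{
    // Pairs every user agent with every proxy, mirroring the nested foreach loops.
    public static List<IdentitySketch> Build(IEnumerable<string> userAgents, IEnumerable<string> proxies)
    {
        var proxyList = proxies.ToList();
        return userAgents
            .SelectMany(ua => proxyList.Select(p => new IdentitySketch(ua, p, true)))
            .ToList();
    }

    // Picks one identity at random, as a scraper rotating identities might.
    public static IdentitySketch PickRandom(IReadOnlyList<IdentitySketch> pool, Random rng)
    {
        if (pool.Count == 0) throw new ArgumentException("Pool is empty.", nameof(pool));
        return pool[rng.Next(pool.Count)];
    }
}
```

With three user agents and the two proxies from the example, `Build` returns six identities, so the pool grows multiplicatively as either list gets longer.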

You have multiple properties that give you different behaviors, which helps prevent websites from blocking you.

Some of these properties are:

NetworkDomain: The network domain to be used for user authentication. Supports Windows, NTLM, Kerberos, Linux, BSD, and Mac OS X networks. Must be used with NetworkUsername and NetworkPassword.
NetworkUsername: The network/HTTP username to be used for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
NetworkPassword: The network/HTTP password to be used for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
Proxy: Sets the proxy settings.
UserAgent: Sets the browser engine (e.g., Chrome desktop, Chrome mobile, Chrome tablet, IE, Firefox, etc.).
HttpRequestHeaders: Custom header values to be used with this identity; accepts a Dictionary<string, string> object.
UseCookies: Enables or disables the use of cookies.

IronWebScraper runs the scraper using random identities. If we need to use a specific identity to parse a page, we can do so:

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Create a new instance of HttpIdentity
    HttpIdentity identity = new HttpIdentity();

    // Set the network username and password for authentication
    identity.NetworkUsername = "username";
    identity.NetworkPassword = "pwd";

    // Add the identity to the collection of identities
    Identities.Add(identity);

    // Make a request to the website with the specified identity
    this.Request("http://www.Website.com", Parse, identity);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Create a new instance of HttpIdentity
	Dim identity As New HttpIdentity()

	' Set the network username and password for authentication
	identity.NetworkUsername = "username"
	identity.NetworkPassword = "pwd"

	' Add the identity to the collection of identities
	Identities.Add(identity)

	' Make a request to the website with the specified identity
	Me.Request("http://www.Website.com", AddressOf Parse, identity)
End Sub

Enable the Web Cache Feature

This feature caches requested pages. It is often used during development and testing, letting developers cache the pages they need for reuse after updating their code. After restarting the web scraper, your code can run against the cached pages without connecting to the live website every time (action replay).

You can use it in the Init() method:

// Enable web cache without an expiration time
EnableWebCache();

// OR enable web cache with a specified expiration time
EnableWebCache(new TimeSpan(1, 30, 30));

' Enable web cache without an expiration time
EnableWebCache()

' OR enable web cache with a specified expiration time
EnableWebCache(New TimeSpan(1, 30, 30))

Your cached data will be saved to the WebCache folder under the working directory.

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
    EnableWebCache(new TimeSpan(1, 30, 30));

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
	EnableWebCache(New TimeSpan(1, 30, 30))

	' Make an initial request to the website with a parse method
	Me.Request("http://www.Website.com", AddressOf Parse)
End Sub
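
As a rough illustration of what an expiration time means for a page cache, the sketch below checks whether a cached page is still fresh. It is not IronWebScraper's implementation: `CacheEntrySketch` and `IsFresh` are hypothetical names, and treating a missing expiration as "never expires" is an assumption based on the "without an expiration time" comment in the samples above.

```csharp
using System;

public static class CacheEntrySketch
{
    // Returns true when a page cached at cachedAtUtc may still be reused at nowUtc.
    // A null maxAge models "no expiration time" (assumption: such entries never expire).
    public static bool IsFresh(DateTime cachedAtUtc, DateTime nowUtc, TimeSpan? maxAge)
    {
        if (maxAge is null) return true;
        return nowUtc - cachedAtUtc <= maxAge.Value;
    }
}
```

Note that `new TimeSpan(1, 30, 30)` means 1 hour, 30 minutes, and 30 seconds, which is 5,430 seconds in total. Under this sketch, a page cached one hour ago would be reused, while one cached two hours ago would be fetched again.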

IronWebScraper also lets your engine continue scraping after the code is restarted, by setting the name of the engine start process with Start(CrawlID).

static void Main(string[] args)
{
    // Create an object from the Scraper class
    EngineScraper scrape = new EngineScraper();

    // Start the scraping process with the specified crawl ID
    scrape.Start("enginestate");
}

Shared Sub Main(ByVal args() As String)
	' Create an object from the Scraper class
	Dim scrape As New EngineScraper()

	' Start the scraping process with the specified crawl ID
	scrape.Start("enginestate")
End Sub

The execution request and response will be saved in the SavedState folder inside the working directory.
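
The general idea of resuming a crawl from saved state can be sketched without the library. The example below is only an assumption-laden illustration: `CrawlStateSketch`, its one-URL-per-line text file, and the `.txt` extension are hypothetical and say nothing about the format IronWebScraper actually writes; only the SavedState folder name and the crawl ID concept come from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration of resumable crawl state; not IronWebScraper's on-disk format.
public sealed class CrawlStateSketch
{
    private readonly string _stateFile;

    public CrawlStateSketch(string workingDirectory, string crawlId)
    {
        // One state file per crawl ID, kept in a SavedState folder under the working directory.
        var dir = Path.Combine(workingDirectory, "SavedState");
        Directory.CreateDirectory(dir);
        _stateFile = Path.Combine(dir, crawlId + ".txt");
    }

    // Persists the URLs that still need to be fetched, one per line.
    public void Save(IEnumerable<string> pendingUrls) => File.WriteAllLines(_stateFile, pendingUrls);

    // Loads pending URLs from a previous run, or falls back to the start URLs on a first run.
    public List<string> LoadOrStart(IEnumerable<string> startUrls)
        => File.Exists(_stateFile)
            ? new List<string>(File.ReadAllLines(_stateFile))
            : new List<string>(startUrls);
}
```

Reusing the same crawl ID on the next run is what ties the new process to the earlier state; a different ID would start from the beginning.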

Throttling

We can control the minimum and maximum number of connections and the connection speed per domain.

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Set the total number of allowed open HTTP requests (threads)
    this.MaxHttpConnectionLimit = 80;

    // Set minimum polite delay (pause) between requests to a given domain or IP address
    this.RateLimitPerHost = TimeSpan.FromMilliseconds(50);

    // Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
    this.OpenConnectionLimitPerHost = 25;

    // Do not obey the robots.txt files
    this.ObeyRobotsDotTxt = false;

    // Select the throttling mode; ByDomainHostName throttles requests per domain host name
    this.ThrottleMode = Throttle.ByDomainHostName;

    // Make an initial request to the website with a parse method
    this.Request("https://www.Website.com", Parse);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Set the total number of allowed open HTTP requests (threads)
	Me.MaxHttpConnectionLimit = 80

	' Set minimum polite delay (pause) between requests to a given domain or IP address
	Me.RateLimitPerHost = TimeSpan.FromMilliseconds(50)

	' Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
	Me.OpenConnectionLimitPerHost = 25

	' Do not obey the robots.txt files
	Me.ObeyRobotsDotTxt = False

	' Select the throttling mode; ByDomainHostName throttles requests per domain host name
	Me.ThrottleMode = Throttle.ByDomainHostName

	' Make an initial request to the website with a parse method
	Me.Request("https://www.Website.com", AddressOf Parse)
End Sub
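
To show how a total connection limit and a per-host limit interact, here is a small conceptual sketch. `ConnectionGateSketch` is a hypothetical class written for this explanation; it does not describe how IronWebScraper enforces MaxHttpConnectionLimit or OpenConnectionLimitPerHost internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a global cap plus a per-host cap on open requests.
public sealed class ConnectionGateSketch
{
    private readonly int _maxTotal;
    private readonly int _maxPerHost;
    private readonly Dictionary<string, int> _openPerHost = new(StringComparer.OrdinalIgnoreCase);
    private readonly object _lock = new();
    private int _openTotal;

    public ConnectionGateSketch(int maxTotal, int maxPerHost)
    {
        _maxTotal = maxTotal;
        _maxPerHost = maxPerHost;
    }

    // Tries to reserve a slot for one request to host; returns false if either cap is reached.
    public bool TryAcquire(string host)
    {
        lock (_lock)
        {
            _openPerHost.TryGetValue(host, out var open);
            if (_openTotal >= _maxTotal || open >= _maxPerHost) return false;
            _openPerHost[host] = open + 1;
            _openTotal++;
            return true;
        }
    }

    // Frees a slot previously reserved with TryAcquire.
    public void Release(string host)
    {
        lock (_lock)
        {
            if (_openPerHost.TryGetValue(host, out var open) && open > 0)
            {
                _openPerHost[host] = open - 1;
                _openTotal--;
            }
        }
    }
}
```

With the values from the example (80 in total, 25 per host), a single host could never hold more than 25 of the 80 slots, so at least four hosts would have to be busy before the total limit is reached.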

Throttling Properties

MaxHttpConnectionLimit: Total number of allowed open HTTP requests (threads).
RateLimitPerHost: Minimum polite delay or pause between requests to a given domain or IP address, set as a TimeSpan (50 milliseconds in the example above).
OpenConnectionLimitPerHost: Allowed number of concurrent HTTP requests (threads) per hostname.
ThrottleMode: Makes the WebScraper intelligently throttle requests not only by hostname, but also by the host servers' IP addresses. This is polite in case multiple scraped domains are hosted on the same machine.
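
The difference between spacing requests per hostname and per server IP can be sketched as follows. This is a conceptual model only: `PoliteDelaySketch`, `Reserve`, and `KeyFor` are hypothetical names, the host-to-IP map is supplied by the caller rather than resolved through DNS, and none of it reflects IronWebScraper's internals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a minimum delay between requests that share a throttle key.
public sealed class PoliteDelaySketch
{
    private readonly TimeSpan _minDelay;
    private readonly Dictionary<string, DateTime> _nextAllowedUtc = new();

    public PoliteDelaySketch(TimeSpan minDelay) => _minDelay = minDelay;

    // Returns how long a request for throttleKey should wait at nowUtc,
    // and records when the next request for that key may start.
    public TimeSpan Reserve(string throttleKey, DateTime nowUtc)
    {
        _nextAllowedUtc.TryGetValue(throttleKey, out var nextAllowed);
        var start = nextAllowed > nowUtc ? nextAllowed : nowUtc;
        _nextAllowedUtc[throttleKey] = start + _minDelay;
        return start - nowUtc;
    }

    // Chooses the key: the hostname itself, or the server IP when one is known for it.
    public static string KeyFor(string hostName, IReadOnlyDictionary<string, string> hostToIp, bool byIp)
        => byIp && hostToIp.TryGetValue(hostName, out var ip) ? ip : hostName;
}
```

When two domains map to the same IP address and the key is chosen by IP, they share one delay budget, which is the courtesy toward shared hosting that the ThrottleMode description refers to. Keyed by hostname, each domain would be spaced independently.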

Get Started with IronWebScraper

Start using IronWebScraper in your project today with a free trial.

Frequently Asked Questions

How can I authenticate users on websites that require a login in C#?

You can use the HttpIdentity feature in IronWebScraper to authenticate users by setting properties such as NetworkDomain, NetworkUsername, and NetworkPassword.

What is the benefit of using the web cache during development?

The web cache feature caches requested pages for reuse, which saves time and resources by avoiding repeated connections to live websites. It is especially useful during development and testing.

How can I manage multiple login sessions in web scraping?

IronWebScraper lets you use thousands of unique user credentials and browser engines to simulate multiple login sessions, which helps prevent websites from detecting and blocking the scraper.

What advanced throttling options are available in web scraping?

IronWebScraper offers the ThrottleMode option, which intelligently throttles requests based on hostnames and IP addresses, showing respect for shared hosting environments.

How can I use proxies with IronWebScraper?

To use proxies, define an array of proxies and associate them with HttpIdentity instances in IronWebScraper, which routes requests through different IP addresses for anonymity and access control.

How does IronWebScraper handle request delays to prevent server overload?

The RateLimitPerHost setting in IronWebScraper specifies the minimum delay between requests to a given domain or IP address, helping prevent server overload by spacing out requests.

Can web scraping be resumed after an interruption?

Yes. IronWebScraper can resume scraping after an interruption by using the Start(CrawlID) method, which saves the execution state and resumes from the last saved point.

How can I control the number of concurrent HTTP connections in a web scraper?

In IronWebScraper, you can set the MaxHttpConnectionLimit property to control the total number of allowed open HTTP requests, helping manage server load and resources.

What options are available for logging web scraping activity?

IronWebScraper lets you set the logging level with the LoggingLevel property, enabling comprehensive logging for detailed analysis and troubleshooting during scraping operations.

Darrius Serrant

Full Stack Software Engineer (WebOps)

Darrius Serrant holds a Bachelor's degree in Computer Science from the University of Miami and works as a Full Stack WebOps Marketing Engineer at Iron Software. Drawn to coding from a young age, he saw computing as both mysterious and accessible, making it the perfect medium for creativity ...
