Advanced Web Scraping Features in C#

Curtis Chau

Updated: January 31, 2026

HttpIdentity Feature

Some websites require the user to be logged in to view content. In that case, we can use an HttpIdentity. Here is how to set one up:

// Create a new instance of HttpIdentity
HttpIdentity id = new HttpIdentity();

// Set the network username and password for authentication
id.NetworkUsername = "username";
id.NetworkPassword = "pwd";

// Add the identity to the collection of identities
Identities.Add(id);

' Create a new instance of HttpIdentity
Dim id As New HttpIdentity()

' Set the network username and password for authentication
id.NetworkUsername = "username"
id.NetworkPassword = "pwd"

' Add the identity to the collection of identities
Identities.Add(id)

One of the most powerful features of IronWebScraper for advanced web scraping is the ability to use thousands of unique user credentials and/or browser engines to spoof or scrape websites using multiple login sessions.

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Define an array of proxies
    var proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(',');

    // Iterate over common Chrome desktop user agents
    foreach (var UA in IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents)
    {
        // Iterate over the proxies
        foreach (var proxy in proxies)
        {
            // Add a new HTTP identity with specific user agent and proxy
            Identities.Add(new HttpIdentity()
            {
                UserAgent = UA,
                UseCookies = true,
                Proxy = proxy
            });
        }
    }

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Define an array of proxies
	Dim proxies = "IP-Proxy1:8080,IP-Proxy2:8081".Split(","c)

	' Iterate over common Chrome desktop user agents
	For Each UA In IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents
		' Iterate over the proxies
		For Each proxy In proxies
			' Add a new HTTP identity with specific user agent and proxy
			Identities.Add(New HttpIdentity() With {
				.UserAgent = UA,
				.UseCookies = True,
				.Proxy = proxy
			})
		Next proxy
	Next UA

	' Make an initial request to the website with a parse method
	Me.Request("http://www.Website.com", Parse)
End Sub

HttpIdentity exposes several properties that give each identity different behavior, which helps prevent websites from blocking you.

Some of these properties are:

NetworkDomain: The network domain to use for user authentication. Supports Windows, NTLM, Kerberos, Linux, BSD, and Mac OS X networks. Must be used together with NetworkUsername and NetworkPassword.
NetworkUsername: The network/HTTP username to use for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
NetworkPassword: The network/HTTP password to use for user authentication. Supports HTTP, Windows networks, NTLM, Kerberos, Linux networks, BSD networks, and Mac OS.
Proxy: Sets the proxy settings.
UserAgent: Sets the browser engine (for example, Chrome desktop, Chrome mobile, Chrome tablet, IE, or Firefox).
HttpRequestHeaders: Custom header values to be used with this identity. Accepts a Dictionary<string, string> object.
UseCookies: Enables or disables the use of cookies.
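
As a minimal sketch of how several of these properties might be combined on a single identity, the fragment below is meant to sit inside Init() of a WebScraper subclass, like the other snippets in this article. The domain, credentials, proxy address, and header value are placeholders. Assigning a new Dictionary<string, string> to HttpRequestHeaders is an assumption based on the property description above, not confirmed API usage.

```csharp
// Sketch only: place inside Init() of your WebScraper subclass.
// Requires: using System.Collections.Generic;
// Every value below is a placeholder.
HttpIdentity identity = new HttpIdentity();

// Network credentials (NetworkDomain is used together with username and password)
identity.NetworkDomain = "domain";
identity.NetworkUsername = "username";
identity.NetworkPassword = "pwd";

// Route this identity through one proxy and keep cookies between requests
identity.Proxy = "IP-Proxy1:8080";
identity.UseCookies = true;

// Assumption: the header dictionary can be assigned directly
identity.HttpRequestHeaders = new Dictionary<string, string>
{
    { "Accept-Language", "en-US" }
};

// Register the identity so the scraper can use it
Identities.Add(identity);
```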

IronWebScraper runs the scraper using random identities. If we need a specific identity to be used when parsing a page, we can pass it to the request:

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Create a new instance of HttpIdentity
    HttpIdentity identity = new HttpIdentity();

    // Set the network username and password for authentication
    identity.NetworkUsername = "username";
    identity.NetworkPassword = "pwd";

    // Add the identity to the collection of identities
    Identities.Add(identity);

    // Make a request to the website with the specified identity
    this.Request("http://www.Website.com", Parse, identity);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Create a new instance of HttpIdentity
	Dim identity As New HttpIdentity()

	' Set the network username and password for authentication
	identity.NetworkUsername = "username"
	identity.NetworkPassword = "pwd"

	' Add the identity to the collection of identities
	Identities.Add(identity)

	' Make a request to the website with the specified identity
	Me.Request("http://www.Website.com", Parse, identity)
End Sub

Enable the Web Cache Feature

This feature caches requested pages. It is often used during the development and testing phases of a web crawler, letting developers cache the pages they need for reuse after updating their code. You can then run your code against the cached pages after restarting your web scraper, without connecting to the live website every time (action replay).

You can use it in the Init() method:

// Enable web cache without an expiration time
EnableWebCache();

// OR enable web cache with a specified expiration time
EnableWebCache(new TimeSpan(1, 30, 30));

' Enable web cache without an expiration time
EnableWebCache()

' OR enable web cache with a specified expiration time
EnableWebCache(New TimeSpan(1, 30, 30))

The cached data is saved in the WebCache folder under the working directory.

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
    EnableWebCache(new TimeSpan(1, 30, 30));

    // Make an initial request to the website with a parse method
    this.Request("http://www.Website.com", Parse);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Enable web cache with a specific expiration time of 1 hour, 30 minutes, and 30 seconds
	EnableWebCache(New TimeSpan(1, 30, 30))

	' Make an initial request to the website with a parse method
	Me.Request("http://www.Website.com", Parse)
End Sub
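
The expiration passed to EnableWebCache in the examples above is an ordinary System.TimeSpan. As a small standard-library sketch (the CacheExpiry class and its Build method are hypothetical names used only for illustration), the three-argument constructor takes hours, minutes, and seconds, in that order:

```csharp
using System;

// Hypothetical helper, not part of IronWebScraper.
public static class CacheExpiry
{
    // Builds the same expiration used in the examples above:
    // 1 hour, 30 minutes, and 30 seconds.
    public static TimeSpan Build()
    {
        // Arguments are (hours, minutes, seconds)
        return new TimeSpan(1, 30, 30);
    }
}
```

CacheExpiry.Build() equals TimeSpan.FromSeconds(5430), so either form could be passed where a TimeSpan is expected.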

IronWebScraper can also continue scraping after your code is restarted. To enable this, give the engine's start process a name by calling Start(CrawlID).

static void Main(string[] args)
{
    // Create an object from the Scraper class
    EngineScraper scrape = new EngineScraper();

    // Start the scraping process with the specified crawl ID
    scrape.Start("enginestate");
}

Shared Sub Main(ByVal args() As String)
	' Create an object from the Scraper class
	Dim scrape As New EngineScraper()

	' Start the scraping process with the specified crawl ID
	scrape.Start("enginestate")
End Sub

The execution request and response will be saved in the SavedState folder inside the working directory.

Throttling

We can control the minimum and maximum number of connections and the connection speed per domain.

public override void Init()
{
    // Set the license key for IronWebScraper
    License.LicenseKey = "LicenseKey";

    // Set the logging level to capture all logs
    this.LoggingLevel = WebScraper.LogLevel.All;

    // Assign the working directory for the output files
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

    // Set the total number of allowed open HTTP requests (threads)
    this.MaxHttpConnectionLimit = 80;

    // Set minimum polite delay (pause) between requests to a given domain or IP address
    this.RateLimitPerHost = TimeSpan.FromMilliseconds(50);

    // Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
    this.OpenConnectionLimitPerHost = 25;

    // Do not obey the robots.txt files
    this.ObeyRobotsDotTxt = false;

    // Makes the WebScraper intelligently throttle requests not only by hostname, but also by host servers' IP addresses
    this.ThrottleMode = Throttle.ByDomainHostName;

    // Make an initial request to the website with a parse method
    this.Request("https://www.Website.com", Parse);
}

Public Overrides Sub Init()
	' Set the license key for IronWebScraper
	License.LicenseKey = "LicenseKey"

	' Set the logging level to capture all logs
	Me.LoggingLevel = WebScraper.LogLevel.All

	' Assign the working directory for the output files
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"

	' Set the total number of allowed open HTTP requests (threads)
	Me.MaxHttpConnectionLimit = 80

	' Set minimum polite delay (pause) between requests to a given domain or IP address
	Me.RateLimitPerHost = TimeSpan.FromMilliseconds(50)

	' Set the allowed number of concurrent HTTP requests (threads) per hostname or IP address
	Me.OpenConnectionLimitPerHost = 25

	' Do not obey the robots.txt files
	Me.ObeyRobotsDotTxt = False

	' Makes the WebScraper intelligently throttle requests not only by hostname, but also by host servers' IP addresses
	Me.ThrottleMode = Throttle.ByDomainHostName

	' Make an initial request to the website with a parse method
	Me.Request("https://www.Website.com", Parse)
End Sub

Throttling Properties

MaxHttpConnectionLimit
Total number of allowed open HTTP requests (threads).
RateLimitPerHost
Minimum polite delay or pause between requests to a given domain or IP address, set as a TimeSpan (for example, TimeSpan.FromMilliseconds(50)).
OpenConnectionLimitPerHost
Allowed number of concurrent HTTP requests (threads) per hostname.
ThrottleMode
Makes the WebScraper intelligently throttle requests not only by hostname, but also by the host servers' IP addresses. This is a courtesy in case several scraped domains are hosted on the same machine.
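
To make the idea of a minimum per-host delay concrete, here is a minimal standard-library sketch. It is not IronWebScraper's implementation: PerHostRateLimiter and Reserve are hypothetical names, and the logic only illustrates how requests to the same host can be spaced out while other hosts are unaffected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of IronWebScraper: tracks the earliest time
// at which the next request to each host may be sent.
public class PerHostRateLimiter
{
    private readonly TimeSpan _minDelay;
    private readonly Dictionary<string, DateTime> _nextAllowed =
        new Dictionary<string, DateTime>();

    public PerHostRateLimiter(TimeSpan minDelay)
    {
        _minDelay = minDelay;
    }

    // Returns how long the caller should wait before contacting the host,
    // and reserves the following slot for that host.
    public TimeSpan Reserve(string host, DateTime now)
    {
        DateTime allowed;
        if (!_nextAllowed.TryGetValue(host, out allowed) || allowed < now)
        {
            // No pending reservation for this host: it can be contacted immediately
            allowed = now;
        }

        // The next request to this host must wait at least _minDelay after this one
        _nextAllowed[host] = allowed + _minDelay;
        return allowed - now;
    }
}
```

With a 50 ms delay, two reservations for the same host at the same instant yield waits of zero and 50 ms, while a reservation for a different host at that instant still yields zero.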

Frequently Asked Questions

How can I authenticate users on websites that require a login using C#?

You can use the HttpIdentity feature in IronWebScraper to authenticate users by setting properties such as NetworkDomain, NetworkUsername, and NetworkPassword.

What is the advantage of using the web cache during development?

The web cache feature stores requested pages for reuse, which saves time and resources by avoiding repeated connections to live websites. It is especially useful during the development and testing phases.

How can I manage multiple login sessions in web scraping?

IronWebScraper allows the use of thousands of unique user credentials and browser engines to simulate multiple login sessions, which helps prevent websites from detecting and blocking the scraper.

What are the advanced throttling options in web scraping?

IronWebScraper offers a ThrottleMode setting that intelligently throttles requests based on hostnames and IP addresses, ensuring polite interaction with shared hosting environments.

How can I use a proxy with IronWebScraper?

To use a proxy, define an array of proxies and associate them with HttpIdentity instances in IronWebScraper, so that requests are routed through different IP addresses for anonymity and access control.

How does IronWebScraper handle request delays to avoid overloading the server?

The RateLimitPerHost setting in IronWebScraper specifies the minimum delay between requests to a given domain or IP address, which helps avoid overloading the server by spacing out the requests.

Is it possible to resume web scraping after an interruption?

Yes. IronWebScraper can resume scraping after an interruption by using the Start(CrawlID) method, which saves the execution state and resumes from the last saved point.

How do I control the number of concurrent HTTP connections in a web scraper?

In IronWebScraper, you can set the MaxHttpConnectionLimit property to control the total number of allowed open HTTP requests, which helps manage server load and resources.

What options are available for logging web scraping activity?

IronWebScraper lets you set the logging level through the LoggingLevel property, enabling comprehensive logs for detailed analysis and troubleshooting during scraping operations.

Curtis Chau

Technical Writer

Curtis Chau holds a bachelor's degree in Computer Science (Carleton University) and specializes in front-end development, with experience in Node.js, TypeScript, JavaScript, and React. Passionate about creating intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating manuals ...
