Test in a live environment
Test in production without watermarks.
Works wherever you need it to.
Designed for C#, F#, & VB.NET running on .NET 9, 8, 7, 6, Core, Standard, or Framework
using IronWebScraper;

public class Program
{
    private static void Main(string[] args)
    {
        var scrapeJob = new BlogScraper();
        scrapeJob.Start();
    }
}

public class BlogScraper : WebScraper
{
    // Called once before scraping starts: configure logging and queue the first request.
    public override void Init()
    {
        LoggingLevel = LogLevel.All;
        Request("https://www.zyte.com/blog/", Parse);
    }

    // Called for each response: extract post titles, then follow the pagination link if present.
    public override void Parse(Response response)
    {
        foreach (HtmlNode titleLink in response.Css(".oxy-post-title"))
        {
            string title = titleLink.TextContentClean;
            Scrape(new ScrapedData() { { "Title", title } });
        }

        if (response.CssExists("div.oxy-easy-posts-pages > a[href]"))
        {
            string nextPage = response.Css("div.oxy-easy-posts-pages > a[href]")[0].Attributes["href"];
            Request(nextPage, Parse);
        }
    }
}
Imports IronWebScraper

Public Class Program
    Public Shared Sub Main(ByVal args() As String)
        Dim scrapeJob = New BlogScraper()
        scrapeJob.Start()
    End Sub
End Class

Public Class BlogScraper
    Inherits WebScraper

    ' Called once before scraping starts: configure logging and queue the first request.
    Public Overrides Sub Init()
        LoggingLevel = LogLevel.All
        Request("https://www.zyte.com/blog/", AddressOf Parse)
    End Sub

    ' Called for each response: extract post titles, then follow the pagination link if present.
    Public Overrides Sub Parse(ByVal response As Response)
        For Each titleLink As HtmlNode In response.Css(".oxy-post-title")
            Dim title As String = titleLink.TextContentClean
            Scrape(New ScrapedData() From {
                {"Title", title}
            })
        Next titleLink

        If response.CssExists("div.oxy-easy-posts-pages > a[href]") Then
            Dim nextPage As String = response.Css("div.oxy-easy-posts-pages > a[href]")(0).Attributes("href")
            Request(nextPage, AddressOf Parse)
        End If
    End Sub
End Class
IronWebScraper provides a powerful framework to extract data and files from websites using C# code.
1. Install IronWebScraper into your project using NuGet (https://www.nuget.org/packages/IronWebScraper/).
2. Create a class extending WebScraper.
3. Create an Init method that uses the Request method to queue at least one URL for parsing.
4. Create a Parse method to process each response and, where needed, Request further pages. Use response.Css to work with HTML elements using jQuery-style CSS selectors.
5. In your application, create an instance of your web scraping class and call its Start() method.
6. Read our C# web scraping tutorials (/csharp/webscraper/tutorials/webscraping-in-c-sharp/) to learn how to create advanced web crawlers using IronWebScraper.
Whether you have product, integration, or licensing queries, the Iron product development team is on hand to answer all of your questions. Get in touch and start a dialogue with Iron to make the most of our library in your project.
Ask a Question
Just write a single C# web-scraper class to scrape thousands or even millions of web pages into C# class instances, JSON, or downloaded files. IronWebScraper lets you write concise, linear workflows that simulate human browsing behavior, and it runs your code as a swarm of virtual web browsers: massively parallel, yet polite and fault tolerant.
Get Started with Documentation
IronWebScraper must be programmed to know how to handle each "type" of page it encounters. This is done very concisely using CSS selectors or XPath expressions and can be fully customized in C#. That freedom lets you decide which pages to scrape within a website and what to do with the data you extract, and each method can be debugged and watched neatly in Visual Studio.
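For example, a scraper can route each page type to its own handler by passing a different parse method to Request. The sketch below reuses only the calls shown in the example above; the start URL and CSS selectors are hypothetical placeholders for a typical listing/detail site.

using IronWebScraper;

public class ShopScraper : WebScraper
{
    public override void Init()
    {
        // Queue the listing page; Parse decides which product pages to visit.
        Request("https://example.com/products", Parse);
    }

    // Handles listing pages: follow every product link found on the page.
    public override void Parse(Response response)
    {
        foreach (HtmlNode link in response.Css("a.product-link"))
        {
            Request(link.Attributes["href"], ParseProduct);
        }
    }

    // Handles product detail pages: extract only the fields we care about.
    public void ParseProduct(Response response)
    {
        Scrape(new ScrapedData()
        {
            { "Name", response.Css("h1.product-name")[0].TextContentClean },
            { "Price", response.Css("span.price")[0].TextContentClean }
        });
    }
}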
Follow a Tutorial
IronWebScraper manages multithreading and web requests for you, allowing hundreds of concurrent threads without the developer having to manage them. Politeness settings can throttle requests, reducing the risk of excessive load on target web servers.
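As a rough illustration, concurrency and politeness are configured as properties inside Init. The property names below follow Iron's published tutorials and should be confirmed against the API reference for the version you install; the values are arbitrary examples.

using System;
using IronWebScraper;

public class PoliteScraper : WebScraper
{
    public override void Init()
    {
        // Illustrative settings only; verify these member names against the IronWebScraper API reference.
        this.ObeyRobotsDotTxt = true;                            // respect the target site's robots.txt
        this.MaxHttpConnectionLimit = 100;                       // total concurrent requests
        this.OpenConnectionLimitPerHost = 10;                    // concurrent requests per host
        this.RateLimitPerHost = TimeSpan.FromMilliseconds(200);  // minimum delay between requests to one host
        this.ThrottleMode = Throttle.ByDomainHostName;           // throttle per domain rather than per IP

        Request("https://example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Page handling omitted in this sketch.
    }
}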
Up and Running with WebScraper
IronWebScraper can use one or multiple "identities" - sessions that simulate real-world human requests. Each request may programmatically or randomly be assigned its own identity, user agent, cookies, logins, and even IP address. Requests are made unique automatically, using a combination of URL, parse method, and POST variables.
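A sketch of identity setup, assuming the HttpIdentity type and Identities collection described in Iron's tutorials; the user agent and proxy values are placeholders, and the member names should be confirmed against the API reference.

using IronWebScraper;

public class IdentityScraper : WebScraper
{
    public override void Init()
    {
        // Illustrative identity; the user agent and proxy below are placeholder values.
        var identity = new HttpIdentity()
        {
            UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
            UseCookies = true,
            Proxy = "127.0.0.1:8080"
        };
        Identities.Add(identity);

        Request("https://example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Page handling omitted in this sketch.
    }
}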
See API Reference
IronWebScraper uses advanced caching to let developers change their code "on the fly" and replay every previous request without contacting the internet. Every scrape job is autosaved and can be resumed in the event of an exception or power outage.
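A sketch of enabling the cache, assuming the EnableWebCache method and WorkingDirectory property described in Iron's tutorials; the expiry and output path are placeholder values, so check both names against the API reference before relying on them.

using System;
using IronWebScraper;

public class CachedScraper : WebScraper
{
    public override void Init()
    {
        // Illustrative caching setup; verify these members against the IronWebScraper API reference.
        EnableWebCache(TimeSpan.FromHours(12));  // cache responses so edited code can replay them offline
        WorkingDirectory = @"C:\Scrapes\Blog\";  // where cache, logs, and autosaved state are written (placeholder path)

        Request("https://example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Page handling omitted in this sketch.
    }
}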
WebScraper Setup Instructions
IronWebScraper puts web scraping tools in your hands quickly with a Visual Studio installer. Whether you install directly from NuGet within Visual Studio or download the DLL, you'll be set up in no time: just one DLL and no dependencies.
PM > Install-Package IronWebScraper
Download DLL
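If you prefer the .NET CLI, the same package can be added with the standard NuGet command:

dotnet add package IronWebScraper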
See how Ahmed uses IronWebScraper in his projects to migrate content from one site to another. Sample projects and code are provided for scraping e-commerce and blog websites.
View Ahmed's WebScraping Tutorial
Iron's team has over 10 years of experience in the .NET software component market.
9 .NET API products for your office documents