如何用 C 從網站中提取數據#

This article was translated from English: Does it need improvement?
Translated
View the article in English

艾哈邁德·阿布爾馬格德

IronWebscraper 是用於網絡抓取、網絡數據提取和網絡內容解析的 .NET 程式庫。 它是一個易於使用的庫,可以添加到 Microsoft Visual Studio 項目中,以用於開發和生產。

IronWebscraper 擁有許多獨特的功能和能力,例如控制允許和禁止的頁面、對象、媒體等。它還允許管理多個身份、網絡緩存以及許多其他功能,我們將在本教程中進行介紹。

開始使用IronWebscraper

立即在您的專案中使用IronWebScraper,並享受免費試用。

第一步:
green arrow pointer


目標受眾

本教程針對擁有基礎或高階程式設計技能的軟體開發人員,旨在幫助他們構建和實施具有先進抓取功能的解決方案。(網站爬取、網站數據收集與提取、網站內容解析、網頁數據抓取).

網頁抓取從來不是一件簡單的事情,C# 或 .NET 程式環境中也沒有主流的框架可供使用。Iron Web Scraper 的誕生就是為了改變這一狀況。

所需技能

  1. 使用 Microsoft 程式設計語言(如 C# 或 VB.NET)的基本編程技巧基礎。

  2. 基本了解網絡技術(HTML、JavaScript、JQuery、CSS等。)以及它們如何運作

  3. 基本的 DOM、XPath、HTML 和 CSS 選擇器知識

工具

  1. Microsoft Visual Studio 2010 或更高版本

  2. 瀏覽器(如 Chrome 的網頁檢查器或 Firefox 的 Firebug)的網頁開發者擴充功能

為什麼要進行網頁抓取?

(原因和概念)

如果您想要建立一個具有以下能力的產品或解決方案:

  1. 提取網站數據

  2. 從多個網站比較內容、價格、功能等。

  3. 掃描和快取網站內容

    如果您基於上述一個或多個原因,那麼IronWebscraper是一個非常適合您需求的函式庫。

如何安裝IronWebScraper?

在您創建新項目之後(見附錄 A)您可以通過使用 NuGet 自動插入 IronWebScraper 庫或手動安裝 DLL 將 IronWebScraper 庫添加到您的項目中。

使用 NuGet 安裝

要使用 NuGet 透過視覺介面將 IronWebScraper 庫添加到我們的項目中,我们可以这样做。(NuGet 套件管理器)或使用套件管理控制台命令。

使用 NuGet 套件管理器

  1. 使用滑鼠 -> 右鍵點擊項目名稱 -> 選擇管理 NuGet 套件

    AddIronWebscraperUsingGUI related to 使用 NuGet 套件管理器

  2. 從瀏覽標籤 -> 搜尋 IronWebScraper -> 安裝

    AddIronWebscraperUsingGUI2 related to 使用 NuGet 套件管理器

  3. 点选确定

    AddIronWebscraperUsingGUI3 related to 使用 NuGet 套件管理器

  4. 完成了。

    AddIronWebscraperUsingGUI4 related to 使用 NuGet 套件管理器

使用 NuGet 套件控制台

  1. 從工具 -> NuGet 套件管理器 -> 套件管理器控制台

    AddIronWebscraperUsingConsole related to 使用 NuGet 套件控制台

  2. 選擇類別庫專案作為預設專案

  3. 執行命令 -> Install-Package IronWebScraper

    AddIronWebscraperUsingConsole1 related to 使用 NuGet 套件控制台

手動安裝

  1. 前往ironsoftware.com

  2. 點擊IronWebScraper或直接使用URL訪問其頁面。https://ironsoftware.com/csharp/webscraper

  3. 點擊下載DLL。

  4. 解壓縮已下載的壓縮檔案

  5. 在 Visual Studio 中右鍵點擊專案 -> 加入 -> 參考 -> 瀏覽

    AddIronWebscraperUsingDll related to 手動安裝

  6. 前往提取的文件夾 -> netstandard2.0 -> 並選擇所有 .dll 文件

    AddIronWebscraperUsingDll2 related to 手動安裝

  7. 完成了!

HelloScraper - 我們的第一個 IronWebScraper 範例

像往常一樣,我們將開始實現 Hello Scraper 應用程式,以使用 IronWebScraper 進行我們的第一步。

  • 我們已創建一個名為“IronWebScraperSample”的新控制台應用程式。

    建立 IronWebScraper 範例的步驟

  1. 創建一個資料夾,並將其命名為“HelloScraperSample”。

  2. 然後新建一個類並命名為“HelloScraper”

    HelloScraperAddClass related to HelloScraper - 我們的第一個 IronWebScraper 範例

  3. 將此程式碼片段新增到HelloScraper
public class HelloScraper : WebScraper
{
        /// <summary>
        /// Override this method initialize your web-scraper.
        /// Important tasks will be to Request at least one start url... and set allowed/banned domain or url patterns.
        /// </summary>
        public override void Init()
        {
            License.LicenseKey = "LicenseKey"; // Write License Key
            this.LoggingLevel = WebScraper.LogLevel.All; // All Events Are Logged
            this.Request("https://blog.scrapinghub.com", Parse);
        }

        /// <summary>
        /// Override this method to create the default Response handler for your web scraper.
        /// If you have multiple page types, you can add additional similar methods.
        /// </summary>
        /// <param name="response">The http Response object to parse</param>
        public override void Parse(Response response)
        {
            // set working directory for the project
            this.WorkingDirectory = AppSetting.GetAppRoot()+ @"\HelloScraperSample\Output\";
            // Loop on all Links
            foreach (var title_link in response.Css("h2.entry-title a"))
            {
                // Read Link Text
                string strTitle = title_link.TextContentClean;
                // Save Result to File
                Scrape(new ScrapedData() { { "Title", strTitle } }, "HelloScraper.json");
            }
            // Loop On All Links
            if (response.CssExists("div.prev-post > a [href]"))
            {
                // Get Link URL
                var next_page = response.Css("div.prev-post > a [href]")[0].Attributes ["href"];
                // Scrape Next URL
                this.Request(next_page, Parse);
            }
        }
}
public class HelloScraper : WebScraper
{
        /// <summary>
        /// Override this method initialize your web-scraper.
        /// Important tasks will be to Request at least one start url... and set allowed/banned domain or url patterns.
        /// </summary>
        public override void Init()
        {
            License.LicenseKey = "LicenseKey"; // Write License Key
            this.LoggingLevel = WebScraper.LogLevel.All; // All Events Are Logged
            this.Request("https://blog.scrapinghub.com", Parse);
        }

        /// <summary>
        /// Override this method to create the default Response handler for your web scraper.
        /// If you have multiple page types, you can add additional similar methods.
        /// </summary>
        /// <param name="response">The http Response object to parse</param>
        public override void Parse(Response response)
        {
            // set working directory for the project
            this.WorkingDirectory = AppSetting.GetAppRoot()+ @"\HelloScraperSample\Output\";
            // Loop on all Links
            foreach (var title_link in response.Css("h2.entry-title a"))
            {
                // Read Link Text
                string strTitle = title_link.TextContentClean;
                // Save Result to File
                Scrape(new ScrapedData() { { "Title", strTitle } }, "HelloScraper.json");
            }
            // Loop On All Links
            if (response.CssExists("div.prev-post > a [href]"))
            {
                // Get Link URL
                var next_page = response.Css("div.prev-post > a [href]")[0].Attributes ["href"];
                // Scrape Next URL
                this.Request(next_page, Parse);
            }
        }
}
IRON VB CONVERTER ERROR developers@ironsoftware.com
VB   C#
  1. 現在開始抓取,將此代碼片段添加到主要部分
static void Main(string [] args)
{
        // Create Object From Hello Scrape class
        HelloScraperSample.HelloScraper scrape = new HelloScraperSample.HelloScraper();
            // Start Scraping
            scrape.Start();
}
static void Main(string [] args)
{
        // Create Object From Hello Scrape class
        HelloScraperSample.HelloScraper scrape = new HelloScraperSample.HelloScraper();
            // Start Scraping
            scrape.Start();
}
IRON VB CONVERTER ERROR developers@ironsoftware.com
VB   C#
  1. 結果將會被儲存為 WebSraper.WorkingDirecty/classname.Json 格式的文件。

    HelloScraperFrmFileResult related to HelloScraper - 我們的第一個 IronWebScraper 範例

代碼概述

Scrape.Start()=> 觸發如下的抓取邏輯:

  1. 調用 Init()首先用於初始化變數、抓取屬性和行為屬性的方法。

  2. 如我們所見,它將起始頁面設定為Request。(請提供您想要翻譯的內容。https://blog.scrapinghub.com", 解析)和解析(回應 )定義為用來解析回應的過程。

  3. Webscraper 同時管理:http 和執行緒... 保持您的所有代碼易於調試和同步。

  4. 解析方法在初始化後開始()解析該頁面。
    1. 您可以使用(CSS選擇器、JavaScript DOM、XPath)來尋找元素。

    2. 選定的元素被轉換為 ScrapedData 類型,你可以將它們轉換為任何自定義類型,如 (Product, Employee, News 等)。

    3. 使用 Json 格式保存在 ("bin/Scrape/") 目錄中的文件中的對象。或者,您可以將文件的路徑設置為參數,稍後在其他示例中我們將看到。

IronWebScraper 程式庫功能與選項

您可以在使用手動安裝方法下載的zip文件中找到更新的文檔。(IronWebScraper Documentation.chm 檔案)

或者您可以查看線上文件庫的最新更新情況https://ironsoftware.com/csharp/webscraper/object-reference/

要在您的專案中開始使用 IronWebscraper,您必須繼承自(IronWebScraper.WebScraper)擴展您的類別庫並為其添加抓取功能的類別。

您還必須實施{初始化(), 解析(回應 )}方法。

namespace IronWebScraperEngine
{
    public class NewsScraper : IronWebScraper.WebScraper
    {
        public override void Init()
        {
            throw new NotImplementedException();
        }

        public override void Parse(Response response)
        {
            throw new NotImplementedException();
        }
    }
}
namespace IronWebScraperEngine
{
    public class NewsScraper : IronWebScraper.WebScraper
    {
        public override void Init()
        {
            throw new NotImplementedException();
        }

        public override void Parse(Response response)
        {
            throw new NotImplementedException();
        }
    }
}
Namespace IronWebScraperEngine
	Public Class NewsScraper
		Inherits IronWebScraper.WebScraper

		Public Overrides Sub Init()
			Throw New NotImplementedException()
		End Sub

		Public Overrides Sub Parse(ByVal response As Response)
			Throw New NotImplementedException()
		End Sub
	End Class
End Namespace
VB   C#
屬性 \ 函數類型描述
初始化()方法用於設置爬蟲
解析 (Response response)方法用於實現爬蟲將使用的邏輯及其處理方式。
以下表格包含IronWebScraper庫提供的方法和屬性列表
注意:可以針對不同頁面的行為或結構實現多種方法
  • 禁止的網址
  • 允許的網址
  • 禁止域
集合用于禁止/允许/网址和/或域名
範例: BannedUrls.Add ("*.zip", "*.exe", "*.gz", "*.pdf");
注意:
  • 您可以使用 ( * 和/或 ? ) 萬用字元
  • 您可以使用字串和正則表達式
  • 禁止URL, 允許URL, 禁止域名, 允許域名
  • BannedUrls.Add ("*.zip", "*.exe", "*.gz", "*.pdf");
  • *? 全域銷售萬用字元
  • 字串和正規表達式
  • 您可以通過重載方法來覆蓋此行為:public virtual bool AcceptUrl (string url)
ObeyRobotsDotTxt布林值用於啟用或禁用讀取和遵循robots.txt的指令
public override bool ObeyRobotsDotTxtForHost (string Host)方法用於啟用或禁用讀取及遵循robots.txt指令或不對某些域名的指令
抓取方法
ScrapeUnique方法
調節模式枚舉
啟用網頁快取()方法
啟用網路快取 (TimeSpan 快取持續時間)方法
最大HTTP連接限制整数
每個主機的RateLimit時間跨度
每個主機的開放連接限制整数
ObeyRobotsDotTxt布林值
調節模式枚舉型別列舉選項:
  • ByIpAddress
  • 按域名主機名
設置特定網站抓取速率限制 (string hostName, TimeSpan crawlRate)方法
身份标识集合用來擷取網路資源的 HttpIdentity()列表。

每個身份可能有不同的代理 IP 地址、用戶代理、HTTP 標頭、持久性 cookie、用戶名和密碼。
最佳做法是在您的 WebScraper.Init 方法中創建 Identities,並將它們添加到此 WebScraper.Identities 列表中。
工作目錄字串設定用於所有抓取相關數據將儲存到磁碟的工作目錄。


實際案例和練習

抓取線上電影網站

讓我們從一個真實世界的網站開始另一個例子。我們將選擇抓取一個電影網站。

讓我們新增一個類別並命名為「MovieScraper」:

MovieScrapaerAddClass related to 抓取線上電影網站

現在讓我們來看看我們將要抓取的網站:

123movies related to 抓取線上電影網站

這是我們在網站上看到的首頁 HTML 的一部分:

<div id="movie-featured" class="movies-list movies-list-full tab-pane in fade active">
    <div data-movie-id="20746" class="ml-item">
        <a href="https://website.com/film/king-arthur-legend-of-the-sword-20746/">
            <span class="mli-quality">CAM</span>
            <img data-original="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg" 
                 class="lazy thumb mli-thumb" alt="King Arthur: Legend of the Sword"
                  src="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg" 
                 style="display: inline-block;">
            <span class="mli-info"><h2>King Arthur: Legend of the Sword</h2></span>
        </a>
    </div>
    <div data-movie-id="20724" class="ml-item">
        <a href="https://website.com/film/snatched-20724/" >
            <span class="mli-quality">CAM</span>
            <img data-original="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg" 
                 class="lazy thumb mli-thumb" alt="Snatched" 
                 src="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg" 
                 style="display: inline-block;">
            <span class="mli-info"><h2>Snatched</h2></span>
        </a>
    </div>
</div>
<div id="movie-featured" class="movies-list movies-list-full tab-pane in fade active">
    <div data-movie-id="20746" class="ml-item">
        <a href="https://website.com/film/king-arthur-legend-of-the-sword-20746/">
            <span class="mli-quality">CAM</span>
            <img data-original="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg" 
                 class="lazy thumb mli-thumb" alt="King Arthur: Legend of the Sword"
                  src="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg" 
                 style="display: inline-block;">
            <span class="mli-info"><h2>King Arthur: Legend of the Sword</h2></span>
        </a>
    </div>
    <div data-movie-id="20724" class="ml-item">
        <a href="https://website.com/film/snatched-20724/" >
            <span class="mli-quality">CAM</span>
            <img data-original="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg" 
                 class="lazy thumb mli-thumb" alt="Snatched" 
                 src="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg" 
                 style="display: inline-block;">
            <span class="mli-info"><h2>Snatched</h2></span>
        </a>
    </div>
</div>
HTML

正如我們所見,我們有電影 ID、標題和詳細頁面的鏈接。

讓我們開始抓取這組數據:

public class MovieScraper : WebScraper
{
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
        this.Request("www.website.com", Parse);
    }
    public override void Parse(Response response)
    {
        foreach (var Divs in response.Css("#movie-featured > div"))
        {
            if (Divs.Attributes ["class"] != "clearfix")
            {
                var MovieId = Divs.GetAttribute("data-movie-id");
                var link = Divs.Css("a")[0];
                var MovieTitle = link.TextContentClean;
                Scrape(new ScrapedData() { { "MovieId", MovieId }, { "MovieTitle", MovieTitle } }, "Movie.Jsonl");
            }
        }           
    }
}
public class MovieScraper : WebScraper
{
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
        this.Request("www.website.com", Parse);
    }
    public override void Parse(Response response)
    {
        foreach (var Divs in response.Css("#movie-featured > div"))
        {
            if (Divs.Attributes ["class"] != "clearfix")
            {
                var MovieId = Divs.GetAttribute("data-movie-id");
                var link = Divs.Css("a")[0];
                var MovieTitle = link.TextContentClean;
                Scrape(new ScrapedData() { { "MovieId", MovieId }, { "MovieTitle", MovieTitle } }, "Movie.Jsonl");
            }
        }           
    }
}
Public Class MovieScraper
	Inherits WebScraper

	Public Overrides Sub Init()
		License.LicenseKey = "LicenseKey"
		Me.LoggingLevel = WebScraper.LogLevel.All
		Me.WorkingDirectory = AppSetting.GetAppRoot() & "\MovieSample\Output\"
		Me.Request("www.website.com", AddressOf Parse)
	End Sub
	Public Overrides Sub Parse(ByVal response As Response)
		For Each Divs In response.Css("#movie-featured > div")
			If Divs.Attributes ("class") <> "clearfix" Then
				Dim MovieId = Divs.GetAttribute("data-movie-id")
				Dim link = Divs.Css("a")(0)
				Dim MovieTitle = link.TextContentClean
				Scrape(New ScrapedData() From {
					{ "MovieId", MovieId },
					{ "MovieTitle", MovieTitle }
				},
				"Movie.Jsonl")
			End If
		Next Divs
	End Sub
End Class
VB   C#

這段代碼有什麼新變化?

工作目錄屬性用於設置所有抓取數據及其相關文件的主要工作目錄。

讓我們做得更多。

如果我們需要構建類型化對象以保存格式化對象中的抓取數據,該怎麼辦?

讓我們實現一個電影類,用來存放我們格式化的數據:

public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }

}
public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }

}
Public Class Movie
	Public Property Id() As Integer
	Public Property Title() As String
	Public Property URL() As String

End Class
VB   C#

現在我們將更新我們的代碼:

public class MovieScraper : WebScraper
{
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
        this.Request("https://website.com/", Parse);
    }
    public override void Parse(Response response)
    {
        foreach (var Divs in response.Css("#movie-featured > div"))
        {
            if (Divs.Attributes ["class"] != "clearfix")
            {
                var movie = new Movie();
                movie.Id = Convert.ToInt32( Divs.GetAttribute("data-movie-id"));
                var link = Divs.Css("a")[0];
                movie.Title = link.TextContentClean;
                movie.URL = link.Attributes ["href"];
                Scrape(movie, "Movie.Jsonl");
            }
        }
    }
}
public class MovieScraper : WebScraper
{
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
        this.Request("https://website.com/", Parse);
    }
    public override void Parse(Response response)
    {
        foreach (var Divs in response.Css("#movie-featured > div"))
        {
            if (Divs.Attributes ["class"] != "clearfix")
            {
                var movie = new Movie();
                movie.Id = Convert.ToInt32( Divs.GetAttribute("data-movie-id"));
                var link = Divs.Css("a")[0];
                movie.Title = link.TextContentClean;
                movie.URL = link.Attributes ["href"];
                Scrape(movie, "Movie.Jsonl");
            }
        }
    }
}
Public Class MovieScraper
	Inherits WebScraper

	Public Overrides Sub Init()
		License.LicenseKey = "LicenseKey"
		Me.LoggingLevel = WebScraper.LogLevel.All
		Me.WorkingDirectory = AppSetting.GetAppRoot() & "\MovieSample\Output\"
		Me.Request("https://website.com/", AddressOf Parse)
	End Sub
	Public Overrides Sub Parse(ByVal response As Response)
		For Each Divs In response.Css("#movie-featured > div")
			If Divs.Attributes ("class") <> "clearfix" Then
				Dim movie As New Movie()
				movie.Id = Convert.ToInt32(Divs.GetAttribute("data-movie-id"))
				Dim link = Divs.Css("a")(0)
				movie.Title = link.TextContentClean
				movie.URL = link.Attributes ("href")
				Scrape(movie, "Movie.Jsonl")
			End If
		Next Divs
	End Sub
End Class
VB   C#

有什麼新消息?

  1. 我們實現Movie Class來保存我們抓取的數據

  2. 我們將電影對象傳遞給 Scrape 方法,它了解我們的格式並以我們在此處看到的已定義格式保存:

    MovieResultMovieClass related to 抓取線上電影網站

    讓我們開始抓取更詳細的頁面。

電影頁面看起來像這樣:

MovieDetailsSample related to 抓取線上電影網站

<div class="mvi-content">
    <div class="thumb mvic-thumb"
         style="background-image: url(https://img.gocdn.online/2017/04/28/poster/5a08e94ba02118f22dc30f298c603210-guardians-of-the-galaxy-vol-2.jpg);"></div>
    <div class="mvic-desc">
        <h3>Guardians of the Galaxy Vol. 2</h3>        
        <div class="desc">
            Set to the backdrop of Awesome Mixtape #2, Marvel's Guardians of the Galaxy Vol. 2 continues the team's adventures as they travel throughout the cosmos to help Peter Quill learn more about his true parentage.
        </div>
        <div class="mvic-info">
            <div class="mvici-left">
                <p>
                    <strong>Genre: </strong>
                    <a href="https://Domain/genre/action/" title="Action">Action</a>,
                    <a href="https://Domain/genre/adventure/" title="Adventure">Adventure</a>,
                    <a href="https://Domain/genre/sci-fi/" title="Sci-Fi">Sci-Fi</a>
                </p>
                <p>
                    <strong>Actor: </strong>
                    <a target="_blank" href="https://Domain/actor/chris-pratt" title="Chris Pratt">Chris Pratt</a>,
                    <a target="_blank" href="https://Domain/actor/-zoe-saldana" title="Zoe Saldana">Zoe Saldana</a>,
                    <a target="_blank" href="https://Domain/actor/-dave-bautista-" title="Dave Bautista">Dave Bautista</a>
                </p>
                <p>
                    <strong>Director: </strong>
                    <a href="#" title="James Gunn">James Gunn</a>
                </p>
                <p>
                    <strong>Country: </strong>
                    <a href="https://Domain/country/us" title="United States">United States</a>
                </p>
            </div>
            <div class="mvici-right">
                <p><strong>Duration:</strong> 136 min</p>
                <p><strong>Quality:</strong> <span class="quality">CAM</span></p>
                <p><strong>Release:</strong> 2017</p>
                <p><strong>IMDb:</strong> 8.3</p>
            </div>
            <div class="clearfix"></div>
        </div>
        <div class="clearfix"></div>
    </div>
    <div class="clearfix"></div>
</div>
<div class="mvi-content">
    <div class="thumb mvic-thumb"
         style="background-image: url(https://img.gocdn.online/2017/04/28/poster/5a08e94ba02118f22dc30f298c603210-guardians-of-the-galaxy-vol-2.jpg);"></div>
    <div class="mvic-desc">
        <h3>Guardians of the Galaxy Vol. 2</h3>        
        <div class="desc">
            Set to the backdrop of Awesome Mixtape #2, Marvel's Guardians of the Galaxy Vol. 2 continues the team's adventures as they travel throughout the cosmos to help Peter Quill learn more about his true parentage.
        </div>
        <div class="mvic-info">
            <div class="mvici-left">
                <p>
                    <strong>Genre: </strong>
                    <a href="https://Domain/genre/action/" title="Action">Action</a>,
                    <a href="https://Domain/genre/adventure/" title="Adventure">Adventure</a>,
                    <a href="https://Domain/genre/sci-fi/" title="Sci-Fi">Sci-Fi</a>
                </p>
                <p>
                    <strong>Actor: </strong>
                    <a target="_blank" href="https://Domain/actor/chris-pratt" title="Chris Pratt">Chris Pratt</a>,
                    <a target="_blank" href="https://Domain/actor/-zoe-saldana" title="Zoe Saldana">Zoe Saldana</a>,
                    <a target="_blank" href="https://Domain/actor/-dave-bautista-" title="Dave Bautista">Dave Bautista</a>
                </p>
                <p>
                    <strong>Director: </strong>
                    <a href="#" title="James Gunn">James Gunn</a>
                </p>
                <p>
                    <strong>Country: </strong>
                    <a href="https://Domain/country/us" title="United States">United States</a>
                </p>
            </div>
            <div class="mvici-right">
                <p><strong>Duration:</strong> 136 min</p>
                <p><strong>Quality:</strong> <span class="quality">CAM</span></p>
                <p><strong>Release:</strong> 2017</p>
                <p><strong>IMDb:</strong> 8.3</p>
            </div>
            <div class="clearfix"></div>
        </div>
        <div class="clearfix"></div>
    </div>
    <div class="clearfix"></div>
</div>
HTML

我們可以為我們的電影類別擴展新的屬性(描述, 類型, 演員, 導演, 國家, 片長, IMDB 評分)但我們將使用(描述, 類型, 演員)僅供我們的範例使用。

public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }
    public string Description { get; set; }
    public List<string> Genre { get; set; }
    public List<string> Actor { get; set; }

}
public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }
    public string Description { get; set; }
    public List<string> Genre { get; set; }
    public List<string> Actor { get; set; }

}
Public Class Movie
	Public Property Id() As Integer
	Public Property Title() As String
	Public Property URL() As String
	Public Property Description() As String
	Public Property Genre() As List(Of String)
	Public Property Actor() As List(Of String)

End Class
VB   C#

現在我們將導航到詳細頁面進行抓取。

IronWebScraper 讓您可以增加更多抓取功能,以抓取不同類型的頁面格式。

如我們在此所見:

public class MovieScraper : WebScraper
{
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
        this.Request("https://domain/", Parse);
    }
    public override void Parse(Response response)
    {
        foreach (var Divs in response.Css("#movie-featured > div"))
        {
            if (Divs.Attributes ["class"] != "clearfix")
            {
                var movie = new Movie();
                movie.Id = Convert.ToInt32( Divs.GetAttribute("data-movie-id"));
                var link = Divs.Css("a")[0];
                movie.Title = link.TextContentClean;
                movie.URL = link.Attributes ["href"];
                this.Request(movie.URL, ParseDetails, new MetaData() { { "movie", movie } });// to scrap Detailed Page
            }
        }           
    }
    public void ParseDetails(Response response)
    {
        var movie = response.MetaData.Get<Movie>("movie");
        var Div = response.Css("div.mvic-desc")[0];
        movie.Description = Div.Css("div.desc")[0].TextContentClean;
        foreach(var Genre in Div.Css("div > p > a"))
        {
            movie.Genre.Add(Genre.TextContentClean);
        }
        foreach (var Actor in Div.Css("div > p:nth-child(2) > a"))
        {
            movie.Actor.Add(Actor.TextContentClean);
        }
        Scrape(movie, "Movie.Jsonl");
    }
}
public class MovieScraper : WebScraper
{
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
        this.Request("https://domain/", Parse);
    }
    public override void Parse(Response response)
    {
        foreach (var Divs in response.Css("#movie-featured > div"))
        {
            if (Divs.Attributes ["class"] != "clearfix")
            {
                var movie = new Movie();
                movie.Id = Convert.ToInt32( Divs.GetAttribute("data-movie-id"));
                var link = Divs.Css("a")[0];
                movie.Title = link.TextContentClean;
                movie.URL = link.Attributes ["href"];
                this.Request(movie.URL, ParseDetails, new MetaData() { { "movie", movie } });// to scrap Detailed Page
            }
        }           
    }
    public void ParseDetails(Response response)
    {
        var movie = response.MetaData.Get<Movie>("movie");
        var Div = response.Css("div.mvic-desc")[0];
        movie.Description = Div.Css("div.desc")[0].TextContentClean;
        foreach(var Genre in Div.Css("div > p > a"))
        {
            movie.Genre.Add(Genre.TextContentClean);
        }
        foreach (var Actor in Div.Css("div > p:nth-child(2) > a"))
        {
            movie.Actor.Add(Actor.TextContentClean);
        }
        Scrape(movie, "Movie.Jsonl");
    }
}
Public Class MovieScraper
	Inherits WebScraper

	Public Overrides Sub Init()
		License.LicenseKey = "LicenseKey"
		Me.LoggingLevel = WebScraper.LogLevel.All
		Me.WorkingDirectory = AppSetting.GetAppRoot() & "\MovieSample\Output\"
		Me.Request("https://domain/", AddressOf Parse)
	End Sub
	Public Overrides Sub Parse(ByVal response As Response)
		For Each Divs In response.Css("#movie-featured > div")
			If Divs.Attributes ("class") <> "clearfix" Then
				Dim movie As New Movie()
				movie.Id = Convert.ToInt32(Divs.GetAttribute("data-movie-id"))
				Dim link = Divs.Css("a")(0)
				movie.Title = link.TextContentClean
				movie.URL = link.Attributes ("href")
				Me.Request(movie.URL, AddressOf ParseDetails, New MetaData() From {
					{ "movie", movie }
				}) ' to scrap Detailed Page
			End If
		Next Divs
	End Sub
	Public Sub ParseDetails(ByVal response As Response)
		Dim movie = response.MetaData.Get(Of Movie)("movie")
		Dim Div = response.Css("div.mvic-desc")(0)
		movie.Description = Div.Css("div.desc")(0).TextContentClean
		For Each Genre In Div.Css("div > p > a")
			movie.Genre.Add(Genre.TextContentClean)
		Next Genre
		For Each Actor In Div.Css("div > p:nth-child(2) > a")
			movie.Actor.Add(Actor.TextContentClean)
		Next Actor
		Scrape(movie, "Movie.Jsonl")
	End Sub
End Class
VB   C#

有什麼新消息?

  1. 我們可以添加抓取功能。(解析詳細資訊)抓取詳細頁面

  2. 我們將生成檔案的 Scrape 功能移至新功能。

  3. 我們使用了IronWebScraper功能(元數據)將我們的電影物件傳遞給新的抓取函數

  4. 我們抓取了網頁並將我們的電影對象數據保存到一個文件中。

    MovieResultMovieClass1 related to 抓取線上電影網站

從購物網站抓取內容

我们选择一个购物网站来抓取其内容。

ShoppingSite related to 從購物網站抓取內容

從圖片中可以看到,我們有一個左側欄,其中包含了網站產品類別的鏈接。

因此,我們的第一步是調查網站的HTML,並計劃我們希望如何抓取它。

ShoppingSiteLeftBar related to 從購物網站抓取內容

時尚網站類別有子類別(男士,女士,兒童)

<li class="menu-item" data-id="">
    <a href="https://domain.com/fashion-by-/" class="main-category">
        <i class="cat-icon osh-font-fashion"></i> <span class="nav-subTxt">FASHION </span> <i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
    </a> <div class="navLayerWrapper" style="width: 633px; display: none;"><div class="submenu"><div class="column"><div class="categories"><a class="category" href="https://domain.com/fashion-by-/?sort=newest&amp;dir=desc&amp;viewType=gridView3">New Arrivals !</a>  </div><div class="categories"><a class="category" href="https://domain.com/men-fashion/">Men</a>   <a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>   <a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>   <a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>  </div><div class="categories"><a class="category" href="https://domain.com/women-fashion/">Women</a>   <a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>   <a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>   <a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>  </div><div class="categories"><a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>   <a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>   <a class="subcategory" href="https://domain.com/girls/">Girls</a>  </div><div class="categories"><a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>  </div></div><div class="column"><div class="categories"> <span class="category defaultCursor">Men Best Sellers</span>  <a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>   <a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>   <a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>   <a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>  </div><div class="categories"> <span class="category defaultCursor">Women Best Sellers</span>  <a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>   <a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>   <a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>   <a class="subcategory" href="https://domain.com/women-tops/">Tops</a>  </div><div class="categories"><a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>  </div><div class="categories"><a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>  </div><div class="categories"><a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>  </div></div><div class="column"><div class="categories"><a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>   <a class="subcategory" href="https://domain.com/adidas/">Adidas</a>   <a class="subcategory" href="https://domain.com/converse/">Converse</a>   <a class="subcategory" href="https://domain.com/ravin/">Ravin</a>   <a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>   <a class="subcategory" href="https://domain.com/agu/">Agu</a>   <a class="subcategory" href="https://domain.com/activ/">Activ</a>   <a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>   <a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>   <a class="subcategory" href="https://domain.com/town-team/">Town Team</a>  </div></div></div></div>
</li>
<li class="menu-item" data-id="">
    <a href="https://domain.com/fashion-by-/" class="main-category">
        <i class="cat-icon osh-font-fashion"></i> <span class="nav-subTxt">FASHION </span> <i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
    </a> <div class="navLayerWrapper" style="width: 633px; display: none;"><div class="submenu"><div class="column"><div class="categories"><a class="category" href="https://domain.com/fashion-by-/?sort=newest&amp;dir=desc&amp;viewType=gridView3">New Arrivals !</a>  </div><div class="categories"><a class="category" href="https://domain.com/men-fashion/">Men</a>   <a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>   <a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>   <a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>  </div><div class="categories"><a class="category" href="https://domain.com/women-fashion/">Women</a>   <a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>   <a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>   <a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>  </div><div class="categories"><a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>   <a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>   <a class="subcategory" href="https://domain.com/girls/">Girls</a>  </div><div class="categories"><a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>  </div></div><div class="column"><div class="categories"> <span class="category defaultCursor">Men Best Sellers</span>  <a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>   <a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>   <a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>   <a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>  </div><div class="categories"> <span class="category defaultCursor">Women Best Sellers</span>  <a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>   <a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>   <a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>   <a class="subcategory" href="https://domain.com/women-tops/">Tops</a>  </div><div class="categories"><a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>  </div><div class="categories"><a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>  </div><div class="categories"><a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>  </div></div><div class="column"><div class="categories"><a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>   <a class="subcategory" href="https://domain.com/adidas/">Adidas</a>   <a class="subcategory" href="https://domain.com/converse/">Converse</a>   <a class="subcategory" href="https://domain.com/ravin/">Ravin</a>   <a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>   <a class="subcategory" href="https://domain.com/agu/">Agu</a>   <a class="subcategory" href="https://domain.com/activ/">Activ</a>   <a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>   <a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>   <a class="subcategory" href="https://domain.com/town-team/">Town Team</a>  </div></div></div></div>
</li>
HTML

讓我們設立一個項目

  1. 創建一個新的控制台應用程序或為我們的新範例添加一個名為“ShoppingSiteSample”的新文件夾。

  2. 新增名為“ShoppingScraper”的類別

  3. 首先將對網站的主類別及其子類別進行抓取

    讓我們建立一個類別模型:

public class Category
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }
    /// <summary>
    /// Gets or sets the URL.
    /// </summary>
    /// <value>
    /// The URL.
    /// </value>
    public string URL { get; set; }
    /// <summary>
    /// Gets or sets the sub categories.
    /// </summary>
    /// <value>
    /// The sub categories.
    /// </value>
    public List<Category> SubCategories { get; set; }
}
public class Category
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }
    /// <summary>
    /// Gets or sets the URL.
    /// </summary>
    /// <value>
    /// The URL.
    /// </value>
    public string URL { get; set; }
    /// <summary>
    /// Gets or sets the sub categories.
    /// </summary>
    /// <value>
    /// The sub categories.
    /// </value>
    public List<Category> SubCategories { get; set; }
}
Public Class Category
	''' <summary>
	''' Gets or sets the name.
	''' </summary>
	''' <value>
	''' The name.
	''' </value>
	Public Property Name() As String
	''' <summary>
	''' Gets or sets the URL.
	''' </summary>
	''' <value>
	''' The URL.
	''' </value>
	Public Property URL() As String
	''' <summary>
	''' Gets or sets the sub categories.
	''' </summary>
	''' <value>
	''' The sub categories.
	''' </value>
	Public Property SubCategories() As List(Of Category)
End Class
VB   C#
  1. 現在讓我們構建我們的抓取邏輯
public class ShoppingScraper : WebScraper
{
    /// <summary>
    /// Override this method initialize your web-scraper.
    /// Important tasks will be to Request at least one start url... and set allowed/banned domain or url patterns.
    /// </summary>
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
        this.Request("www.webSite.com", Parse);
    }

    /// <summary>
    /// Override this method to create the default Response handler for your web scraper.
    /// If you have multiple page types, you can add additional similar methods.
    /// </summary>
    /// <param name="response">The http Response object to parse</param>
    public override void Parse(Response response)
    {
        var categoryList = new List<Category>();

        foreach (var Links in response.Css("#menuFixed > ul > li > a "))
        {
            var cat = new Category();
            cat.URL = Links.Attributes ["href"];
            cat.Name = Links.InnerText;
            categoryList.Add(cat);
        }
        Scrape(categoryList, "Shopping.Jsonl");
    }
}
public class ShoppingScraper : WebScraper
{
    /// <summary>
    /// Override this method initialize your web-scraper.
    /// Important tasks will be to Request at least one start url... and set allowed/banned domain or url patterns.
    /// </summary>
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
        this.Request("www.webSite.com", Parse);
    }

    /// <summary>
    /// Override this method to create the default Response handler for your web scraper.
    /// If you have multiple page types, you can add additional similar methods.
    /// </summary>
    /// <param name="response">The http Response object to parse</param>
    public override void Parse(Response response)
    {
        var categoryList = new List<Category>();

        foreach (var Links in response.Css("#menuFixed > ul > li > a "))
        {
            var cat = new Category();
            cat.URL = Links.Attributes ["href"];
            cat.Name = Links.InnerText;
            categoryList.Add(cat);
        }
        Scrape(categoryList, "Shopping.Jsonl");
    }
}
IRON VB CONVERTER ERROR developers@ironsoftware.com
VB   C#

從菜單刮取鏈接

ShoppingSiteScrapeMenu related to 從購物網站抓取內容

讓我們更新代碼來抓取主要類別及其所有子鏈接

public override void Parse(Response response)
{
    // List of Categories Links (Root)
    var categoryList = new List<Category>();

    foreach (var li in response.Css("#menuFixed > ul > li"))
    {
        // List Of Main Links
        foreach (var Links in li.Css("a"))
        {
            var cat = new Category();
            cat.URL = Links.Attributes ["href"];
            cat.Name = Links.InnerText;
            cat.SubCategories = new List<Category>();
            // List of Sub Catgories Links
            foreach (var subCategory in li.Css("a [class=subcategory]"))
            {
                var subcat = new Category();
                subcat.URL = Links.Attributes ["href"];
                subcat.Name = Links.InnerText;
                // Check If Link Exist Before 
                if (cat.SubCategories.Find(c=>c.Name== subcat.Name && c.URL == subcat.URL) == null)
                {
                    // Add Sublinks
                    cat.SubCategories.Add(subcat);
                }
            }
            // Add Categories
            categoryList.Add(cat);
        }
    }
    Scrape(categoryList, "Shopping.Jsonl");
}
public override void Parse(Response response)
{
    // List of Categories Links (Root)
    var categoryList = new List<Category>();

    foreach (var li in response.Css("#menuFixed > ul > li"))
    {
        // List Of Main Links
        foreach (var Links in li.Css("a"))
        {
            var cat = new Category();
            cat.URL = Links.Attributes ["href"];
            cat.Name = Links.InnerText;
            cat.SubCategories = new List<Category>();
            // List of Sub Catgories Links
            foreach (var subCategory in li.Css("a [class=subcategory]"))
            {
                var subcat = new Category();
                subcat.URL = Links.Attributes ["href"];
                subcat.Name = Links.InnerText;
                // Check If Link Exist Before 
                if (cat.SubCategories.Find(c=>c.Name== subcat.Name && c.URL == subcat.URL) == null)
                {
                    // Add Sublinks
                    cat.SubCategories.Add(subcat);
                }
            }
            // Add Categories
            categoryList.Add(cat);
        }
    }
    Scrape(categoryList, "Shopping.Jsonl");
}
Public Overrides Sub Parse(ByVal response As Response)
	' List of Categories Links (Root)
	Dim categoryList = New List(Of Category)()

	For Each li In response.Css("#menuFixed > ul > li")
		' List Of Main Links
		For Each Links In li.Css("a")
			Dim cat = New Category()
			cat.URL = Links.Attributes ("href")
			cat.Name = Links.InnerText
			cat.SubCategories = New List(Of Category)()
			' List of Sub Catgories Links
			For Each subCategory In li.Css("a [class=subcategory]")
				Dim subcat = New Category()
				subcat.URL = Links.Attributes ("href")
				subcat.Name = Links.InnerText
				' Check If Link Exist Before 
				If cat.SubCategories.Find(Function(c) c.Name= subcat.Name AndAlso c.URL = subcat.URL) Is Nothing Then
					' Add Sublinks
					cat.SubCategories.Add(subcat)
				End If
			Next subCategory
			' Add Categories
			categoryList.Add(cat)
		Next Links
	Next li
	Scrape(categoryList, "Shopping.Jsonl")
End Sub
VB   C#

現在我們有了所有網站分類的連結,讓我們開始抓取每個分類中的產品。

讓我們導航到任何類別並檢查內容。

ProductSubCategoryList related to 從購物網站抓取內容

讓我們看看它的代碼

<section class="products">
    <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
        <a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black &amp;amp; Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg"><noscript>&lt;img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /&gt;</noscript>
            </div> <h2 class="title">
                <span class="brand ">Agu&nbsp;</span>
                <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black &amp; Navy Blue</span>
            </h2><div class="price-container clearfix">
                <span class="price-box">
                    <span class="price">
                        <span data-currency-iso="EGP">EGP</span>
                        <span dir="ltr" data-price="299">299</span>
                    </span>   <span class="price -old  -no-special"></span>
                </span>
            </div><div class="rating-stars"><div class="stars-container"><div class="stars" style="width: 62%"></div></div> <div class="total-ratings">(30)</div> </div>    <span class="shop-first-logo-container"><img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded"> </span>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
            </div>
        </a>
    </div>
    <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
        <a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
            <div class="image-wrapper default-state"><img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg"><noscript>&lt;img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /&gt;</noscript></div>
            <h2 class="title"><span class="brand ">Leather Shop&nbsp;</span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2><div class="price-container clearfix">
                <span class="sale-flag-percent">-29%</span>  <span class="price-box"> <span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>   <span class="price -old "><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span> </span>
            </div><div class="rating-stars"><div class="stars-container"><div class="stars" style="width: 100%"></div></div> <div class="total-ratings">(1)</div> </div>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>    <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
            </div>
        </a>
    </div>
</section>
<section class="products">
    <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
        <a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black &amp;amp; Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg"><noscript>&lt;img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /&gt;</noscript>
            </div> <h2 class="title">
                <span class="brand ">Agu&nbsp;</span>
                <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black &amp; Navy Blue</span>
            </h2><div class="price-container clearfix">
                <span class="price-box">
                    <span class="price">
                        <span data-currency-iso="EGP">EGP</span>
                        <span dir="ltr" data-price="299">299</span>
                    </span>   <span class="price -old  -no-special"></span>
                </span>
            </div><div class="rating-stars"><div class="stars-container"><div class="stars" style="width: 62%"></div></div> <div class="total-ratings">(30)</div> </div>    <span class="shop-first-logo-container"><img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded"> </span>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
            </div>
        </a>
    </div>
    <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
        <a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
            <div class="image-wrapper default-state"><img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg"><noscript>&lt;img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /&gt;</noscript></div>
            <h2 class="title"><span class="brand ">Leather Shop&nbsp;</span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2><div class="price-container clearfix">
                <span class="sale-flag-percent">-29%</span>  <span class="price-box"> <span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>   <span class="price -old "><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span> </span>
            </div><div class="rating-stars"><div class="stars-container"><div class="stars" style="width: 100%"></div></div> <div class="total-ratings">(1)</div> </div>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>    <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>     <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
            </div>
        </a>
    </div>
</section>
HTML

讓我們為這個內容建立我們的產品模型。

public class Product
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }
    /// <summary>
    /// Gets or sets the price.
    /// </summary>
    /// <value>
    /// The price.
    /// </value>
    public string Price { get; set; }
    /// <summary>
    /// Gets or sets the image.
    /// </summary>
    /// <value>
    /// The image.
    /// </value>
    public string Image { get; set; }
}
public class Product
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }
    /// <summary>
    /// Gets or sets the price.
    /// </summary>
    /// <value>
    /// The price.
    /// </value>
    public string Price { get; set; }
    /// <summary>
    /// Gets or sets the image.
    /// </summary>
    /// <value>
    /// The image.
    /// </value>
    public string Image { get; set; }
}
Public Class Product
	''' <summary>
	''' Gets or sets the name.
	''' </summary>
	''' <value>
	''' The name.
	''' </value>
	Public Property Name() As String
	''' <summary>
	''' Gets or sets the price.
	''' </summary>
	''' <value>
	''' The price.
	''' </value>
	Public Property Price() As String
	''' <summary>
	''' Gets or sets the image.
	''' </summary>
	''' <value>
	''' The image.
	''' </value>
	Public Property Image() As String
End Class
VB   C#

若要抓取分類頁面,我們新增一個抓取方法:

public void ParseCatgory(Response response)
{          
    // List of Products Links (Root)
    var productList = new List<Product>();

    foreach (var Links in response.Css("body > main > section.osh-content > section.products > div > a"))
    {
        var product = new Product();
        product.Name = Links.InnerText;
        product.Image = Links.Css("div.image-wrapper.default-state > img")[0].Attributes ["src"];                
        productList.Add(product);
    }

    Scrape(productList, "Products.Jsonl");
}
public void ParseCatgory(Response response)
{          
    // List of Products Links (Root)
    var productList = new List<Product>();

    foreach (var Links in response.Css("body > main > section.osh-content > section.products > div > a"))
    {
        var product = new Product();
        product.Name = Links.InnerText;
        product.Image = Links.Css("div.image-wrapper.default-state > img")[0].Attributes ["src"];                
        productList.Add(product);
    }

    Scrape(productList, "Products.Jsonl");
}
Public Sub ParseCatgory(ByVal response As Response)
	' List of Products Links (Root)
	Dim productList = New List(Of Product)()

	For Each Links In response.Css("body > main > section.osh-content > section.products > div > a")
		Dim product As New Product()
		product.Name = Links.InnerText
		product.Image = Links.Css("div.image-wrapper.default-state > img")(0).Attributes ("src")
		productList.Add(product)
	Next Links

	Scrape(productList, "Products.Jsonl")
End Sub
VB   C#

高級網頁抓取功能

HttpIdentity 功能:

一些網站系統要求使用者登入後才能查看內容; 在這種情況下,我們可以使用 HttpIdentity: -

HttpIdentity id = new HttpIdentity();
id.NetworkUsername = "username";
id.NetworkPassword = "pwd";
Identities.Add(id); 
HttpIdentity id = new HttpIdentity();
id.NetworkUsername = "username";
id.NetworkPassword = "pwd";
Identities.Add(id); 
Dim id As New HttpIdentity()
id.NetworkUsername = "username"
id.NetworkPassword = "pwd"
Identities.Add(id)
VB   C#

在 IronWebScraper 中最令人印象深刻和强大的功能之一,是能够使用数千种独特的(用戶的憑證和/或瀏覽器引擎)利用多重登入會話來欺騙或抓取網站。

public override void Init()
{
    License.LicenseKey = " LicenseKey ";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    var proxies = "IP-Proxy1: 8080,IP-Proxy2: 8081".Split(',');
    foreach (var UA in IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents)
    {
        foreach (var proxy in proxies)
        {
            Identities.Add(new HttpIdentity()
            {
                UserAgent = UA,
                UseCookies = true,
                Proxy = proxy
            });
        }
    }
    this.Request("http://www.Website.com", Parse);
}
public override void Init()
{
    License.LicenseKey = " LicenseKey ";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    var proxies = "IP-Proxy1: 8080,IP-Proxy2: 8081".Split(',');
    foreach (var UA in IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents)
    {
        foreach (var proxy in proxies)
        {
            Identities.Add(new HttpIdentity()
            {
                UserAgent = UA,
                UseCookies = true,
                Proxy = proxy
            });
        }
    }
    this.Request("http://www.Website.com", Parse);
}
Public Overrides Sub Init()
	License.LicenseKey = " LicenseKey "
	Me.LoggingLevel = WebScraper.LogLevel.All
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"
	Dim proxies = "IP-Proxy1: 8080,IP-Proxy2: 8081".Split(","c)
	For Each UA In IronWebScraper.CommonUserAgents.ChromeDesktopUserAgents
		For Each proxy In proxies
			Identities.Add(New HttpIdentity() With {
				.UserAgent = UA,
				.UseCookies = True,
				.Proxy = proxy
			})
		Next proxy
	Next UA
	Me.Request("http://www.Website.com", Parse)
End Sub
VB   C#

您有多個屬性可以提供不同的行為,因此可以防止網站封鎖您。

以下是其中一些屬性: -

  • NetworkDomain:用於用戶認證的網路域。 支持Windows、NTLM、Kerberos、Linux、BSD和Mac OS X網絡。 必須與此同時使用(網絡用戶名和網絡密碼)
  • NetworkUsername:用於使用者身份驗證的網路/HTTP 使用者名稱。 支持 Http、Windows 網路、NTLM、Kerberos、Linux 網路、BSD 網路和 Mac OS。
  • NetworkPassword:用於用戶認證的網絡/HTTP 密碼。 支持Http、Windows網絡、NTLM、Keroberos、Linux網絡、BSD網絡和Mac OS。
  • 代理:設定代理伺服器設置
  • UserAgent:設置瀏覽器引擎(Chrome 桌面版、Chrome Mobile、Chrome 平板版、IE 和 Firefox 等。)
  • HttpRequestHeaders:用於將與此身份一起使用的自定義標頭值,並接受字典物件。(字典<string,string>)
  • UseCookies:啟用/禁用使用 cookies

    IronWebScraper 使用隨機身份運行抓取器。 如果我們需要指定使用特定的身份來解析頁面,我們可以這樣做。

public override void Init()
{
    License.LicenseKey = " LicenseKey ";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    HttpIdentity identity = new HttpIdentity();
    identity.NetworkUsername = "username";
    identity.NetworkPassword = "pwd";
    Identities.Add(id);
    this.Request("http://www.Website.com", Parse, identity);
}
public override void Init()
{
    License.LicenseKey = " LicenseKey ";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    HttpIdentity identity = new HttpIdentity();
    identity.NetworkUsername = "username";
    identity.NetworkPassword = "pwd";
    Identities.Add(id);
    this.Request("http://www.Website.com", Parse, identity);
}
Public Overrides Sub Init()
	License.LicenseKey = " LicenseKey "
	Me.LoggingLevel = WebScraper.LogLevel.All
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"
	Dim identity As New HttpIdentity()
	identity.NetworkUsername = "username"
	identity.NetworkPassword = "pwd"
	Identities.Add(id)
	Me.Request("http://www.Website.com", Parse, identity)
End Sub
VB   C#

啟用網頁快取功能:

此功能用於緩存請求的頁面。 它經常在開發和測試階段使用; 允許開發者在更新代碼後緩存所需的頁面以供重用。 這使您能夠在重新啟動您的網絡抓取工具後,在已緩存的頁面上執行代碼,而無需每次都連接到實時網站。(即時回放).

您可以在Init中使用它()方法

啟用網頁快取();

啟用網頁快取(時限到期);

它將您的快取資料儲存到工作目錄資料夾下的WebCache資料夾中。

public override void Init()
{
    License.LicenseKey = " LicenseKey ";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    EnableWebCache(new TimeSpan(1,30,30));
    this.Request("http://www.WebSite.com", Parse);
}
public override void Init()
{
    License.LicenseKey = " LicenseKey ";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    EnableWebCache(new TimeSpan(1,30,30));
    this.Request("http://www.WebSite.com", Parse);
}
Public Overrides Sub Init()
	License.LicenseKey = " LicenseKey "
	Me.LoggingLevel = WebScraper.LogLevel.All
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"
	EnableWebCache(New TimeSpan(1,30,30))
	Me.Request("http://www.WebSite.com", Parse)
End Sub
VB   C#

IronWebScraper 也具有在重新啟動代碼後通過使用 Start 設置引擎啟動過程名稱來使您的引擎繼續抓取的功能。(爬行ID)

static void Main(string [] args)
{
    // Create Object From Scraper class
    EngineScraper scrape = new EngineScraper();
    // Start Scraping
    scrape.Start("enginestate");
}
static void Main(string [] args)
{
    // Create Object From Scraper class
    EngineScraper scrape = new EngineScraper();
    // Start Scraping
    scrape.Start("enginestate");
}
Shared Sub Main(ByVal args() As String)
	' Create Object From Scraper class
	Dim scrape As New EngineScraper()
	' Start Scraping
	scrape.Start("enginestate")
End Sub
VB   C#

執行請求和回應將被保存到工作目錄內的 SavedState 資料夾中。

限流

我們可以控制每個域的最小和最大連接數以及連接速度。

public override void Init()
{
    License.LicenseKey = "LicenseKey";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    // Gets or sets the total number of allowed open HTTP requests (threads)
    this.MaxHttpConnectionLimit = 80;
    // Gets or sets minimum polite delay (pause)between request to a given domain or IP address.
    this.RateLimitPerHost = TimeSpan.FromMilliseconds(50);            
    //     Gets or sets the allowed number of concurrent HTTP requests (threads) per hostname
    //     or IP address. This helps protect hosts against too many requests.
    this.OpenConnectionLimitPerHost = 25;
    this.ObeyRobotsDotTxt = false;
    //     Makes the WebSraper intelligently throttle requests not only by hostname, but
    //     also by host servers' IP addresses. This is polite in-case multiple scraped domains
    //     are hosted on the same machine.
    this.ThrottleMode = Throttle.ByDomainHostName;

    this.Request("https://www.Website.com", Parse);
}
public override void Init()
{
    License.LicenseKey = "LicenseKey";
    this.LoggingLevel = WebScraper.LogLevel.All;
    this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
    // Gets or sets the total number of allowed open HTTP requests (threads)
    this.MaxHttpConnectionLimit = 80;
    // Gets or sets minimum polite delay (pause)between request to a given domain or IP address.
    this.RateLimitPerHost = TimeSpan.FromMilliseconds(50);            
    //     Gets or sets the allowed number of concurrent HTTP requests (threads) per hostname
    //     or IP address. This helps protect hosts against too many requests.
    this.OpenConnectionLimitPerHost = 25;
    this.ObeyRobotsDotTxt = false;
    //     Makes the WebSraper intelligently throttle requests not only by hostname, but
    //     also by host servers' IP addresses. This is polite in-case multiple scraped domains
    //     are hosted on the same machine.
    this.ThrottleMode = Throttle.ByDomainHostName;

    this.Request("https://www.Website.com", Parse);
}
Public Overrides Sub Init()
	License.LicenseKey = "LicenseKey"
	Me.LoggingLevel = WebScraper.LogLevel.All
	Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"
	' Gets or sets the total number of allowed open HTTP requests (threads)
	Me.MaxHttpConnectionLimit = 80
	' Gets or sets minimum polite delay (pause)between request to a given domain or IP address.
	Me.RateLimitPerHost = TimeSpan.FromMilliseconds(50)
	'     Gets or sets the allowed number of concurrent HTTP requests (threads) per hostname
	'     or IP address. This helps protect hosts against too many requests.
	Me.OpenConnectionLimitPerHost = 25
	Me.ObeyRobotsDotTxt = False
	'     Makes the WebSraper intelligently throttle requests not only by hostname, but
	'     also by host servers' IP addresses. This is polite in-case multiple scraped domains
	'     are hosted on the same machine.
	Me.ThrottleMode = Throttle.ByDomainHostName

	Me.Request("https://www.Website.com", Parse)
End Sub
VB   C#

節流屬性

  • MaxHttpConnectionLimit
    允許開啟的HTTP請求總數 (線程)
  • RateLimitPerHost
    最小禮貌延遲或暫停 (以毫秒計算)在請求到指定域名或IP地址之間
  • OpenConnectionLimitPerHost
    允許的並發 HTTP 請求數量 (線程)
  • ThrottleMode
    使WebScraper不僅根據主機名稱,而且根據主機伺服器的IP地址智能地限制請求。 这是礼貌的做法,以防多个抓取的域名托管在同一台机器上。

附錄

如何創建Windows表單應用程序?

我們應該使用 Visual Studio 2013 或更高版本來進行這項工作。

按照以下步驟創建一個新的 Windows Forms 項目:

  1. 打開 Visual Studio

    Enterprise2015 related to 如何創建Windows表單應用程序?

  2. 檔案 -> 新增 -> 專案

    FileNewProject related to 如何創建Windows表單應用程序?

  3. 從範本中選擇程式語言(Visual C# 或 VB)-> Windows -> Windows Forms 應用程式

    CreateWindowsApp related to 如何創建Windows表單應用程序?

    專案名稱: IronScraperSample

位置:請選擇您硬碟上的位置

WindowsAppMainScreen related to 如何創建Windows表單應用程序?

如何創建網絡表單應用程序?

您應該使用 Visual Studio 2013 或更高版本來進行此操作。

按照以下步驟創建一個新的 Asp.NET 網頁表單專案

  1. 打開 Visual Studio

    Enterprise2015 related to 如何創建網絡表單應用程序?

  2. 檔案 -> 新增 -> 專案

    FileNewProject related to 如何創建網絡表單應用程序?

  3. 從範本選擇程式語言(Visual C# 或 VB)-> 網頁 -> ASP.NET Web 應用程式(.NET框架).

    ASPNETWebApplication related to 如何創建網絡表單應用程序?

    專案名稱: IronScraperSample

位置:從您的硬碟選擇一個位置

  1. 從您的 ASP.NET 模板

    1. 選擇空白模板
    2. 檢查網頁表單
    3. ASPNETTemplates related to 如何創建網絡表單應用程序?

  2. 現在您的基本 ASP.NET 網頁表單專案已建立

    ASPNETWebFormProject related to 如何創建網絡表單應用程序?

    點擊這裡下載完整教學範例程式碼專案。

.NET軟體工程師 對很多人來說,這是從 .NET 生成 PDF 文件的最有效方法,因為不需要學習額外的 API 或導航複雜的設計系統

艾哈邁德·阿布爾馬格德

跨國IT公司 .NET 軟體解決方案架構師

Ahmed 是一位經驗豐富且具有認證的微軟技術專家,擁有超過10年在 IT 和軟體開發領域的經驗。他曾在多家不同公司工作,現在是某跨國 IT 公司的國家經理。

Ahmed已經使用IronPDF和IronWebScraper超過一年,並在他公司中的多個專案中使用它們。