Scrape Shopping Website Data Using C#

Curtis Chau
Updated: June 1, 2025

We will pick a shopping website and scrape its content. As the image shows, the site has a left sidebar containing links to its product categories. The first step is to study the site's HTML and plan how to scrape it. The Fashion category has subcategories (Men, Women, Kids).

<li class="menu-item" data-id="">
    <a href="https://domain.com/fashion-by-/" class="main-category">
        <i class="cat-icon osh-font-fashion"></i>
        <span class="nav-subTxt">FASHION </span>
        <i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
    </a>
    <div class="navLayerWrapper" style="width: 633px; display: none;">
        <div class="submenu">
            <div class="column">
                <div class="categories">
                    <a class="category" href="https://domain.com/fashion-by-/?sort=newest&dir=desc&viewType=gridView3">New Arrivals !</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/men-fashion/">Men</a>
                    <a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>
                    <a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>
                    <a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/women-fashion/">Women</a>
                    <a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>
                    <a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>
                    <a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>
                    <a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>
                    <a class="subcategory" href="https://domain.com/girls/">Girls</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>
                </div>
            </div>
            <div class="column">
                <div class="categories">
                    <span class="category defaultCursor">Men Best Sellers</span>
                    <a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>
                    <a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>
                    <a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>
                    <a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>
                </div>
                <div class="categories">
                    <span class="category defaultCursor">Women Best Sellers</span>
                    <a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>
                    <a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>
                    <a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>
                    <a class="subcategory" href="https://domain.com/women-tops/">Tops</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>
                </div>
            </div>
            <div class="column">
                <div class="categories">
                    <a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>
                    <a class="subcategory" href="https://domain.com/adidas/">Adidas</a>
                    <a class="subcategory" href="https://domain.com/converse/">Converse</a>
                    <a class="subcategory" href="https://domain.com/ravin/">Ravin</a>
                    <a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>
                    <a class="subcategory" href="https://domain.com/agu/">Agu</a>
                    <a class="subcategory" href="https://domain.com/activ/">Activ</a>
                    <a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>
                    <a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>
                    <a class="subcategory" href="https://domain.com/town-team/">Town Team</a>
                </div>
            </div>
        </div>
    </div>
</li>

Let's create a project. Create a new console application, or add a new folder named "ShoppingSiteSample" for this sample. Add a new class named "ShoppingScraper". The first step is to scrape the site's categories and their subcategories. Let's create a Category model:

C#:

using System.Collections.Generic;

public class Category
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the URL.
    /// </summary>
    /// <value>
    /// The URL.
    /// </value>
    public string URL { get; set; }

    /// <summary>
    /// Gets or sets the subcategories.
    /// </summary>
    /// <value>
    /// The subcategories.
    /// </value>
    public List<Category> SubCategories { get; set; }
}

VB.NET:

Imports System.Collections.Generic

Public Class Category
    ''' <summary>
    ''' Gets or sets the name.
    ''' </summary>
    ''' <value>
    ''' The name.
    ''' </value>
    Public Property Name() As String

    ''' <summary>
    ''' Gets or sets the URL.
    ''' </summary>
    ''' <value>
    ''' The URL.
    ''' </value>
    Public Property URL() As String

    ''' <summary>
    ''' Gets or sets the subcategories.
    ''' </summary>
    ''' <value>
    ''' The subcategories.
    ''' </value>
    Public Property SubCategories() As List(Of Category)
End Class

Now let's build the scraping logic.

C#:

using System.Collections.Generic;
using IronWebScraper;

public class ShoppingScraper : WebScraper
{
    /// <summary>
    /// Initialize the web scraper, setting the start URLs and allowed/banned domains or URL patterns.
    /// </summary>
    public override void Init()
    {
        // Placeholder: replace with your own license key.
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        // AppSetting.GetAppRoot() is assumed to be a helper defined elsewhere in the sample project.
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
        // Placeholder start URL; the scheme is included so the address is absolute.
        this.Request("https://www.webSite.com", Parse);
    }

    /// <summary>
    /// Parses the HTML document of the response to scrape the necessary data.
    /// </summary>
    /// <param name="response">The HTTP Response object to parse.</param>
    public override void Parse(Response response)
    {
        var categoryList = new List<Category>();

        // Iterate through each link in the menu and extract the category data.
        foreach (var Links in response.Css("#menuFixed > ul > li > a"))
        {
            var cat = new Category
            {
                URL = Links.Attributes["href"],
                Name = Links.InnerText
            };
            categoryList.Add(cat);
        }

        // Save the scraped data into a JSONL file.
        Scrape(categoryList, "Shopping.jsonl");
    }
}

VB.NET:

Imports System.Collections.Generic
Imports IronWebScraper

Public Class ShoppingScraper
    Inherits WebScraper

    ''' <summary>
    ''' Initialize the web scraper, setting the start URLs and allowed/banned domains or URL patterns.
    ''' </summary>
    Public Overrides Sub Init()
        ' Placeholder: replace with your own license key.
        License.LicenseKey = "LicenseKey"
        Me.LoggingLevel = WebScraper.LogLevel.All
        ' AppSetting.GetAppRoot() is assumed to be a helper defined elsewhere in the sample project.
        Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"
        ' Placeholder start URL; the scheme is included so the address is absolute.
        Me.Request("https://www.webSite.com", AddressOf Parse)
    End Sub

    ''' <summary>
    ''' Parses the HTML document of the response to scrape the necessary data.
    ''' </summary>
    ''' <param name="response">The HTTP Response object to parse.</param>
    Public Overrides Sub Parse(ByVal response As Response)
        Dim categoryList = New List(Of Category)()

        ' Iterate through each link in the menu and extract the category data.
        For Each Links In response.Css("#menuFixed > ul > li > a")
            Dim cat = New Category With {
                .URL = Links.Attributes("href"),
                .Name = Links.InnerText
            }
            categoryList.Add(cat)
        Next Links

        ' Save the scraped data into a JSONL file.
        Scrape(categoryList, "Shopping.jsonl")
    End Sub
End Class

The code above scrapes the links from the menu. Let's update it to scrape the main categories together with all of their sub-links:

C#:

public override void Parse(Response response)
{
    // List of Category Links (Root)
    var categoryList = new List<Category>();

    // Traverse each 'li' under the fixed menu
    foreach (var li in response.Css("#menuFixed > ul > li"))
    {
        // List of Main Links.
        // Note: li.Css("a") matches every anchor inside this menu item,
        // so each link found here becomes its own entry in categoryList.
        foreach (var Links in li.Css("a"))
        {
            var cat = new Category
            {
                URL = Links.Attributes["href"],
                Name = Links.InnerText,
                SubCategories = new List<Category>()
            };

            // List of Subcategories Links
            foreach (var subCategory in li.Css("a[class=subcategory]"))
            {
                var subcat = new Category
                {
                    URL = subCategory.Attributes["href"],
                    Name = subCategory.InnerText
                };

                // Check if subcategory link already exists
                if (cat.SubCategories.Find(c => c.Name == subcat.Name && c.URL == subcat.URL) == null)
                {
                    // Add sublinks
                    cat.SubCategories.Add(subcat);
                }
            }

            // Add Main Category to the list
            categoryList.Add(cat);
        }
    }

    // Save the scraped data into a JSONL file.
    Scrape(categoryList, "Shopping.jsonl");
}

VB.NET:

Public Overrides Sub Parse(ByVal response As Response)
    ' List of Category Links (Root)
    Dim categoryList = New List(Of Category)()

    ' Traverse each 'li' under the fixed menu
    For Each li In response.Css("#menuFixed > ul > li")
        ' List of Main Links.
        ' Note: li.Css("a") matches every anchor inside this menu item,
        ' so each link found here becomes its own entry in categoryList.
        For Each Links In li.Css("a")
            Dim cat = New Category With {
                .URL = Links.Attributes("href"),
                .Name = Links.InnerText,
                .SubCategories = New List(Of Category)()
            }

            ' List of Subcategories Links
            For Each subCategory In li.Css("a[class=subcategory]")
                Dim subcat = New Category With {
                    .URL = subCategory.Attributes("href"),
                    .Name = subCategory.InnerText
                }

                ' Check if subcategory link already exists
                If cat.SubCategories.Find(Function(c) c.Name = subcat.Name AndAlso c.URL = subcat.URL) Is Nothing Then
                    ' Add sublinks
                    cat.SubCategories.Add(subcat)
                End If
            Next subCategory

            ' Add Main Category to the list
            categoryList.Add(cat)
        Next Links
    Next li

    ' Save the scraped data into a JSONL file.
    Scrape(categoryList, "Shopping.jsonl")
End Sub

Now that we have links to all of the site's categories, we can start scraping the products in each one. Navigate to any category and inspect its content. Here is its markup:

<section class="products">
    <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
        <a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black & Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
                <noscript><img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
            </div>
            <h2 class="title">
                <span class="brand ">Agu </span>
                <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black & Navy Blue</span>
            </h2>
            <div class="price-container clearfix">
                <span class="price-box">
                    <span class="price">
                        <span data-currency-iso="EGP">EGP</span>
                        <span dir="ltr" data-price="299">299</span>
                    </span>
                    <span class="price -old -no-special"></span>
                </span>
            </div>
            <div class="rating-stars">
                <div class="stars-container">
                    <div class="stars" style="width: 62%"></div>
                </div>
                <div class="total-ratings">(30)</div>
            </div>
            <span class="shop-first-logo-container">
                <img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded">
            </span>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
            </div>
        </a>
    </div>
    <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
        <a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
                <noscript><img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
            </div>
            <h2 class="title"><span class="brand ">Leather Shop </span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2>
            <div class="price-container clearfix">
                <span class="sale-flag-percent">-29%</span>
                <span class="price-box">
                    <span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>
                    <span class="price -old"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span>
                </span>
            </div>
            <div class="rating-stars">
                <div class="stars-container">
                    <div class="stars" style="width: 100%"></div>
                </div>
                <div class="total-ratings">(1)</div>
            </div>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
            </div>
        </a>
    </div>
</section>

Let's build a Product model for this content:

C#:

public class Product
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the price.
    /// </summary>
    /// <value>
    /// The price.
    /// </value>
    public string Price { get; set; }

    /// <summary>
    /// Gets or sets the image.
    /// </summary>
    /// <value>
    /// The image.
    /// </value>
    public string Image { get; set; }
}

VB.NET:

Public Class Product
    ''' <summary>
    ''' Gets or sets the name.
    ''' </summary>
    ''' <value>
    ''' The name.
    ''' </value>
    Public Property Name() As String

    ''' <summary>
    ''' Gets or sets the price.
    ''' </summary>
    ''' <value>
    ''' The price.
    ''' </value>
    Public Property Price() As String

    ''' <summary>
    ''' Gets or sets the image.
    ''' </summary>
    ''' <value>
    ''' The image.
    ''' </value>
    Public Property Image() As String
End Class

To scrape the category pages, we add a new scrape method to the ShoppingScraper class:

C#:

// First() below is a LINQ extension method, so the file needs "using System.Linq;".
public void ParseCategory(Response response)
{
    // List of Products
    var productList = new List<Product>();

    // Iterate through product links in the product section
    foreach (var Links in response.Css("section.products > div > a"))
    {
        var product = new Product
        {
            Name = Links.Css("h2.title > span.name").First().InnerText,
            Price = Links.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
            Image = Links.Css("div.image-wrapper.default-state > img").First().Attributes["src"]
        };
        productList.Add(product);
    }

    // Save the scraped product data into a JSONL file.
    Scrape(productList, "Products.jsonl");
}

VB.NET:

Public Sub ParseCategory(ByVal response As Response)
    ' List of Products
    Dim productList = New List(Of Product)()

    ' Iterate through product links in the product section
    For Each Links In response.Css("section.products > div > a")
        Dim product As New Product With {
            .Name = Links.Css("h2.title > span.name").First().InnerText,
            .Price = Links.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
            .Image = Links.Css("div.image-wrapper.default-state > img").First().Attributes("src")
        }
        productList.Add(product)
    Next Links

    ' Save the scraped product data into a JSONL file.
    Scrape(productList, "Products.jsonl")
End Sub

Frequently Asked Questions

How do I scrape product categories from an online shopping website?
With IronWebScraper, you scrape product categories by inspecting the site's HTML structure, especially the category list in the sidebar. Initialize the scraper, define a 'Category' model with properties such as name and URL, then use the scraper to extract that data.

What steps are involved in setting up a web scraper for a shopping website?
Start by creating a new C# console application and a class such as 'ShoppingScraper'. Initialize the scraper with IronWebScraper, set the target URL, and define the scraping logic that extracts category and product details.

How do I extract and save product details with a web scraper?
With IronWebScraper, define a 'Product' model to hold product details such as name, price, and image. Parse the category page with the scraper and call the Scrape method to save the extracted data to a JSONL file.

What role does the 'Category' model play in web scraping?
The 'Category' model is the data structure that stores each product category's name, URL, and subcategories, which keeps the extracted data organized.

How does IronWebScraper handle subcategory extraction?
The scraper in this article iterates over the subcategory links inside each menu item and adds them to the 'SubCategories' list of the corresponding 'Category' model, skipping links that are already in the list.

What are common challenges when scraping data from shopping websites?
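Common challenges include handling dynamic JavaScript content, managing pagination, adapting to changes in the site's structure, and complying with legal and technical restrictions such as robots.txt.

Can IronWebScraper extract product sizes from a shopping website?
Yes. IronWebScraper can capture product sizes by locating the size information in the product markup and storing it as part of the extracted product data. In the sample markup above, sizes appear in the span elements with the sku-size class and in the ft-product-sizes attribute.

How do you verify the accuracy of scraped data?
Inspect the site's HTML, use precise CSS selectors, and test the scraper against a variety of pages to confirm that extraction is consistent and correct.

How can I enhance the scraper to retrieve all sub-links on a page?
Refine the scraping logic so that it follows every sub-link on the page and extracts its data, using IronWebScraper's features to make the collection comprehensive.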
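The FAQ's advice on verifying scraped data can be made concrete with a small check that runs over each record before it is saved. The sketch below is only an illustration under stated assumptions: `ProductValidator` and `Validate` are hypothetical names that are not part of IronWebScraper, the `Product` class simply mirrors the model defined earlier in the article, and the rules (non-empty name, numeric price, absolute http(s) image URL) are guesses based on the sample markup rather than requirements of any particular site.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Mirrors the article's Product model: every field is scraped as a string.
public class Product
{
    public string Name { get; set; }
    public string Price { get; set; }
    public string Image { get; set; }
}

// Hypothetical helper for illustration; it is not part of IronWebScraper.
public static class ProductValidator
{
    // Returns a list of human-readable problems.
    // An empty list means the record passed every check.
    public static List<string> Validate(Product product)
    {
        var problems = new List<string>();

        if (string.IsNullOrWhiteSpace(product.Name))
        {
            problems.Add("Name is empty");
        }

        // Assumption: prices are plain numbers such as "299", as in the sample markup.
        if (!decimal.TryParse(product.Price, NumberStyles.Number, CultureInfo.InvariantCulture, out var price)
            || price < 0)
        {
            problems.Add("Price is not a non-negative number");
        }

        // Assumption: image links should be absolute http or https URLs.
        if (!Uri.TryCreate(product.Image, UriKind.Absolute, out var uri)
            || (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            problems.Add("Image is not an absolute http(s) URL");
        }

        return problems;
    }
}
```

One possible use is inside ParseCategory, where a record is added only when the check finds nothing wrong, for example `if (ProductValidator.Validate(product).Count == 0) productList.Add(product);`. Returning a list of problems rather than a single boolean makes it easier to log why a record was rejected when a selector stops matching after a site redesign.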