How to Scrape a Shopping Website in C#

Darrius Serrant
Updated: January 31, 2026

Learn how to scrape product categories and items from a shopping website using C# and the WebScraper framework, extracting structured data from HTML elements into custom models. This guide walks through building an e-commerce scraper with the IronWebScraper library.

Quickstart: Scrape a Shopping Website in C#

1. Install IronWebScraper with the NuGet Package Manager (https://www.nuget.org/packages/IronWebScraper):

    PM > Install-Package IronWebScraper

2. Copy and run this code:

    using System.Linq;
    using IronWebScraper;

    // Run the scraper (top-level statements must come before type declarations)
    var scraper = new QuickShoppingScraper();
    scraper.Start();

    public class QuickShoppingScraper : WebScraper
    {
        public override void Init()
        {
            // Apply your license key
            License.LicenseKey = "YOUR-LICENSE-KEY";

            // Set the starting URL
            this.Request("https://shopping-site.com", Parse);
        }

        public override void Parse(Response response)
        {
            // Extract product data
            foreach (var product in response.Css(".product-item"))
            {
                var item = new
                {
                    Name = product.Css(".product-name").First().InnerText,
                    Price = product.Css(".price").First().InnerText,
                    Image = product.Css("img").First().Attributes["src"]
                };

                Scrape(item, "products.jsonl");
            }
        }
    }

3. Deploy to test in your live environment.

The full walkthrough follows these steps:

1. Create a new console application project named "ShoppingSiteSample".
2. Add a class named "ShoppingScraper" that inherits from WebScraper.
3. Create models for the Category and Product data.
4. Override Init() to set the start URL and Parse() to do the scraping.
5. Run the scraper to extract categories and products into JSONL files.

How do I analyze the HTML structure of a shopping website?

Choose a shopping website and analyze how its content is structured. Understanding the HTML structure is essential for successful scraping, so before writing any code, inspect the target site with your browser's developer tools.

On the example site, the left sidebar contains links to the site's product categories. The first step is to study the site's HTML and plan the scraping approach around it.

Why is understanding the HTML structure important?
On the example site, the Fashion category contains subcategories (Men, Women, Kids). Understanding this hierarchy helps you design suitable data models and scraping logic, and careful HTML analysis becomes even more important when you use advanced web scraping features.

    <li class="menu-item" data-id="">
      <a href="https://domain.com/fashion-by-/" class="main-category">
        <i class="cat-icon osh-font-fashion"></i>
        <span class="nav-subTxt">FASHION </span>
        <i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
      </a>
      <div class="navLayerWrapper" style="width: 633px; display: none;">
        <div class="submenu">
          <div class="column">
            <div class="categories">
              <a class="category" href="https://domain.com/fashion-by-/?sort=newest&dir=desc&viewType=gridView3">New Arrivals !</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/men-fashion/">Men</a>
              <a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>
              <a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>
              <a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/women-fashion/">Women</a>
              <a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>
              <a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>
              <a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>
              <a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>
              <a class="subcategory" href="https://domain.com/girls/">Girls</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>
            </div>
          </div>
          <div class="column">
            <div class="categories">
              <span class="category defaultCursor">Men Best Sellers</span>
              <a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>
              <a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>
              <a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>
              <a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>
            </div>
            <div class="categories">
              <span class="category defaultCursor">Women Best Sellers</span>
              <a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>
              <a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>
              <a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>
              <a class="subcategory" href="https://domain.com/women-tops/">Tops</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>
            </div>
            <div class="categories">
              <a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>
            </div>
          </div>
          <div class="column">
            <div class="categories">
              <a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>
              <a class="subcategory" href="https://domain.com/adidas/">Adidas</a>
              <a class="subcategory" href="https://domain.com/converse/">Converse</a>
              <a class="subcategory" href="https://domain.com/ravin/">Ravin</a>
              <a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>
              <a class="subcategory" href="https://domain.com/agu/">Agu</a>
              <a class="subcategory" href="https://domain.com/activ/">Activ</a>
              <a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>
              <a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>
              <a class="subcategory" href="https://domain.com/town-team/">Town Team</a>
            </div>
          </div>
        </div>
      </div>
    </li>

How do I set up the web scraping project?

Set up the project following C# web scraping best practices:

1. Create a new console application, or add a new folder named "ShoppingSiteSample" to the sample project.
2. Add a new class named "ShoppingScraper".
3. Start by scraping the site's categories and their subcategories.

Install IronWebScraper through the NuGet Package Manager or the Package Manager Console:

    Install-Package IronWebScraper

Which data model should I use for category data?

Create a category model that represents the hierarchy found in the menu:

    using System;
    using System.Collections.Generic;

    public class Category
    {
        /// <summary>Display name of the category.</summary>
        public string Name { get; set; }

        /// <summary>URL of the category page.</summary>
        public string URL { get; set; }

        /// <summary>Child categories, if any.</summary>
        public List<Category> SubCategories { get; set; }

        // Additional properties for enhanced data collection
        public int ProductCount { get; set; }
        public DateTime LastScraped { get; set; }
        public string CategoryType { get; set; }
    }

How do I build the basic scraper logic?

Build the scraper logic, and remember to apply your license key before running the scraper:

    using System;
    using System.Collections.Generic;
    using IronWebScraper;

    public class ShoppingScraper : WebScraper
    {
        /// <summary>
        /// Initializes the scraper: license key, logging, output directory, and start URL.
        /// </summary>
        public override void Init()
        {
            // Apply your license key - get one from https://ironsoftware.com/csharp/webscraper/licensing/
            License.LicenseKey = "LicenseKey";
            this.LoggingLevel = WebScraper.LogLevel.All;

            // AppSetting.GetAppRoot() is not defined in this article;
            // substitute your own application root path.
            this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

            // Queue the start URL (include the scheme); each response is passed to Parse
            this.Request("https://www.webSite.com", Parse);
        }

        /// <summary>
        /// Parses the HTML document of the response to scrape the category links.
        /// </summary>
        /// <param name="response">The HTTP Response object to parse.</param>
        public override void Parse(Response response)
        {
            var categoryList = new List<Category>();

            // Iterate through each top-level link in the menu and extract the category data.
            foreach (var link in response.Css("#menuFixed > ul > li > a"))
            {
                var cat = new Category
                {
                    URL = link.Attributes["href"],
                    Name = link.InnerText,
                    LastScraped = DateTime.Now
                };
                categoryList.Add(cat);
            }

            // Save the scraped data into a JSONL file.
            Scrape(categoryList, "Shopping.jsonl");
        }
    }

Which menu elements should I focus on?

Scraping links from the menu requires precise CSS selectors. The API reference documents the available selector methods.

How do I scrape both main categories and subcategories?
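One wrinkle when collecting both levels is that a menu can list the same link more than once, while different links can share a label (the markup above has a "Shoes" entry under both Men and Women). The sketch below, in plain C# with no IronWebScraper dependency, shows one way to drop exact repeats by comparing name and URL together. The `Deduplicate` helper and the sample tuples are hypothetical and only illustrate the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical (name, url) pairs standing in for scraped subcategory links.
var scraped = new List<(string Name, string Url)>
{
    ("Shoes", "https://domain.com/mens-shoes/"),
    ("Clothing", "https://domain.com/mens-clothing/"),
    ("Shoes", "https://domain.com/mens-shoes/"),   // exact repeat: dropped
    ("Shoes", "https://domain.com/womens-shoes/"), // same name, different URL: kept
};

var unique = Deduplicate(scraped);
foreach (var (name, url) in unique)
{
    Console.WriteLine($"{name} -> {url}");
}

// Keeps the first occurrence of each (Name, Url) pair and preserves input order.
static List<(string Name, string Url)> Deduplicate(IEnumerable<(string Name, string Url)> links)
{
    var seen = new HashSet<(string, string)>();
    var result = new List<(string Name, string Url)>();
    foreach (var link in links)
    {
        // HashSet.Add returns false when the pair is already present.
        if (seen.Add((link.Name.Trim(), link.Url)))
        {
            result.Add(link);
        }
    }
    return result;
}
```

A set lookup keeps the check cheap per link. That matters little for a menu, but the same idea scales better than scanning a list if it is later applied to product URLs.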
Update the code so that it scrapes the main categories together with all of their sublinks. This captures the full navigation structure:

    public override void Parse(Response response)
    {
        // List of category links (root)
        var categoryList = new List<Category>();

        // Traverse each 'li' under the fixed menu
        foreach (var li in response.Css("#menuFixed > ul > li"))
        {
            // Note: li.Css("a") matches every anchor inside the item, including
            // subcategory links. Narrow it (for example to "a.main-category")
            // if you only want the top-level link as a main category.
            foreach (var link in li.Css("a"))
            {
                var cat = new Category
                {
                    URL = link.Attributes["href"],
                    Name = link.InnerText,
                    SubCategories = new List<Category>(),
                    LastScraped = DateTime.Now
                };

                // Collect the subcategory links
                foreach (var subCategory in li.Css("a[class=subcategory]"))
                {
                    var subcat = new Category
                    {
                        URL = subCategory.Attributes["href"],
                        Name = subCategory.InnerText,
                        CategoryType = "Subcategory"
                    };

                    // Add the sublink only if it is not already in the list
                    if (cat.SubCategories.Find(c => c.Name == subcat.Name && c.URL == subcat.URL) == null)
                    {
                        cat.SubCategories.Add(subcat);
                    }
                }

                // ProductCount holds the number of subcategories here;
                // no product totals are available at this stage.
                cat.ProductCount = cat.SubCategories.Count;

                // Add the main category to the list
                categoryList.Add(cat);
            }
        }

        // Save the scraped data into a JSONL file.
        Scrape(categoryList, "Shopping.jsonl");
    }

How do I extract product information from category pages?

With links to all of the site's categories in hand, start scraping the products in each category. When processing product pages, thread safety is important for getting the best performance. Open any category page and inspect its content.

What does the product HTML structure look like?
Inspect the HTML to understand how the products are organized:

    <section class="products">
      <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
        <a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
          <div class="image-wrapper default-state">
            <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black & Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
            <noscript><img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
          </div>
          <h2 class="title">
            <span class="brand ">Agu </span>
            <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black & Navy Blue</span>
          </h2>
          <div class="price-container clearfix">
            <span class="price-box">
              <span class="price">
                <span data-currency-iso="EGP">EGP</span>
                <span dir="ltr" data-price="299">299</span>
              </span>
              <span class="price -old -no-special"></span>
            </span>
          </div>
          <div class="rating-stars">
            <div class="stars-container">
              <div class="stars" style="width: 62%"></div>
            </div>
            <div class="total-ratings">(30)</div>
          </div>
          <span class="shop-first-logo-container">
            <img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded">
          </span>
          <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
          <div class="list -sizes" data-selected-sku="">
            <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
          </div>
        </a>
      </div>
      <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
        <a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
          <div class="image-wrapper default-state">
            <img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
            <noscript><img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
          </div>
          <h2 class="title">
            <span class="brand ">Leather Shop </span>
            <span class="name" dir="ltr">Genuine Leather Belt - Black</span>
          </h2>
          <div class="price-container clearfix">
            <span class="sale-flag-percent">-29%</span>
            <span class="price-box">
              <span class="price">
                <span data-currency-iso="EGP">EGP</span>
                <span dir="ltr" data-price="96">96</span>
              </span>
              <span class="price -old">
                <span data-currency-iso="EGP">EGP</span>
                <span dir="ltr" data-price="135">135</span>
              </span>
            </span>
          </div>
          <div class="rating-stars">
            <div class="stars-container">
              <div class="stars" style="width: 100%"></div>
            </div>
            <div class="total-ratings">(1)</div>
          </div>
          <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
          <div class="list -sizes" data-selected-sku="">
            <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
            <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
          </div>
        </a>
      </div>
    </section>

Which product model should I create?
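One modeling question the markup above raises is whether to keep the price as text or convert it to a number. In the sample markup the `data-price` span holds a bare numeric string such as `299`, with the currency in a separate `data-currency-iso` span. The sketch below shows a tolerant conversion, assuming prices use a period as the decimal separator. The `ParsePrice` helper is hypothetical and not part of IronWebScraper.

```csharp
using System;
using System.Globalization;

Console.WriteLine(ParsePrice("299"));        // value taken from a data-price span
Console.WriteLine(ParsePrice(" 96 "));       // surrounding whitespace is tolerated
Console.WriteLine(ParsePrice("") is null);   // missing price yields null

// Returns the price as a decimal, or null when the text is not a plain number.
// Assumption: invariant-culture formatting (period as decimal separator).
static decimal? ParsePrice(string raw)
{
    return decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out var value)
        ? (decimal?)value
        : null;
}
```

Storing the raw string alongside the parsed value is a reasonable compromise when a site mixes formats, since nothing is lost if parsing fails.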
Build a product model for this content. When scraping a shopping site, capture all of the relevant product details:

    using System;
    using System.Collections.Generic;

    public class Product
    {
        /// <summary>Product name.</summary>
        public string Name { get; set; }

        /// <summary>Current price, as shown on the page.</summary>
        public string Price { get; set; }

        /// <summary>URL of the product image.</summary>
        public string Image { get; set; }

        // Additional properties for comprehensive data collection
        public string Brand { get; set; }
        public string OldPrice { get; set; }
        public string Discount { get; set; }
        public float Rating { get; set; }
        public int ReviewCount { get; set; }
        public List<string> AvailableSizes { get; set; }
        public string ProductUrl { get; set; }
        public string SKU { get; set; }
        public DateTime ScrapedDate { get; set; }
    }

How do I add product scraping?
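Two values in the sample markup need conversion before they fit the model: the rating is encoded as an inline style (`width: 62%`) and the review count as text in parentheses (`(30)`). The sketch below isolates those two conversions using only `System.Text.RegularExpressions`. The helper names are hypothetical, and mapping 100% to 5 stars is an assumption about how the site renders ratings.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

Console.WriteLine(RatingFromStyle("width: 62%"));
Console.WriteLine(ReviewCountFromText("(30)"));

// Converts an inline style such as "width: 62%" to a 0-5 star value.
// Returns null when no width percentage is present.
static float? RatingFromStyle(string style)
{
    var match = Regex.Match(style ?? string.Empty, @"width:\s*(\d+(?:\.\d+)?)%");
    if (!match.Success)
    {
        return null;
    }

    return float.TryParse(match.Groups[1].Value, NumberStyles.Float,
                          CultureInfo.InvariantCulture, out var percent)
        ? (float?)(percent / 20f) // assumption: 100% corresponds to 5 stars
        : null;
}

// Extracts the first run of digits from text such as "(30)"; returns 0 if none is found.
static int ReviewCountFromText(string text)
{
    var match = Regex.Match(text ?? string.Empty, @"\d+");
    return match.Success && int.TryParse(match.Value, out var count) ? count : 0;
}
```

Returning `null` for a missing rating keeps "unrated" distinct from a genuine zero, which a plain `float` property cannot express.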
To scrape the category pages, add a new scrape method with error handling and data validation:

    // Add this method to the ShoppingScraper class.
    // Requires: using System; using System.Collections.Generic; using System.Linq;
    public void ParseCategory(Response response)
    {
        // List of products
        var productList = new List<Product>();

        // Iterate through the product links in the product section
        foreach (var link in response.Css("section.products > div > a"))
        {
            try
            {
                var product = new Product
                {
                    Name = link.Css("h2.title > span.name").First().InnerText,
                    Brand = link.Css("h2.title > span.brand").FirstOrDefault()?.InnerText ?? "Unknown",
                    Price = link.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
                    Image = link.Css("div.image-wrapper.default-state > img").First().Attributes["src"],
                    ProductUrl = link.Attributes["href"],
                    SKU = link.ParentNode.Attributes["data-sku"],
                    ScrapedDate = DateTime.Now
                };

                // Extract the old price if available
                var oldPriceElement = link.Css("span.price.-old > span[data-price]").FirstOrDefault();
                if (oldPriceElement != null)
                {
                    product.OldPrice = oldPriceElement.InnerText;
                }

                // Extract the discount percentage
                var discountElement = link.Css("span.sale-flag-percent").FirstOrDefault();
                if (discountElement != null)
                {
                    product.Discount = discountElement.InnerText;
                }

                // Extract rating information
                var ratingWidth = link.Css("div.stars").FirstOrDefault()?.Attributes["style"];
                if (!string.IsNullOrEmpty(ratingWidth))
                {
                    var width = System.Text.RegularExpressions.Regex.Match(ratingWidth, @"(\d+)%").Groups[1].Value;
                    if (int.TryParse(width, out int ratingPercent))
                    {
                        product.Rating = ratingPercent / 20.0f; // Convert percentage to 5-star scale
                    }
                }

                // Extract the review count
                var reviewText = link.Css("div.total-ratings").FirstOrDefault()?.InnerText;
                if (!string.IsNullOrEmpty(reviewText))
                {
                    var reviewCount = System.Text.RegularExpressions.Regex.Match(reviewText, @"\d+").Value;
                    if (int.TryParse(reviewCount, out int count))
                    {
                        product.ReviewCount = count;
                    }
                }

                // Extract available sizes
                product.AvailableSizes = link.Css("div.list.-sizes > span.sku-size")
                    .Select(s => s.InnerText)
                    .ToList();

                productList.Add(product);
            }
            catch (Exception ex)
            {
                // Log the error and continue with the next product
                Console.WriteLine($"Error parsing product: {ex.Message}");
            }
        }

        // Save the scraped product data into a JSONL file.
        Scrape(productList, "Products.jsonl");

        // Handle pagination if needed
        var nextPageLink = response.Css("a.pagination-next").FirstOrDefault();
        if (nextPageLink != null)
        {
            var nextPageUrl = nextPageLink.Attributes["href"];
            this.Request(nextPageUrl, ParseCategory);
        }
    }

This approach captures all of the relevant product information while handling errors gracefully. For more advanced scenarios, explore the advanced web scraping features available in IronWebScraper.

Frequently Asked Questions

How do I extract product data from a shopping website using C#?

IronWebScraper uses CSS selectors to extract product data from shopping websites. Create a class that inherits from WebScraper, override the Parse method, and use response.Css() to target specific HTML elements such as product names, prices, and images. The extracted data can be saved in several formats, including JSON and JSONL files.

What are the basic steps to build a shopping website scraper?

To build a shopping website scraper with IronWebScraper: 1) create a console app project; 2) add a class that inherits from WebScraper; 3) create data models for categories and products; 4) override the Init() method to set your start URL; 5) override the Parse() method to extract data with CSS selectors; and 6) run the scraper to save the data in your preferred format.

How should I handle hierarchical category structures when scraping e-commerce sites?

Handle hierarchical structures by creating data models that reflect the parent-child relationships (such as Fashion > Men > Shoes). Use CSS selectors to walk the nested HTML elements and build the category tree programmatically, which is especially useful when working with IronWebScraper's advanced features.

What is the best way to analyze a shopping site's HTML structure before scraping?

Before scraping a shopping website with IronWebScraper, inspect the HTML structure with your browser's developer tools. Look for consistent patterns in CSS classes and element hierarchy. This analysis helps you identify the right CSS selectors to use in the Parse() method so that you accurately target product information, categories, and other data elements.

Can I extract the product list and the category navigation from the same page?

Yes. IronWebScraper lets you extract several types of data from a single page. In your Parse() method, use different CSS selectors to target category links (such as '.category-item') and product listings (such as '.product-item'), then save them to separate output files or data structures.

How do I save the extracted product data to a file?
IronWebScraper provides a built-in Scrape() method that saves the extracted data automatically. Pass your data object and a file name, for example Scrape(item, "products.jsonl"). The library supports several output formats, including JSON, JSONL, and CSV, so the scraped e-commerce data is easy to export for further processing.
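Since the scraper writes its results to JSONL files, a quick way to check a run is to read the file back line by line. The sketch below uses `System.Text.Json` and assumes each non-empty line is a single JSON object with a `Name` property. The exact layout that `Scrape()` writes may differ, so inspect a real output file first. The sample file and the `ReadNames` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Write a small sample file so the sketch is self-contained.
var path = Path.Combine(Path.GetTempPath(), "products-sample.jsonl");
File.WriteAllLines(path, new[]
{
    "{\"Name\":\"Genuine Leather Belt - Black\",\"Price\":\"96\"}",
    "",
    "{\"Name\":\"Bundle Of 2 Sneakers - Black & Navy Blue\",\"Price\":\"299\"}",
});

var names = ReadNames(path);
Console.WriteLine($"Read {names.Count} product names");

// Reads a JSONL file and returns the "Name" value of every object that has one.
// Assumption: one JSON object per line; blank lines are skipped.
static List<string> ReadNames(string jsonlPath)
{
    var result = new List<string>();
    foreach (var line in File.ReadLines(jsonlPath))
    {
        if (string.IsNullOrWhiteSpace(line))
        {
            continue;
        }

        using var doc = JsonDocument.Parse(line);
        if (doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("Name", out var nameElement)
            && nameElement.ValueKind == JsonValueKind.String)
        {
            result.Add(nameElement.GetString() ?? string.Empty);
        }
    }
    return result;
}
```

Reading with `File.ReadLines` streams the file instead of loading it all at once, which helps when a large catalog produces a big output file.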