Scraping Content from a Shopping Website
We pick a shopping site and scrape its content.
As you can see in the image, the site has a left-hand sidebar that contains links to its product categories.
Our first step is therefore to inspect the site's HTML and plan how we want to scrape it.
The fashion category has subcategories (Men, Women, Kids):
<li class="menu-item" data-id="">
  <a href="" class="main-category">
    <i class="cat-icon osh-font-fashion"></i> <span class="nav-subTxt">FASHION </span> <i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
  </a>
  <div class="navLayerWrapper" style="width: 633px; display: none;">
    <div class="submenu">
      <div class="column">
        <div class="categories"><a class="category" href="">New Arrivals !</a></div>
        <div class="categories"><a class="category" href="">Men</a> <a class="subcategory" href="">Shoes</a> <a class="subcategory" href="">Clothing</a> <a class="subcategory" href="">Accessories</a></div>
        <div class="categories"><a class="category" href="">Women</a> <a class="subcategory" href="">Shoes</a> <a class="subcategory" href="">Clothing</a> <a class="subcategory" href="">Accessories</a></div>
        <div class="categories"><a class="category" href="">Kids</a> <a class="subcategory" href="">Boys</a> <a class="subcategory" href="">Girls</a></div>
        <div class="categories"><a class="category" href="">Maternity Clothes</a></div>
      </div>
      <div class="column">
        <div class="categories"><span class="category defaultCursor">Men Best Sellers</span> <a class="subcategory" href="">Casual Shoes</a> <a class="subcategory" href="">Sneakers</a> <a class="subcategory" href="">T-shirts</a> <a class="subcategory" href="">Polos</a></div>
        <div class="categories"><span class="category defaultCursor">Women Best Sellers</span> <a class="subcategory" href="">Sandals</a> <a class="subcategory" href="">Sneakers</a> <a class="subcategory" href="">Dresses</a> <a class="subcategory" href="">Tops</a></div>
        <div class="categories"><a class="category" href="">Women's Curvy Clothing</a></div>
        <div class="categories"><a class="category" href="">Fashion Bundles</a></div>
        <div class="categories"><a class="category" href="">Hijab Fashion</a></div>
      </div>
      <div class="column">
        <div class="categories"><a class="category" href="">SEE ALL BRANDS</a> <a class="subcategory" href="">Adidas</a> <a class="subcategory" href="">Converse</a> <a class="subcategory" href="">Ravin</a> <a class="subcategory" href="">Dejavu</a> <a class="subcategory" href="">Agu</a> <a class="subcategory" href="">Activ</a> <a class="subcategory" href="">Tie House</a> <a class="subcategory" href="">Shoe Room</a> <a class="subcategory" href="">Town Team</a></div>
      </div>
    </div>
  </div>
</li>
Let's set up a project
Create a new console application, or add a new folder named "ShoppingSiteSample" for this example.
Add a new class named "ShoppingScraper".
The first step is to scrape the site's categories and their subcategories.
Let's create a category model:
using System.Collections.Generic;

public class Category
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the URL.
    /// </summary>
    /// <value>
    /// The URL.
    /// </value>
    public string URL { get; set; }

    /// <summary>
    /// Gets or sets the subcategories.
    /// </summary>
    /// <value>
    /// The subcategories.
    /// </value>
    public List<Category> SubCategories { get; set; }
}
Public Class Category
''' <summary>
''' Gets or sets the name.
''' </summary>
''' <value>
''' The name.
''' </value>
Public Property Name() As String
''' <summary>
''' Gets or sets the URL.
''' </summary>
''' <value>
''' The URL.
''' </value>
Public Property URL() As String
''' <summary>
''' Gets or sets the sub categories.
''' </summary>
''' <value>
''' The sub categories.
''' </value>
Public Property SubCategories() As List(Of Category)
End Class
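To make the recursive shape of this model concrete, here is a minimal sketch that builds one category with two subcategories and serializes it as a single JSON line. It assumes a .NET version that ships System.Text.Json. The sample names and URLs are made up, the CategoryDemo helper is hypothetical, and the exact layout that the scraper writes to its .Jsonl file is not confirmed by this tutorial.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Same shape as the Category model above, repeated so the sketch is self-contained.
public class Category
{
    public string Name { get; set; }
    public string URL { get; set; }
    public List<Category> SubCategories { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CategoryDemo
{
    // Builds a small category tree; names and URLs are invented sample data.
    public static Category BuildSample()
    {
        return new Category
        {
            Name = "Fashion",
            URL = "/fashion",
            SubCategories = new List<Category>
            {
                new Category { Name = "Men", URL = "/fashion/men", SubCategories = new List<Category>() },
                new Category { Name = "Women", URL = "/fashion/women", SubCategories = new List<Category>() }
            }
        };
    }

    // Serializes one category, including its nested subcategories, to one line of JSON.
    public static string ToJsonLine(Category category)
    {
        return JsonSerializer.Serialize(category);
    }
}
```

Calling CategoryDemo.ToJsonLine(CategoryDemo.BuildSample()) returns one line of JSON in which the two subcategories are nested under "SubCategories". Because a category can hold further categories, the same model covers menus of any depth.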
Now let's build our scraping logic:
using System.Collections.Generic;
using IronWebScraper;

public class ShoppingScraper : WebScraper
{
    /// <summary>
    /// Override this method to initialize your web scraper.
    /// Important tasks are to request at least one start URL and to set allowed/banned domain or URL patterns.
    /// </summary>
    public override void Init()
    {
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
        // The start URL is left blank in this sample.
        this.Request("", Parse);
    }

    /// <summary>
    /// Override this method to create the default Response handler for your web scraper.
    /// If you have multiple page types, you can add additional similar methods.
    /// </summary>
    /// <param name="response">The HTTP Response object to parse</param>
    public override void Parse(Response response)
    {
        var categoryList = new List<Category>();
        // Each top-level menu link becomes one category.
        foreach (var link in response.Css("#menuFixed > ul > li > a"))
        {
            var cat = new Category();
            cat.URL = link.Attributes["href"];
            cat.Name = link.InnerText;
            categoryList.Add(cat);
        }
        Scrape(categoryList, "Shopping.Jsonl");
    }
}
Scraping links from the menu
Let's update our code to scrape the main categories and all of their sub-links:
public override void Parse(Response response)
{
    // List of category links (root)
    var categoryList = new List<Category>();
    foreach (var li in response.Css("#menuFixed > ul > li"))
    {
        // Main link of this menu item
        foreach (var link in li.Css("a.main-category"))
        {
            var cat = new Category();
            cat.URL = link.Attributes["href"];
            cat.Name = link.InnerText;
            cat.SubCategories = new List<Category>();
            // Subcategory links below this menu item
            foreach (var subCategory in li.Css("a[class=subcategory]"))
            {
                var subcat = new Category();
                subcat.URL = subCategory.Attributes["href"];
                subcat.Name = subCategory.InnerText;
                // Skip links that were already added
                if (cat.SubCategories.Find(c => c.Name == subcat.Name && c.URL == subcat.URL) == null)
                {
                    // Add sublink
                    cat.SubCategories.Add(subcat);
                }
            }
            // Add category
            categoryList.Add(cat);
        }
    }
    Scrape(categoryList, "Shopping.Jsonl");
}
Public Overrides Sub Parse(ByVal response As Response)
    ' List of category links (root)
    Dim categoryList = New List(Of Category)()
    For Each li In response.Css("#menuFixed > ul > li")
        ' Main link of this menu item
        For Each link In li.Css("a.main-category")
            Dim cat = New Category()
            cat.URL = link.Attributes("href")
            cat.Name = link.InnerText
            cat.SubCategories = New List(Of Category)()
            ' Subcategory links below this menu item
            For Each subCategory In li.Css("a[class=subcategory]")
                Dim subcat = New Category()
                subcat.URL = subCategory.Attributes("href")
                subcat.Name = subCategory.InnerText
                ' Skip links that were already added
                If cat.SubCategories.Find(Function(c) c.Name = subcat.Name AndAlso c.URL = subcat.URL) Is Nothing Then
                    ' Add sublink
                    cat.SubCategories.Add(subcat)
                End If
            Next subCategory
            ' Add category
            categoryList.Add(cat)
        Next link
    Next li
    Scrape(categoryList, "Shopping.Jsonl")
End Sub
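The duplicate check in the code above scans the whole subcategory list for every new link. For menus with many links, one possible alternative is a HashSet keyed on name and URL, which filters duplicates in a single pass. The sketch below is plain C# and independent of the scraper library; the SubCategoryDeduper helper is hypothetical and not part of the tutorial's code.

```csharp
using System.Collections.Generic;

// Same shape as the Category model above, repeated so the sketch is self-contained.
public class Category
{
    public string Name { get; set; }
    public string URL { get; set; }
    public List<Category> SubCategories { get; set; }
}

// Hypothetical helper, for illustration only.
public static class SubCategoryDeduper
{
    // Returns the links with duplicates (same Name and same URL) removed.
    // The first occurrence is kept and the original order is preserved.
    public static List<Category> Distinct(IEnumerable<Category> links)
    {
        var seen = new HashSet<(string Name, string URL)>();
        var result = new List<Category>();
        foreach (var link in links)
        {
            // HashSet.Add returns false when the key is already in the set.
            if (seen.Add((link.Name, link.URL)))
            {
                result.Add(link);
            }
        }
        return result;
    }
}
```

With this helper, the inner loop could collect every subcategory link first and assign SubCategoryDeduper.Distinct(...) to cat.SubCategories afterwards. For the handful of links in a single menu, the Find-based check is just as serviceable.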
Now that we have links to all of the site's categories, we can start scraping the products within each category.
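Before requesting category pages, it can help to flatten the category tree into a plain list of URLs to visit. The following is a minimal sketch in plain C#, assuming the Category model from this tutorial; the CategoryLinks helper and its method names are hypothetical.

```csharp
using System.Collections.Generic;

// Same shape as the Category model above, repeated so the sketch is self-contained.
public class Category
{
    public string Name { get; set; }
    public string URL { get; set; }
    public List<Category> SubCategories { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CategoryLinks
{
    // Walks the category tree depth-first and returns every non-blank URL once,
    // in the order in which it was first encountered.
    public static List<string> CollectUrls(IEnumerable<Category> categories)
    {
        var urls = new List<string>();
        var seen = new HashSet<string>();
        Collect(categories, urls, seen);
        return urls;
    }

    private static void Collect(IEnumerable<Category> categories, List<string> urls, HashSet<string> seen)
    {
        if (categories == null)
        {
            return; // Leaf categories may have no subcategory list at all.
        }
        foreach (var category in categories)
        {
            // Skip blank URLs and URLs that were already collected.
            if (!string.IsNullOrWhiteSpace(category.URL) && seen.Add(category.URL))
            {
                urls.Add(category.URL);
            }
            Collect(category.SubCategories, urls, seen);
        }
    }
}
```

Each collected URL could then be handed to the scraper, presumably in the same way that Init passes its start URL to Request, with a category-page handler in place of Parse. That pairing is an assumption and is not shown in this tutorial.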
Let's navigate to any category and inspect its content.
Let's look at the markup:
<section class="products">
  <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
    <a class="link" href="">
      <div class="image-wrapper default-state">
        <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black &amp; Navy Blue" data-image-vertical="1" width="210" height="262" src="" data-sku="AG249FA0T2PSGNAFAMZ" data-src="" data-placeholder="placeholder_m_1.jpg"><noscript><img src="" width="210" height="262" class="image" /></noscript>
      </div>
      <h2 class="title">
        <span class="brand ">Agu </span>
        <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black &amp; Navy Blue</span>
      </h2>
      <div class="price-container clearfix">
        <span class="price-box">
          <span class="price">
            <span data-currency-iso="EGP">EGP</span>
            <span dir="ltr" data-price="299">299</span>
          </span>
          <span class="price -old -no-special"></span>
        </span>
      </div>
      <div class="rating-stars"><div class="stars-container"><div class="stars" style="width: 62%"></div></div> <div class="total-ratings">(30)</div></div>
      <span class="shop-first-logo-container"><img src="" data-src="" class="lazy shop-first-logo-img -mbxs -loaded"></span>
      <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
      <div class="list -sizes" data-selected-sku="">
        <span class="js-link sku-size" data-href="">41</span> <span class="js-link sku-size" data-href="">42</span>
        <span class="js-link sku-size" data-href="">43</span> <span class="js-link sku-size" data-href="">44</span>
        <span class="js-link sku-size" data-href="">45</span>
      </div>
    </a>
  </div>
  <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
    <a class="link" href="">
      <div class="image-wrapper default-state"><img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="" data-sku="LE047FA01SRK4NAFAMZ" data-src="" data-placeholder="placeholder_m_1.jpg"><noscript><img src="" width="210" height="262" class="image" /></noscript></div>
      <h2 class="title"><span class="brand ">Leather Shop </span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2>
      <div class="price-container clearfix">
        <span class="sale-flag-percent">-29%</span> <span class="price-box"> <span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span> <span class="price -old "><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span> </span>
      </div>
      <div class="rating-stars"><div class="stars-container"><div class="stars" style="width: 100%"></div></div> <div class="total-ratings">(1)</div></div>
      <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
      <div class="list -sizes" data-selected-sku="">
        <span class="js-link sku-size" data-href="">110</span> <span class="js-link sku-size" data-href="">115</span>
        <span class="js-link sku-size" data-href="">120</span> <span class="js-link sku-size" data-href="">125</span> <span class="js-link sku-size" data-href="">130</span>
        <span class="js-link sku-size" data-href="">135</span>
      </div>
    </a>
  </div>
</section>
Let's create our product model for this content.
public class Product
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the price.
    /// </summary>
    /// <value>
    /// The price.
    /// </value>
    public string Price { get; set; }

    /// <summary>
    /// Gets or sets the image.
    /// </summary>
    /// <value>
    /// The image.
    /// </value>
    public string Image { get; set; }
}
Public Class Product
''' <summary>
''' Gets or sets the name.
''' </summary>
''' <value>
''' The name.
''' </value>
Public Property Name() As String
''' <summary>
''' Gets or sets the price.
''' </summary>
''' <value>
''' The price.
''' </value>
Public Property Price() As String
''' <summary>
''' Gets or sets the image.
''' </summary>
''' <value>
''' The image.
''' </value>
Public Property Image() As String
End Class
To scrape category pages, we add a new scrape method:
public void ParseCatgory(Response response)
{
    // List of products found on the category page
    var productList = new List<Product>();
    foreach (var link in response.Css("body > main > section.osh-content > section.products > div > a"))
    {
        var product = new Product();
        product.Name = link.InnerText;
        product.Image = link.Css("div.image-wrapper.default-state > img")[0].Attributes["src"];
        productList.Add(product);
    }
    Scrape(productList, "Products.Jsonl");
}
Public Sub ParseCatgory(ByVal response As Response)
    ' List of products found on the category page
    Dim productList = New List(Of Product)()
    For Each link In response.Css("body > main > section.osh-content > section.products > div > a")
        Dim product As New Product()
        product.Name = link.InnerText
        product.Image = link.Css("div.image-wrapper.default-state > img")(0).Attributes("src")
        productList.Add(product)
    Next link
    Scrape(productList, "Products.Jsonl")
End Sub
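The Product model has a Price property, but the method above does not fill it. The markup shows the amount in a data-price attribute (for example data-price="299"). Once that raw text has been read, it may be worth converting it to a number before comparing or sorting prices. The sketch below shows one way to do that in plain C#; the PriceParser helper is hypothetical, and how the attribute is read from the response is left out because it depends on the scraper API.

```csharp
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class PriceParser
{
    // Tries to convert raw price text such as "299" or "1,299.50" into a decimal.
    // Returns null when the text is missing or is not a number.
    public static decimal? Parse(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
        {
            return null;
        }

        // The invariant culture is used so that the result does not depend on the
        // machine's regional settings ('.' is the decimal separator).
        decimal value;
        if (decimal.TryParse(raw.Trim(), NumberStyles.Number, CultureInfo.InvariantCulture, out value))
        {
            return value;
        }
        return null;
    }
}
```

Keeping Price as a string in the model, as the tutorial does, preserves the text exactly as scraped; a helper like this would only be applied at the point where a numeric value is needed.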