Scrape Content from a Shopping Website
We select a shopping site to scrape the content from it.
As you can see from the image, we have a left bar that contains links for the site's product categories. Our first step is to investigate the site's HTML and plan how we wish to scrape it.
The fashion site categories have subcategories (Men, Women, Kids).
<li class="menu-item" data-id="">
<a href="https://domain.com/fashion-by-/" class="main-category">
<i class="cat-icon osh-font-fashion"></i>
<span class="nav-subTxt">FASHION </span>
<i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
</a>
<div class="navLayerWrapper" style="width: 633px; display: none;">
<div class="submenu">
<div class="column">
<div class="categories">
<a class="category" href="https://domain.com/fashion-by-/?sort=newest&dir=desc&viewType=gridView3">New Arrivals !</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/men-fashion/">Men</a>
<a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>
<a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>
<a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/women-fashion/">Women</a>
<a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>
<a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>
<a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>
<a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>
<a class="subcategory" href="https://domain.com/girls/">Girls</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>
</div>
</div>
<div class="column">
<div class="categories">
<span class="category defaultCursor">Men Best Sellers</span>
<a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>
<a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>
<a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>
<a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>
</div>
<div class="categories">
<span class="category defaultCursor">Women Best Sellers</span>
<a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>
<a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>
<a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>
<a class="subcategory" href="https://domain.com/women-tops/">Tops</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>
</div>
</div>
<div class="column">
<div class="categories">
<a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>
<a class="subcategory" href="https://domain.com/adidas/">Adidas</a>
<a class="subcategory" href="https://domain.com/converse/">Converse</a>
<a class="subcategory" href="https://domain.com/ravin/">Ravin</a>
<a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>
<a class="subcategory" href="https://domain.com/agu/">Agu</a>
<a class="subcategory" href="https://domain.com/activ/">Activ</a>
<a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>
<a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>
<a class="subcategory" href="https://domain.com/town-team/">Town Team</a>
</div>
</div>
</div>
</div>
</li>
<li class="menu-item" data-id="">
<a href="https://domain.com/fashion-by-/" class="main-category">
<i class="cat-icon osh-font-fashion"></i>
<span class="nav-subTxt">FASHION </span>
<i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
</a>
<div class="navLayerWrapper" style="width: 633px; display: none;">
<div class="submenu">
<div class="column">
<div class="categories">
<a class="category" href="https://domain.com/fashion-by-/?sort=newest&dir=desc&viewType=gridView3">New Arrivals !</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/men-fashion/">Men</a>
<a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>
<a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>
<a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/women-fashion/">Women</a>
<a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>
<a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>
<a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>
<a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>
<a class="subcategory" href="https://domain.com/girls/">Girls</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>
</div>
</div>
<div class="column">
<div class="categories">
<span class="category defaultCursor">Men Best Sellers</span>
<a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>
<a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>
<a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>
<a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>
</div>
<div class="categories">
<span class="category defaultCursor">Women Best Sellers</span>
<a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>
<a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>
<a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>
<a class="subcategory" href="https://domain.com/women-tops/">Tops</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>
</div>
<div class="categories">
<a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>
</div>
</div>
<div class="column">
<div class="categories">
<a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>
<a class="subcategory" href="https://domain.com/adidas/">Adidas</a>
<a class="subcategory" href="https://domain.com/converse/">Converse</a>
<a class="subcategory" href="https://domain.com/ravin/">Ravin</a>
<a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>
<a class="subcategory" href="https://domain.com/agu/">Agu</a>
<a class="subcategory" href="https://domain.com/activ/">Activ</a>
<a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>
<a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>
<a class="subcategory" href="https://domain.com/town-team/">Town Team</a>
</div>
</div>
</div>
</div>
</li>
Let's set up a project.
- Create a new Console App or add a new folder for our new sample with the name “ShoppingSiteSample”.
- Add a new class with the name “ShoppingScraper”.
- The first step will be to scrape the site categories and their subcategories.
Let’s create a Categories Model:
public class Category
{
/// <summary>
/// Gets or sets the name.
/// </summary>
/// <value>
/// The name.
/// </value>
public string Name { get; set; }
/// <summary>
/// Gets or sets the URL.
/// </summary>
/// <value>
/// The URL.
/// </value>
public string URL { get; set; }
/// <summary>
/// Gets or sets the subcategories.
/// </summary>
/// <value>
/// The subcategories.
/// </value>
public List<Category> SubCategories { get; set; }
}
public class Category
{
/// <summary>
/// Gets or sets the name.
/// </summary>
/// <value>
/// The name.
/// </value>
public string Name { get; set; }
/// <summary>
/// Gets or sets the URL.
/// </summary>
/// <value>
/// The URL.
/// </value>
public string URL { get; set; }
/// <summary>
/// Gets or sets the subcategories.
/// </summary>
/// <value>
/// The subcategories.
/// </value>
public List<Category> SubCategories { get; set; }
}
Public Class Category
''' <summary>
''' Gets or sets the name.
''' </summary>
''' <value>
''' The name.
''' </value>
Public Property Name() As String
''' <summary>
''' Gets or sets the URL.
''' </summary>
''' <value>
''' The URL.
''' </value>
Public Property URL() As String
''' <summary>
''' Gets or sets the subcategories.
''' </summary>
''' <value>
''' The subcategories.
''' </value>
Public Property SubCategories() As List(Of Category)
End Class
- Now let’s build our scraper logic
public class ShoppingScraper : WebScraper
{
/// <summary>
/// Initialize the web scraper, setting the start URLs and allowed/banned domains or URL patterns.
/// </summary>
public override void Init()
{
License.LicenseKey = "LicenseKey";
this.LoggingLevel = WebScraper.LogLevel.All;
this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
this.Request("www.webSite.com", Parse);
}
/// <summary>
/// Parses the HTML document of the response to scrap the necessary data.
/// </summary>
/// <param name="response">The HTTP Response object to parse.</param>
public override void Parse(Response response)
{
var categoryList = new List<Category>();
// Iterate through each link in the menu and extract the category data.
foreach (var Links in response.Css("#menuFixed > ul > li > a"))
{
var cat = new Category
{
URL = Links.Attributes["href"],
Name = Links.InnerText
};
categoryList.Add(cat);
}
// Save the scraped data into a JSONL file.
Scrape(categoryList, "Shopping.jsonl");
}
}
public class ShoppingScraper : WebScraper
{
/// <summary>
/// Initialize the web scraper, setting the start URLs and allowed/banned domains or URL patterns.
/// </summary>
public override void Init()
{
License.LicenseKey = "LicenseKey";
this.LoggingLevel = WebScraper.LogLevel.All;
this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";
this.Request("www.webSite.com", Parse);
}
/// <summary>
/// Parses the HTML document of the response to scrap the necessary data.
/// </summary>
/// <param name="response">The HTTP Response object to parse.</param>
public override void Parse(Response response)
{
var categoryList = new List<Category>();
// Iterate through each link in the menu and extract the category data.
foreach (var Links in response.Css("#menuFixed > ul > li > a"))
{
var cat = new Category
{
URL = Links.Attributes["href"],
Name = Links.InnerText
};
categoryList.Add(cat);
}
// Save the scraped data into a JSONL file.
Scrape(categoryList, "Shopping.jsonl");
}
}
Public Class ShoppingScraper
Inherits WebScraper
''' <summary>
''' Initialize the web scraper, setting the start URLs and allowed/banned domains or URL patterns.
''' </summary>
Public Overrides Sub Init()
License.LicenseKey = "LicenseKey"
Me.LoggingLevel = WebScraper.LogLevel.All
Me.WorkingDirectory = AppSetting.GetAppRoot() & "\ShoppingSiteSample\Output\"
Me.Request("www.webSite.com", AddressOf Parse)
End Sub
''' <summary>
''' Parses the HTML document of the response to scrap the necessary data.
''' </summary>
''' <param name="response">The HTTP Response object to parse.</param>
Public Overrides Sub Parse(ByVal response As Response)
Dim categoryList = New List(Of Category)()
' Iterate through each link in the menu and extract the category data.
For Each Links In response.Css("#menuFixed > ul > li > a")
Dim cat = New Category With {
.URL = Links.Attributes("href"),
.Name = Links.InnerText
}
categoryList.Add(cat)
Next Links
' Save the scraped data into a JSONL file.
Scrape(categoryList, "Shopping.jsonl")
End Sub
End Class
Scraping links from the menu:
Let’s update our code to scrape the Main Categories and all its sub-links:
public override void Parse(Response response)
{
// List of Category Links (Root)
var categoryList = new List<Category>();
// Traverse each 'li' under the fixed menu
foreach (var li in response.Css("#menuFixed > ul > li"))
{
// List of Main Links
foreach (var Links in li.Css("a"))
{
var cat = new Category
{
URL = Links.Attributes["href"],
Name = Links.InnerText,
SubCategories = new List<Category>()
};
// List of Subcategories Links
foreach (var subCategory in li.Css("a[class=subcategory]"))
{
var subcat = new Category
{
URL = subCategory.Attributes["href"],
Name = subCategory.InnerText
};
// Check if subcategory link already exists
if (cat.SubCategories.Find(c => c.Name == subcat.Name && c.URL == subcat.URL) == null)
{
// Add sublinks
cat.SubCategories.Add(subcat);
}
}
// Add Main Category to the list
categoryList.Add(cat);
}
}
// Save the scraped data into a JSONL file.
Scrape(categoryList, "Shopping.jsonl");
}
public override void Parse(Response response)
{
// List of Category Links (Root)
var categoryList = new List<Category>();
// Traverse each 'li' under the fixed menu
foreach (var li in response.Css("#menuFixed > ul > li"))
{
// List of Main Links
foreach (var Links in li.Css("a"))
{
var cat = new Category
{
URL = Links.Attributes["href"],
Name = Links.InnerText,
SubCategories = new List<Category>()
};
// List of Subcategories Links
foreach (var subCategory in li.Css("a[class=subcategory]"))
{
var subcat = new Category
{
URL = subCategory.Attributes["href"],
Name = subCategory.InnerText
};
// Check if subcategory link already exists
if (cat.SubCategories.Find(c => c.Name == subcat.Name && c.URL == subcat.URL) == null)
{
// Add sublinks
cat.SubCategories.Add(subcat);
}
}
// Add Main Category to the list
categoryList.Add(cat);
}
}
// Save the scraped data into a JSONL file.
Scrape(categoryList, "Shopping.jsonl");
}
Public Overrides Sub Parse(ByVal response As Response)
' List of Category Links (Root)
Dim categoryList = New List(Of Category)()
' Traverse each 'li' under the fixed menu
For Each li In response.Css("#menuFixed > ul > li")
' List of Main Links
For Each Links In li.Css("a")
Dim cat = New Category With {
.URL = Links.Attributes("href"),
.Name = Links.InnerText,
.SubCategories = New List(Of Category)()
}
' List of Subcategories Links
For Each subCategory In li.Css("a[class=subcategory]")
Dim subcat = New Category With {
.URL = subCategory.Attributes("href"),
.Name = subCategory.InnerText
}
' Check if subcategory link already exists
If cat.SubCategories.Find(Function(c) c.Name = subcat.Name AndAlso c.URL = subcat.URL) Is Nothing Then
' Add sublinks
cat.SubCategories.Add(subcat)
End If
Next subCategory
' Add Main Category to the list
categoryList.Add(cat)
Next Links
Next li
' Save the scraped data into a JSONL file.
Scrape(categoryList, "Shopping.jsonl")
End Sub
Now we have links to all site categories. Let’s start scraping the products within each category. Let’s navigate to any category and check the content.
Let’s see its code:
<section class="products">
<div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
<a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
<div class="image-wrapper default-state">
<img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black & Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
<noscript><img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
</div>
<h2 class="title">
<span class="brand ">Agu </span>
<span class="name" dir="ltr">Bundle Of 2 Sneakers - Black & Navy Blue</span>
</h2>
<div class="price-container clearfix">
<span class="price-box">
<span class="price">
<span data-currency-iso="EGP">EGP</span>
<span dir="ltr" data-price="299">299</span>
</span>
<span class="price -old -no-special"></span>
</span>
</div>
<div class="rating-stars">
<div class="stars-container">
<div class="stars" style="width: 62%"></div>
</div>
<div class="total-ratings">(30)</div>
</div>
<span class="shop-first-logo-container">
<img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded">
</span>
<span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
<div class="list -sizes" data-selected-sku="">
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
</div>
</a>
</div>
<div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
<a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
<div class="image-wrapper default-state">
<img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
<noscript><img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
</div>
<h2 class="title"><span class="brand ">Leather Shop </span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2>
<div class="price-container clearfix">
<span class="sale-flag-percent">-29%</span>
<span class="price-box">
<span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>
<span class="price -old"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span>
</span>
</div>
<div class="rating-stars">
<div class="stars-container">
<div class="stars" style="width: 100%"></div>
</div>
<div class="total-ratings">(1)</div>
</div>
<span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
<div class="list -sizes" data-selected-sku="">
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
</div>
</a>
</div>
</section>
<section class="products">
<div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
<a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
<div class="image-wrapper default-state">
<img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black & Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
<noscript><img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
</div>
<h2 class="title">
<span class="brand ">Agu </span>
<span class="name" dir="ltr">Bundle Of 2 Sneakers - Black & Navy Blue</span>
</h2>
<div class="price-container clearfix">
<span class="price-box">
<span class="price">
<span data-currency-iso="EGP">EGP</span>
<span dir="ltr" data-price="299">299</span>
</span>
<span class="price -old -no-special"></span>
</span>
</div>
<div class="rating-stars">
<div class="stars-container">
<div class="stars" style="width: 62%"></div>
</div>
<div class="total-ratings">(30)</div>
</div>
<span class="shop-first-logo-container">
<img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded">
</span>
<span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
<div class="list -sizes" data-selected-sku="">
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
</div>
</a>
</div>
<div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
<a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
<div class="image-wrapper default-state">
<img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
<noscript><img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
</div>
<h2 class="title"><span class="brand ">Leather Shop </span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2>
<div class="price-container clearfix">
<span class="sale-flag-percent">-29%</span>
<span class="price-box">
<span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>
<span class="price -old"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span>
</span>
</div>
<div class="rating-stars">
<div class="stars-container">
<div class="stars" style="width: 100%"></div>
</div>
<div class="total-ratings">(1)</div>
</div>
<span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
<div class="list -sizes" data-selected-sku="">
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
<span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
</div>
</a>
</div>
</section>
Let’s build our product model for this content:
public class Product
{
/// <summary>
/// Gets or sets the name.
/// </summary>
/// <value>
/// The name.
/// </value>
public string Name { get; set; }
/// <summary>
/// Gets or sets the price.
/// </summary>
/// <value>
/// The price.
/// </value>
public string Price { get; set; }
/// <summary>
/// Gets or sets the image.
/// </summary>
/// <value>
/// The image.
/// </value>
public string Image { get; set; }
}
public class Product
{
/// <summary>
/// Gets or sets the name.
/// </summary>
/// <value>
/// The name.
/// </value>
public string Name { get; set; }
/// <summary>
/// Gets or sets the price.
/// </summary>
/// <value>
/// The price.
/// </value>
public string Price { get; set; }
/// <summary>
/// Gets or sets the image.
/// </summary>
/// <value>
/// The image.
/// </value>
public string Image { get; set; }
}
Public Class Product
''' <summary>
''' Gets or sets the name.
''' </summary>
''' <value>
''' The name.
''' </value>
Public Property Name() As String
''' <summary>
''' Gets or sets the price.
''' </summary>
''' <value>
''' The price.
''' </value>
Public Property Price() As String
''' <summary>
''' Gets or sets the image.
''' </summary>
''' <value>
''' The image.
''' </value>
Public Property Image() As String
End Class
To scrape category pages, we add a new scrape method:
public void ParseCategory(Response response)
{
// List of Products
var productList = new List<Product>();
// Iterate through product links in the product section
foreach (var Links in response.Css("section.products > div > a"))
{
var product = new Product
{
Name = Links.Css("h2.title > span.name").First().InnerText,
Price = Links.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
Image = Links.Css("div.image-wrapper.default-state > img").First().Attributes["src"]
};
productList.Add(product);
}
// Save the scraped product data into a JSONL file.
Scrape(productList, "Products.jsonl");
}
public void ParseCategory(Response response)
{
// List of Products
var productList = new List<Product>();
// Iterate through product links in the product section
foreach (var Links in response.Css("section.products > div > a"))
{
var product = new Product
{
Name = Links.Css("h2.title > span.name").First().InnerText,
Price = Links.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
Image = Links.Css("div.image-wrapper.default-state > img").First().Attributes["src"]
};
productList.Add(product);
}
// Save the scraped product data into a JSONL file.
Scrape(productList, "Products.jsonl");
}
Public Sub ParseCategory(ByVal response As Response)
' List of Products
Dim productList = New List(Of Product)()
' Iterate through product links in the product section
For Each Links In response.Css("section.products > div > a")
Dim product As New Product With {
.Name = Links.Css("h2.title > span.name").First().InnerText,
.Price = Links.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
.Image = Links.Css("div.image-wrapper.default-state > img").First().Attributes("src")
}
productList.Add(product)
Next Links
' Save the scraped product data into a JSONL file.
Scrape(productList, "Products.jsonl")
End Sub
Frequently Asked Questions
How can I scrape product categories from an online shopping website?
You can use IronWebScraper to scrape product categories by examining the site's HTML structure, particularly the sidebar listing categories. Initialize the scraper, define a 'Category' model with properties like name and URL, and use the scraper to extract this data.
What steps are involved in setting up a web scraper for a shopping site?
Start by creating a new C# console application and a class, such as 'ShoppingScraper'. Use IronWebScraper to initialize the scraper, set target URLs, and define the scraping logic to extract categories and product details.
How can I extract and save product details using a web scraper?
With IronWebScraper, you can extract product details like names, prices, and images by defining a 'Product' model. Use the scraper to parse category pages and save the extracted data into JSONL files using the Scrape
method.
What is the role of the 'Category' model in web scraping?
The 'Category' model serves as a data structure to store information about each product category, including its name, URL, and subcategories, facilitating organized data extraction with IronWebScraper.
How does IronWebScraper handle subcategory extraction?
IronWebScraper handles subcategories by iterating through subcategory links within each main category, adding them to the 'SubCategories' list of the respective 'Category' model for a comprehensive scrape.
What are common challenges when scraping data from shopping websites?
Common challenges include dealing with dynamic JavaScript content, managing pagination, adapting to changes in site structure, and ensuring compliance with legal and technical constraints like robots.txt.
Can IronWebScraper extract product sizes from a shopping site?
Yes, IronWebScraper can capture product sizes by identifying size information within product details and storing it as part of the extracted product data.
How do you verify the accuracy of web scraped data?
Ensure accuracy by inspecting the site's HTML, using precise CSS selectors, and testing the scraper across various pages to confirm consistent and correct data extraction.
How can I enhance a scraper to retrieve all sub-links on a page?
Enhance your scraper by refining the scraping logic to follow and extract data from all sub-links on a page, ensuring comprehensive data collection using IronWebScraper's capabilities.