Web Scraping a Shopping Website in C#

Learn how to use C# and the WebScraper framework to scrape product categories and items from shopping websites, extracting structured data from HTML elements into custom models. This guide walks you through building an e-commerce scraper with the IronWebScraper library.

Quickstart: Scrape a shopping website in C#

Get started with NuGet:

  1. Install IronWebScraper with the NuGet Package Manager.

    PM > Install-Package IronWebScraper

  2. Copy this code snippet and run it.

    using IronWebScraper;
    using System.Linq; // provides First()
    
    public class QuickShoppingScraper : WebScraper
    {
        public override void Init()
        {
            // Apply your license key
            License.LicenseKey = "YOUR-LICENSE-KEY";
    
            // Set the starting URL
            this.Request("https://shopping-site.com", Parse);
        }
    
        public override void Parse(Response response)
        {
            // Extract product data
            foreach (var product in response.Css(".product-item"))
            {
                var item = new
                {
                    Name = product.Css(".product-name").First().InnerText,
                    Price = product.Css(".price").First().InnerText,
                    Image = product.Css("img").First().Attributes["src"]
                };
    
                Scrape(item, "products.jsonl");
            }
        }
    }
    
    // Run the scraper. In a single file, these top-level statements must come before the class declaration.
    var scraper = new QuickShoppingScraper();
    scraper.Start();

  3. Deploy to test in your live environment

    Start using IronWebScraper in your project today with a free trial.

  1. Create a new Console App project named "ShoppingSiteSample"
  2. Add a class named "ShoppingScraper" that inherits from WebScraper
  3. Create models for category and product data
  4. Override Init() to set the start URL, and Parse() to do the scraping
  5. Run the scraper to extract categories and products into JSONL files

How do I analyze the HTML structure of the shopping website?

Choose a shopping website and analyze its content structure. Understanding the HTML structure is crucial for successful web scraping, so inspect the target site with your browser's developer tools before you write any code.

Jumia e-commerce homepage with Ramadan promotional banner and navigation menu

As the image shows, the left sidebar contains links to the site's product categories. The first step is to examine the site's HTML and plan the scraping procedure around it.

Navigation menu of an e-commerce website with product categories, subcategories, and brand sections

Why is it important to understand the HTML structure?

The site's fashion category has subcategories (Men, Women, Kids). Understanding this hierarchy helps you design suitable data models and scraping logic. Correct HTML analysis becomes even more important when you work with advanced web scraping features.

<li class="menu-item" data-id="">
    <a href="https://domain.com/fashion-by-/" class="main-category">
        <i class="cat-icon osh-font-fashion"></i>
        <span class="nav-subTxt">FASHION </span>
        <i class="osh-font-light-arrow-left"></i><i class="osh-font-light-arrow-right"></i>
    </a>
    <div class="navLayerWrapper" style="width: 633px; display: none;">
        <div class="submenu">
            <div class="column">
                <div class="categories">
                    <a class="category" href="https://domain.com/fashion-by-/?sort=newest&amp;dir=desc&amp;viewType=gridView3">New Arrivals !</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/men-fashion/">Men</a>
                    <a class="subcategory" href="https://domain.com/mens-shoes/">Shoes</a>
                    <a class="subcategory" href="https://domain.com/mens-clothing/">Clothing</a>
                    <a class="subcategory" href="https://domain.com/mens-accessories/">Accessories</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/women-fashion/">Women</a>
                    <a class="subcategory" href="https://domain.com/womens-shoes/">Shoes</a>
                    <a class="subcategory" href="https://domain.com/womens-clothing/">Clothing</a>
                    <a class="subcategory" href="https://domain.com/womens-accessories/">Accessories</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/girls-boys-fashion/">Kids</a>
                    <a class="subcategory" href="https://domain.com/boys-fashion/">Boys</a>
                    <a class="subcategory" href="https://domain.com/girls/">Girls</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/maternity-clothes/">Maternity Clothes</a>
                </div>
            </div>
            <div class="column">
                <div class="categories">
                    <span class="category defaultCursor">Men Best Sellers</span>
                    <a class="subcategory" href="https://domain.com/mens-casual-shoes/">Casual Shoes</a>
                    <a class="subcategory" href="https://domain.com/mens-sneakers/">Sneakers</a>
                    <a class="subcategory" href="https://domain.com/mens-t-shirts/">T-shirts</a>
                    <a class="subcategory" href="https://domain.com/mens-polos/">Polos</a>
                </div>
                <div class="categories">
                    <span class="category defaultCursor">Women Best Sellers</span>
                    <a class="subcategory" href="https://domain.com/womens-sandals/">Sandals</a>
                    <a class="subcategory" href="https://domain.com/womens-sneakers/">Sneakers</a>
                    <a class="subcategory" href="https://domain.com/women-dresses/">Dresses</a>
                    <a class="subcategory" href="https://domain.com/women-tops/">Tops</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/womens-curvy-clothing/">Women's Curvy Clothing</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/fashion-bundles/v/">Fashion Bundles</a>
                </div>
                <div class="categories">
                    <a class="category" href="https://domain.com/hijab-fashion/">Hijab Fashion</a>
                </div>
            </div>
            <div class="column">
                <div class="categories">
                    <a class="category" href="https://domain.com/brands/fashion-by-/">SEE ALL BRANDS</a>
                    <a class="subcategory" href="https://domain.com/adidas/">Adidas</a>
                    <a class="subcategory" href="https://domain.com/converse/">Converse</a>
                    <a class="subcategory" href="https://domain.com/ravin/">Ravin</a>
                    <a class="subcategory" href="https://domain.com/dejavu/">Dejavu</a>
                    <a class="subcategory" href="https://domain.com/agu/">Agu</a>
                    <a class="subcategory" href="https://domain.com/activ/">Activ</a>
                    <a class="subcategory" href="https://domain.com/oxford--bellini--tie-house--milano/">Tie House</a>
                    <a class="subcategory" href="https://domain.com/shoe-room/">Shoe Room</a>
                    <a class="subcategory" href="https://domain.com/town-team/">Town Team</a>
                </div>
            </div>
        </div>
    </div>
</li>
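
To make the hierarchy concrete, here is a minimal sketch that reads a trimmed copy of the menu markup and pairs each category heading with its subcategories. It is an illustration only: the names MenuOutline, SampleMenu, and Read are hypothetical, and it uses the .NET XML parser as a stand-in, which works only because the sample is well-formed. Real pages usually are not, so a proper HTML parser would be needed there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MenuOutline
{
    // A trimmed, well-formed copy of the menu markup (hypothetical sample data).
    public const string SampleMenu = @"
<li class=""menu-item"">
  <a href=""https://domain.com/fashion-by-/"" class=""main-category""><span class=""nav-subTxt"">FASHION </span></a>
  <div class=""submenu"">
    <div class=""categories"">
      <a class=""category"" href=""https://domain.com/men-fashion/"">Men</a>
      <a class=""subcategory"" href=""https://domain.com/mens-shoes/"">Shoes</a>
      <a class=""subcategory"" href=""https://domain.com/mens-clothing/"">Clothing</a>
    </div>
    <div class=""categories"">
      <a class=""category"" href=""https://domain.com/girls-boys-fashion/"">Kids</a>
      <a class=""subcategory"" href=""https://domain.com/boys-fashion/"">Boys</a>
    </div>
  </div>
</li>";

    // Returns one entry per "categories" group: the heading text and the subcategory names.
    public static List<(string Category, List<string> Subcategories)> Read(string markup)
    {
        var root = XElement.Parse(markup);
        var result = new List<(string Category, List<string> Subcategories)>();

        var groups = root.Descendants("div")
            .Where(d => (string)d.Attribute("class") == "categories");

        foreach (var group in groups)
        {
            // The heading is the child whose class starts with "category" (an <a> or a <span>).
            var head = group.Elements().FirstOrDefault(e =>
                ((string)e.Attribute("class") ?? "").StartsWith("category", StringComparison.Ordinal));
            if (head == null) continue;

            // Subcategories are the sibling links marked with class="subcategory".
            var subs = group.Elements("a")
                .Where(a => (string)a.Attribute("class") == "subcategory")
                .Select(a => a.Value.Trim())
                .ToList();

            result.Add((head.Value.Trim(), subs));
        }
        return result;
    }
}
```

Calling MenuOutline.Read(MenuOutline.SampleMenu) yields two groups, "Men" and "Kids", each carrying its own subcategory names. This grouping is what the data model needs to capture.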

How do I set up the web scraping project?

Set up a project that follows C# web scraping best practices.

  1. Create a new console application, or add a new folder for the sample, named "ShoppingSiteSample"
  2. Add a new class named "ShoppingScraper"
  3. Start by scraping the site's categories and their subcategories
  4. Install IronWebScraper via the NuGet Package Manager or the Package Manager Console:

    Install-Package IronWebScraper

Which data model should I use for categories?

Create a category model that represents the hierarchical structure you discovered:

public class Category
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the URL.
    /// </summary>
    /// <value>
    /// The URL.
    /// </value>
    public string URL { get; set; }

    /// <summary>
    /// Gets or sets the subcategories.
    /// </summary>
    /// <value>
    /// The subcategories.
    /// </value>
    public List<Category> SubCategories { get; set; }

    // Additional properties for enhanced data collection
    public int ProductCount { get; set; }
    public DateTime LastScraped { get; set; }
    public string CategoryType { get; set; }
}
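
As a rough illustration of how a nested model like this maps onto one line of a JSONL file, the sketch below serializes a small category tree with System.Text.Json. CategoryNode is a hypothetical, trimmed stand-in for the Category class, and CategoryJson is an illustrative helper. It does not reproduce how IronWebScraper's Scrape method writes its files.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical, trimmed stand-in for the Category model (name, URL, nested subcategories).
public class CategoryNode
{
    public string Name { get; set; } = "";
    public string URL { get; set; } = "";
    public List<CategoryNode> SubCategories { get; set; } = new List<CategoryNode>();
}

public static class CategoryJson
{
    // Serializes one category tree to a single line of JSON, the unit of a JSONL file.
    public static string ToJsonLine(CategoryNode category) =>
        JsonSerializer.Serialize(category);

    // Counts a category plus all of its descendants.
    public static int CountNodes(CategoryNode category)
    {
        int total = 1;
        foreach (var child in category.SubCategories)
            total += CountNodes(child);
        return total;
    }

    // Builds a small sample tree using values from the menu markup.
    public static CategoryNode Sample() => new CategoryNode
    {
        Name = "Men",
        URL = "https://domain.com/men-fashion/",
        SubCategories =
        {
            new CategoryNode { Name = "Shoes", URL = "https://domain.com/mens-shoes/" },
            new CategoryNode { Name = "Clothing", URL = "https://domain.com/mens-clothing/" }
        }
    };
}
```

Because the model refers to itself through SubCategories, one object can hold a category tree of any depth, and each top-level category becomes one self-contained line in the output.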

How do I build the basic scraper logic?

Build the scraper logic, and remember to apply your license key before you run the scraper:

public class ShoppingScraper : WebScraper
{
    /// <summary>
    /// Initialize the web scraper, setting the start URLs and allowed/banned domains or URL patterns.
    /// </summary>
    public override void Init()
    {
        // Apply your license key - get one from https://ironsoftware.com/csharp/webscraper/licensing/
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\ShoppingSiteSample\Output\";

        // Queue the start page; replace the placeholder with the target site's full URL
        this.Request("https://www.webSite.com", Parse);
    }

    /// <summary>
    /// Parses the HTML document of the response to scrape the necessary data.
    /// </summary>
    /// <param name="response">The HTTP Response object to parse.</param>
    public override void Parse(Response response)
    {
        var categoryList = new List<Category>();

        // Iterate through each link in the menu and extract the category data.
        foreach (var Links in response.Css("#menuFixed > ul > li > a"))
        {
            var cat = new Category
            {
                URL = Links.Attributes["href"],
                Name = Links.InnerText,
                LastScraped = DateTime.Now
            };
            categoryList.Add(cat);
        }

        // Save the scraped data into a JSONL file.
        Scrape(categoryList, "Shopping.jsonl");
    }
}
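
The start address in the listing is a placeholder. When start URLs come from configuration or user input, it can help to normalize them before queuing a request. The sketch below is one way to do that with System.Uri. The StartUrl helper is hypothetical and not part of IronWebScraper, and defaulting to https is an assumption.

```csharp
using System;

public static class StartUrl
{
    // Returns an absolute http(s) URL, prepending "https://" when the scheme is missing.
    public static string Normalize(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            throw new ArgumentException("URL must not be empty.", nameof(url));

        var candidate = url.Trim();
        if (!candidate.Contains("://"))
            candidate = "https://" + candidate; // assumption: default to https

        // Reject anything that is not a well-formed http or https address.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out var uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            throw new ArgumentException($"Not a valid http(s) URL: {url}", nameof(url));
        }

        return uri.AbsoluteUri;
    }
}
```

Inside Init(), the result could then be passed to Request, for example this.Request(StartUrl.Normalize("www.webSite.com"), Parse).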

Which menu elements do I target?

Scraping links from the menu requires precise CSS selectors. The API reference provides detailed information about the available selector methods:

JSON file in Notepad showing the e-commerce category structure with nested subcategories and URLs

How can I scrape both main categories and subcategories?

Update the code to scrape the main categories and all of their sub-links, so that the whole navigation structure is captured:

public override void Parse(Response response)
{
    // List of Category Links (Root)
    var categoryList = new List<Category>();

    // Traverse each 'li' under the fixed menu
    foreach (var li in response.Css("#menuFixed > ul > li"))
    {
        // List of Main Links
        foreach (var Links in li.Css("a"))
        {
            var cat = new Category
            {
                URL = Links.Attributes["href"],
                Name = Links.InnerText,
                SubCategories = new List<Category>(),
                LastScraped = DateTime.Now
            };

            // List of Subcategories Links
            foreach (var subCategory in li.Css("a[class=subcategory]"))
            {
                var subcat = new Category
                {
                    URL = subCategory.Attributes["href"],
                    Name = subCategory.InnerText,
                    CategoryType = "Subcategory"
                };

                // Check if subcategory link already exists
                if (cat.SubCategories.Find(c => c.Name == subcat.Name && c.URL == subcat.URL) == null)
                {
                    // Add sublinks
                    cat.SubCategories.Add(subcat);
                }
            }

            // Note: this stores the number of subcategories, not an actual product count
            cat.ProductCount = cat.SubCategories.Count;

            // Add Main Category to the list
            categoryList.Add(cat);
        }
    }

    // Save the scraped data into a JSONL file.
    Scrape(categoryList, "Shopping.jsonl");
}
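
The duplicate check in the listing scans the list with Find for every link. A HashSet-based alternative is sketched below, as an illustration rather than the article's method. LinkDeduplicator is a hypothetical helper that works on plain (name, URL) pairs and keeps the first occurrence of each.

```csharp
using System;
using System.Collections.Generic;

public static class LinkDeduplicator
{
    // Keeps the first occurrence of each (name, url) pair and preserves the original order.
    public static List<(string Name, string Url)> Distinct(IEnumerable<(string Name, string Url)> links)
    {
        var seen = new HashSet<(string, string)>();
        var result = new List<(string Name, string Url)>();

        foreach (var link in links)
        {
            // Trim so that stray whitespace in the markup does not create false "new" entries.
            var key = (link.Name.Trim(), link.Url.Trim());

            // HashSet.Add returns false when the pair was already present.
            if (seen.Add(key))
                result.Add(key);
        }
        return result;
    }
}
```

Two links count as duplicates only when both name and URL match, the same rule as the Find check. The "Sneakers" entries for men and women therefore both survive, because their URLs differ.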

How can I extract product information from category pages?

With the links to all site categories available, start scraping the products within each category. When scraping product pages, thread safety is important for optimal performance. Navigate to any category and look at its content:

E-commerce product listing page showing shoes and accessories with prices, ratings, and filtering

What does the product HTML structure look like?

Examine the HTML structure to understand how the products are organized:

<section class="products">
    <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
        <a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black & Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
                <noscript><img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
            </div>
            <h2 class="title">
                <span class="brand ">Agu&nbsp;</span>
                <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black & Navy Blue</span>
            </h2>
            <div class="price-container clearfix">
                <span class="price-box">
                    <span class="price">
                        <span data-currency-iso="EGP">EGP</span>
                        <span dir="ltr" data-price="299">299</span>
                    </span>
                    <span class="price -old  -no-special"></span>
                </span>
            </div>
            <div class="rating-stars">
                <div class="stars-container">
                    <div class="stars" style="width: 62%"></div>
                </div>
                <div class="total-ratings">(30)</div>
            </div>
            <span class="shop-first-logo-container">
                <img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded">
            </span>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
            </div>
        </a>
    </div>
    <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
        <a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
                <noscript><img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
            </div>
            <h2 class="title"><span class="brand ">Leather Shop&nbsp;</span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2>
            <div class="price-container clearfix">
                <span class="sale-flag-percent">-29%</span>
                <span class="price-box">
                    <span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>
                    <span class="price -old"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span>
                </span>
            </div>
            <div class="rating-stars">
                <div class="stars-container">
                    <div class="stars" style="width: 100%"></div>
                </div>
                <div class="total-ratings">(1)</div>
            </div>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
            </div>
        </a>
    </div>
</section>
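
Several product fields in this markup arrive as raw strings: the data-price attribute, the width style of the stars element, and the rating count in parentheses. The sketch below shows one way to convert them into typed values. ProductFields is a hypothetical helper, and reading the star width as a share of a five-star scale is an assumption about how the site renders ratings.

```csharp
using System;
using System.Globalization;

public static class ProductFields
{
    // Parses a data-price attribute value such as "299" into a decimal.
    public static decimal ParsePrice(string dataPrice) =>
        decimal.Parse(dataPrice.Trim(), NumberStyles.Number, CultureInfo.InvariantCulture);

    // Converts a style value such as "width: 62%" into a rating on an assumed 0-5 scale.
    public static double ParseStars(string style)
    {
        int start = style.IndexOf(':') + 1;
        int end = style.IndexOf('%');
        if (end < start)
            throw new FormatException($"No percentage found in '{style}'.");

        double percent = double.Parse(style.Substring(start, end - start).Trim(),
                                      CultureInfo.InvariantCulture);
        return Math.Round(percent / 100.0 * 5.0, 1);
    }

    // Extracts the number of ratings from text such as "(30)".
    public static int ParseRatingCount(string text) =>
        int.Parse(text.Trim().Trim('(', ')'), CultureInfo.InvariantCulture);

    // Computes the discount between an old and a current price as a whole percentage.
    public static int DiscountPercent(decimal oldPrice, decimal price) =>
        (int)Math.Round((oldPrice - price) / oldPrice * 100m);
}
```

For the leather belt in the markup, DiscountPercent(135m, 96m) gives 29, which agrees with the "-29%" sale flag on that product.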
<section class="products">
    <div class="sku -gallery -validate-size " data-sku="AG249FA0T2PSGNAFAMZ" ft-product-sizes="41,42,43,44,45" ft-product-color="Multicolour">
        <a class="link" href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Bundle Of 2 Sneakers - Black & Navy Blue" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-sku="AG249FA0T2PSGNAFAMZ" data-src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
                <noscript><img src="https://static.WebSite.com/p/agu-6208-488356-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
            </div>
            <h2 class="title">
                <span class="brand ">Agu&nbsp;</span>
                <span class="name" dir="ltr">Bundle Of 2 Sneakers - Black & Navy Blue</span>
            </h2>
            <div class="price-container clearfix">
                <span class="price-box">
                    <span class="price">
                        <span data-currency-iso="EGP">EGP</span>
                        <span dir="ltr" data-price="299">299</span>
                    </span>
                    <span class="price -old  -no-special"></span>
                </span>
            </div>
            <div class="rating-stars">
                <div class="stars-container">
                    <div class="stars" style="width: 62%"></div>
                </div>
                <div class="total-ratings">(30)</div>
            </div>
            <span class="shop-first-logo-container">
                <img src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" data-src="http://www.WebSite.com/images/local/logos/shop_first/ShoppingSite/logo_normal.png" class="lazy shop-first-logo-img -mbxs -loaded">
            </span>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=41">41</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=42">42</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=43">43</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=44">44</span>
                <span class="js-link sku-size" data-href="http://www.WebSite.com/agu-bundle-of-2-sneakers-black-navy-blue-653884.html?size=45">45</span>
            </div>
        </a>
    </div>
    <div class="sku -gallery -validate-size " data-sku="LE047FA01SRK4NAFAMZ" ft-product-sizes="110,115,120,125,130,135" ft-product-color="Black">
        <a class="link" href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html">
            <div class="image-wrapper default-state">
                <img class="lazy image -loaded" alt="Genuine Leather Belt - Black" data-image-vertical="1" width="210" height="262" src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-sku="LE047FA01SRK4NAFAMZ" data-src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" data-placeholder="placeholder_m_1.jpg">
                <noscript><img src="https://static.WebSite.com/p/leather-shop-1831-030217-1-catalog_grid_3.jpg" width="210" height="262" class="image" /></noscript>
            </div>
            <h2 class="title"><span class="brand ">Leather Shop&nbsp;</span> <span class="name" dir="ltr">Genuine Leather Belt - Black</span></h2>
            <div class="price-container clearfix">
                <span class="sale-flag-percent">-29%</span>
                <span class="price-box">
                    <span class="price"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="96">96</span> </span>
                    <span class="price -old"><span data-currency-iso="EGP">EGP</span> <span dir="ltr" data-price="135">135</span> </span>
                </span>
            </div>
            <div class="rating-stars">
                <div class="stars-container">
                    <div class="stars" style="width: 100%"></div>
                </div>
                <div class="total-ratings">(1)</div>
            </div>
            <span class="osh-icon -ShoppingSite-local shop_local--logo -block -mbs -mts"></span>
            <div class="list -sizes" data-selected-sku="">
                <span class="js-link sku-size" data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=110">110</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=115">115</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=120">120</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=125">125</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=130">130</span>
                <span class="js-link sku-size"  data-href="http://www.WebSite.com/leather-shop-genuine-leather-belt-black-712030.html?size=135">135</span>
            </div>
        </a>
    </div>
</section>

Which product model should I create?

Create a product model for this content. When scraping a shopping website, the model should capture every relevant product detail that the markup exposes:

using System;                      // DateTime
using System.Collections.Generic;  // List<T>

public class Product
{
    /// <summary>
    /// Gets or sets the name.
    /// </summary>
    /// <value>
    /// The name.
    /// </value>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the price.
    /// </summary>
    /// <value>
    /// The price.
    /// </value>
    public string Price { get; set; }

    /// <summary>
    /// Gets or sets the image.
    /// </summary>
    /// <value>
    /// The image.
    /// </value>
    public string Image { get; set; }

    // Additional properties for comprehensive data collection
    public string Brand { get; set; }
    public string OldPrice { get; set; }
    public string Discount { get; set; }
    public float Rating { get; set; }
    public int ReviewCount { get; set; }
    public List<string> AvailableSizes { get; set; }
    public string ProductUrl { get; set; }
    public string SKU { get; set; }
    public DateTime ScrapedDate { get; set; }
}
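
The model above keeps Price and OldPrice as strings, exactly as they appear in the markup. If numeric values are needed later, a small helper can convert them. The following is a minimal sketch and not part of IronWebScraper: the PriceHelper class and its method names are hypothetical, and it assumes prices use invariant-culture formatting (digits with an optional "." decimal separator).

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of IronWebScraper.
public static class PriceHelper
{
    // Parses a scraped price string such as "299" or "1,299.50".
    // Returns null when the text is missing or not numeric.
    public static decimal? ParsePrice(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return null;
        }

        return decimal.TryParse(text.Trim(), NumberStyles.Number,
                                CultureInfo.InvariantCulture, out var value)
            ? value
            : (decimal?)null;
    }

    // Computes the discount in whole percent from the current and old price.
    // Returns null when either price is missing or the old price is not positive.
    public static int? DiscountPercent(string price, string oldPrice)
    {
        var current = ParsePrice(price);
        var old = ParsePrice(oldPrice);
        if (current == null || old == null || old.Value <= 0)
        {
            return null;
        }

        var percent = (old.Value - current.Value) / old.Value * 100m;
        return (int)Math.Round(percent, MidpointRounding.AwayFromZero);
    }
}
```

For the leather belt in the sample markup, PriceHelper.DiscountPercent("96", "135") yields 29, which agrees with the -29% sale flag on that product. A product without an old price simply yields null.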

How do I add product scraping functionality?

To scrape category pages, add a new parse method, ParseCategory, with error handling and data validation. The method needs using directives for System, System.Collections.Generic, and System.Linq:

public void ParseCategory(Response response)
{
    // List of Products
    var productList = new List<Product>();

    // Iterate through product links in the product section
    foreach (var Links in response.Css("section.products > div > a"))
    {
        try
        {
            var product = new Product
            {
                Name = Links.Css("h2.title > span.name").First().InnerText,
                Brand = Links.Css("h2.title > span.brand").FirstOrDefault()?.InnerText ?? "Unknown",
                Price = Links.Css("div.price-container > span.price-box > span.price > span[data-price]").First().InnerText,
                Image = Links.Css("div.image-wrapper.default-state > img").First().Attributes["src"],
                ProductUrl = Links.Attributes["href"],
                SKU = Links.ParentNode.Attributes["data-sku"],
                ScrapedDate = DateTime.Now
            };

            // Extract old price if available
            var oldPriceElement = Links.Css("span.price.-old > span[data-price]").FirstOrDefault();
            if (oldPriceElement != null)
            {
                product.OldPrice = oldPriceElement.InnerText;
            }

            // Extract discount percentage
            var discountElement = Links.Css("span.sale-flag-percent").FirstOrDefault();
            if (discountElement != null)
            {
                product.Discount = discountElement.InnerText;
            }

            // Extract rating information
            var ratingWidth = Links.Css("div.stars").FirstOrDefault()?.Attributes["style"];
            if (!string.IsNullOrEmpty(ratingWidth))
            {
                var width = System.Text.RegularExpressions.Regex.Match(ratingWidth, @"(\d+)%").Groups[1].Value;
                if (int.TryParse(width, out int ratingPercent))
                {
                    product.Rating = ratingPercent / 20.0f; // Convert percentage to 5-star scale
                }
            }

            // Extract review count
            var reviewText = Links.Css("div.total-ratings").FirstOrDefault()?.InnerText;
            if (!string.IsNullOrEmpty(reviewText))
            {
                var reviewCount = System.Text.RegularExpressions.Regex.Match(reviewText, @"\d+").Value;
                if (int.TryParse(reviewCount, out int count))
                {
                    product.ReviewCount = count;
                }
            }

            // Extract available sizes
            product.AvailableSizes = Links.Css("div.list.-sizes > span.sku-size")
                .Select(s => s.InnerText)
                .ToList();

            productList.Add(product);
        }
        catch (Exception ex)
        {
            // Log error and continue with next product
            Console.WriteLine($"Error parsing product: {ex.Message}");
        }
    }

    // Save the scraped product data into a JSONL file.
    Scrape(productList, "Products.jsonl");

    // Handle pagination if needed
    var nextPageLink = response.Css("a.pagination-next").FirstOrDefault();
    if (nextPageLink != null)
    {
        var nextPageUrl = nextPageLink.Attributes["href"];
        this.Request(nextPageUrl, ParseCategory);
    }
}
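
The rating and review-count conversions inside ParseCategory can be tried on their own, without running a scraper. The sketch below isolates them in a hypothetical RatingHelper class (not part of IronWebScraper). It differs slightly from the code above: the pattern is anchored to the "width:" property and also accepts fractional percentages, which is an assumption about how the site might format the style value.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of IronWebScraper.
public static class RatingHelper
{
    // Converts a style value such as "width: 62%" to a rating on a 0-5 scale.
    // Returns null when the style does not contain a width percentage.
    public static float? StarsFromStyle(string style)
    {
        if (string.IsNullOrEmpty(style))
        {
            return null;
        }

        var match = Regex.Match(style, @"width:\s*(\d+(?:\.\d+)?)%");
        if (!match.Success)
        {
            return null;
        }

        var percent = float.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
        return percent / 20f; // 100% corresponds to 5 stars
    }

    // Extracts the first run of digits from review text such as "(30)".
    // Returns null when the text contains no digits.
    public static int? ReviewCountFromText(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return null;
        }

        var match = Regex.Match(text, @"\d+");
        return match.Success && int.TryParse(match.Value, out var count)
            ? count
            : (int?)null;
    }
}
```

With the sample markup, "width: 62%" maps to a rating of about 3.1 stars and "(30)" maps to a review count of 30. Returning null instead of 0 lets the caller tell "no rating shown" apart from "rated zero".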

This approach captures the product fields exposed in the markup (name, brand, prices, discount, rating, review count, sizes, URL, and SKU), and the try/catch block lets a malformed product be logged and skipped instead of aborting the whole page. For advanced scenarios, explore the advanced web scraping features in IronWebScraper.

Frequently Asked Questions

How do I extract product data from shopping websites in C#?

IronWebScraper extracts product data from shopping websites by means of CSS selectors. Create a class that inherits from WebScraper, override the Parse method, and use response.Css() to select specific HTML elements such as product names, prices, and images. The extracted data can be saved in various formats, including JSON and JSONL files.

What are the basic steps for creating a shopping-website scraper?

To create a shopping-website scraper with IronWebScraper: 1) create a Console App project, 2) add a class that inherits from WebScraper, 3) create data models for categories and products, 4) override the Init() method to set your starting URL, 5) override the Parse() method to extract data with CSS selectors, and 6) run the scraper to save the data in your preferred format.

How can I handle hierarchical category structures when scraping e-commerce websites?

IronWebScraper lets you handle hierarchical structures by creating data models that mirror the parent-child relationships (for example, Fashion > Men > Shoes). You can use CSS selectors to navigate nested HTML elements and build the category tree programmatically, which is particularly useful when working with IronWebScraper's advanced features.
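
As a rough illustration of such a parent-child model, the sketch below defines a tree node that can report its full path. The CategoryNode class and its members are hypothetical names chosen for this example; they are not IronWebScraper types, and the category model used elsewhere in this article may look different.

```csharp
using System.Collections.Generic;

// Hypothetical category tree node for illustration; not an IronWebScraper type.
public class CategoryNode
{
    public string Name { get; set; }
    public string Url { get; set; }
    public CategoryNode Parent { get; private set; }
    public List<CategoryNode> Children { get; } = new List<CategoryNode>();

    // Creates a child, links it to this node, and returns it so calls can be chained.
    public CategoryNode AddChild(string name, string url = null)
    {
        var child = new CategoryNode { Name = name, Url = url, Parent = this };
        Children.Add(child);
        return child;
    }

    // Walks up to the root and joins the names, for example "Fashion > Men > Shoes".
    public string Path()
    {
        var parts = new Stack<string>();
        for (var node = this; node != null; node = node.Parent)
        {
            parts.Push(node.Name); // the root is pushed last, so it is enumerated first
        }
        return string.Join(" > ", parts);
    }
}
```

A usage sketch: starting from var fashion = new CategoryNode { Name = "Fashion" }, the call fashion.AddChild("Men").AddChild("Shoes").Path() returns "Fashion > Men > Shoes". In a scraper, AddChild would be called while iterating over the nested menu elements selected with response.Css().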

What is the best way to analyze a shopping website's HTML structure before scraping?

Before scraping a shopping site with IronWebScraper, inspect its HTML structure with your browser's developer tools. Look for consistent patterns in CSS classes and element hierarchies. This analysis helps you identify the right CSS selectors to use in your IronWebScraper Parse() method so that product information, categories, and other data elements are captured accurately.

Can I extract both product listings and category navigation from the same page?

Yes, IronWebScraper lets you extract several kinds of data from a single page. In your Parse() method, use different CSS selectors to target category links (for example, '.category-item') and product listings (for example, '.product-item') in the same pass, then save them to separate output files or data structures.

How do I save scraped product data to a file?

IronWebScraper provides a built-in Scrape() method that saves the extracted data automatically. Pass your data object and the file name, as in Scrape(item, "products.jsonl"). The library supports various output formats, including JSON, JSONL, and CSV, so you can export your scraped e-commerce data for further processing.
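
To make the JSONL format concrete, the sketch below writes and reads one JSON object per line using System.Text.Json. It is an independent illustration of the file format, not a description of how Scrape() works internally; the JsonLines class is a hypothetical name, and the code assumes .NET Core 3.0 or later with C# 8 (for System.Text.Json and the using declaration).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical JSON Lines helper for illustration; not part of IronWebScraper.
public static class JsonLines
{
    // Appends each item as one compact JSON object per line.
    public static void Append<T>(string path, IEnumerable<T> items)
    {
        using var writer = new StreamWriter(path, append: true);
        foreach (var item in items)
        {
            writer.WriteLine(JsonSerializer.Serialize(item));
        }
    }

    // Reads the file back line by line, skipping blank lines.
    public static List<T> Read<T>(string path)
    {
        var result = new List<T>();
        foreach (var line in File.ReadLines(path))
        {
            if (!string.IsNullOrWhiteSpace(line))
            {
                result.Add(JsonSerializer.Deserialize<T>(line));
            }
        }
        return result;
    }
}
```

Because every line is a complete JSON document, a JSONL file can be appended to page by page during a crawl and processed later as a stream, without loading the whole file into memory.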

Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in computer science from Carleton University and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. He is passionate about creating intuitive and aesthetically pleasing user interfaces, and enjoys working with modern frameworks and creating well-structured, visually appealing ...
