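Scraping an Online Movie Website with C# and IronWebScraper
IronWebScraper extracts movie data from websites by parsing HTML elements, creating typed objects to hold structured data, and using metadata to navigate between pages, so you can build a complete movie information dataset. This C# web scraping library simplifies turning unstructured web content into organized, analyzable data.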
Quickstart: Scrape Movie Data with C#
- Install IronWebScraper through the NuGet Package Manager
- Create a class that inherits from `WebScraper`
- Override `Init()` to set the license and request the target URL
- Override `Parse()` to extract movie data with CSS selectors
- Use the `Scrape()` method to save the data as JSON
Install with the NuGet Package Manager: https://www.nuget.org/packages/IronWebScraper
PM > Install-Package IronWebScraper
Copy and run this code snippet.
using IronWebScraper;
using System;

// Run the scraper (top-level statements must come before type declarations)
var scraper = new QuickstartMovieScraper();
scraper.Start();

public class QuickstartMovieScraper : WebScraper
{
    public override void Init()
    {
        // Set your license key
        License.LicenseKey = "YOUR-LICENSE-KEY";
        // Configure scraper settings
        this.LoggingLevel = LogLevel.All;
        this.WorkingDirectory = @"C:\MovieData\Output\";
        // Start scraping from the homepage
        this.Request("https://example-movie-site.com", Parse);
    }

    public override void Parse(Response response)
    {
        // Extract movie titles using CSS selectors
        foreach (var movieDiv in response.Css(".movie-item"))
        {
            var title = movieDiv.Css("h2")[0].TextContentClean;
            var url = movieDiv.Css("a")[0].Attributes["href"];
            // Save the scraped data
            Scrape(new { Title = title, Url = url }, "movies.json");
        }
    }
}
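How do I create a movie scraper class?
Let's start with a real website example. We will scrape a movie website using the techniques described in the C# web scraping tutorial.
Add a new class and name it `MovieScraper`:
A dedicated scraper class keeps the code organized and reusable. It follows object-oriented principles and makes the scraper easy to extend later.
What does the target website's structure look like?
Examine the site structure before scraping. Understanding how the site is built is essential for effective scraping. As in our guide on scraping data from an online movie website, start by analyzing the HTML structure:
Which HTML elements contain the movie data?
This is part of the homepage HTML. Examining the HTML structure helps identify the right CSS selectors to use: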
<div id="movie-featured" class="movies-list movies-list-full tab-pane in fade active">
<div data-movie-id="20746" class="ml-item">
<a href="https://website.com/film/king-arthur-legend-of-the-sword-20746/">
<span class="mli-quality">CAM</span>
<img data-original="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg"
class="lazy thumb mli-thumb" alt="King Arthur: Legend of the Sword"
src="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg"
style="display: inline-block;">
<span class="mli-info"><h2>King Arthur: Legend of the Sword</h2></span>
</a>
</div>
<div data-movie-id="20724" class="ml-item">
<a href="https://website.com/film/snatched-20724/">
<span class="mli-quality">CAM</span>
<img data-original="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg"
class="lazy thumb mli-thumb" alt="Snatched"
src="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg"
style="display: inline-block;">
<span class="mli-info"><h2>Snatched</h2></span>
</a>
</div>
</div>
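We have a movie ID, a title, and a link to the detail page. Each movie sits in a `<div>` tag with the `ml-item` class and carries a unique `data-movie-id` attribute for identification.
How do I implement basic movie scraping?
Start scraping this dataset. Before running any scraper, make sure your license key is set correctly, as in the following example: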
public class MovieScraper : WebScraper
{
public override void Init()
{
// Initialize scraper settings
License.LicenseKey = "LicenseKey";
this.LoggingLevel = WebScraper.LogLevel.All;
this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
// Request homepage content for scraping
this.Request("https://website.com/", Parse);
}
public override void Parse(Response response)
{
// Iterate over each movie div within the featured movie section
foreach (var div in response.Css("#movie-featured > div"))
{
if (div.Attributes["class"] != "clearfix")
{
var movieId = Convert.ToInt32(div.GetAttribute("data-movie-id"));
var link = div.Css("a")[0];
var movieTitle = link.TextContentClean;
// Scrape and store movie data as key-value pairs
Scrape(new ScrapedData()
{
{ "MovieId", movieId },
{ "MovieTitle", movieTitle }
}, "Movie.Jsonl");
}
}
}
}
Public Class MovieScraper
Inherits WebScraper
Public Overrides Sub Init()
' Initialize scraper settings
License.LicenseKey = "LicenseKey"
Me.LoggingLevel = WebScraper.LogLevel.All
Me.WorkingDirectory = AppSetting.GetAppRoot() & "\MovieSample\Output\"
' Request homepage content for scraping
Me.Request("https://website.com/", AddressOf Parse)
End Sub
Public Overrides Sub Parse(ByVal response As Response)
' Iterate over each movie div within the featured movie section
For Each div In response.Css("#movie-featured > div")
If div.Attributes("class") <> "clearfix" Then
Dim movieId = Convert.ToInt32(div.GetAttribute("data-movie-id"))
Dim link = div.Css("a")(0)
Dim movieTitle = link.TextContentClean
' Scrape and store movie data as key-value pairs
Scrape(New ScrapedData() From {
{ "MovieId", movieId },
{ "MovieTitle", movieTitle }
},
"Movie.Jsonl")
End If
Next div
End Sub
End Class
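What is the WorkingDirectory property for?
What's new in this code?
The WorkingDirectory property sets the main working directory for all scraped data and related files. This keeps all output files in one place, which makes large scraping projects easier to manage. If the directory does not exist, it is created automatically.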
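The tutorial code builds the working directory by concatenating a string that contains backslashes. As a minimal sketch of an alternative, the path can be assembled with `Path.Combine` from the .NET base library, which picks the separator for the current platform. `OutputPaths` and `BuildWorkingDirectory` are hypothetical names for illustration, and `AppContext.BaseDirectory` is only a stand-in for the `AppSetting.GetAppRoot()` call used in the tutorial code.

```csharp
using System;
using System.IO;

public static class OutputPaths
{
    // Hypothetical helper: builds <baseDir>/MovieSample/Output/ with a trailing separator.
    public static string BuildWorkingDirectory(string baseDir)
    {
        var path = Path.Combine(baseDir, "MovieSample", "Output");
        return path + Path.DirectorySeparatorChar;
    }
}

public static class Program
{
    public static void Main()
    {
        // AppContext.BaseDirectory stands in for AppSetting.GetAppRoot() (an assumption)
        var workingDirectory = OutputPaths.BuildWorkingDirectory(AppContext.BaseDirectory);
        Console.WriteLine(workingDirectory);
    }
}
```

The resulting string could then be assigned to `WorkingDirectory` inside `Init()`.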
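When should I use CSS selectors, and when should I use attributes?
Other notes:
CSS selectors are the better choice when you need to target elements by structural position or class name. Reading attributes directly is better for extracting specific values such as IDs or custom data attributes. In this example, the CSS selector (#movie-featured > div) walks the DOM structure, and the attribute (`data-movie-id`) supplies the specific value.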
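The tutorial code converts the `data-movie-id` value with `Convert.ToInt32`, which throws a `FormatException` when the text is not a valid number. Below is a minimal sketch of a more forgiving conversion that uses only the .NET base library. `MovieIdParser` and `TryParseMovieId` are hypothetical names for illustration and are not part of IronWebScraper.

```csharp
using System;
using System.Globalization;

public static class MovieIdParser
{
    // Hypothetical helper: returns true and sets id only when raw holds a plain integer.
    public static bool TryParseMovieId(string raw, out int id)
    {
        return int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out id);
    }
}

public static class Program
{
    public static void Main()
    {
        // Values as they might come back from an attribute lookup
        foreach (var raw in new[] { "20746", " 20724 ", "", "n/a" })
        {
            if (MovieIdParser.TryParseMovieId(raw, out var id))
                Console.WriteLine($"parsed {id}");
            else
                Console.WriteLine($"skipped '{raw}'");
        }
    }
}
```

Inside a parse function, a failed conversion could be used to skip the element instead of stopping the whole run.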
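How do I create typed objects for scraped data?
Create typed objects to hold the scraped data in a structured form. Strongly typed objects give you better code organization, IntelliSense support, and compile-time type checking.
Implement a `Movie` class to hold the formatted data: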
public class Movie
{
public int Id { get; set; }
public string Title { get; set; }
public string URL { get; set; }
}
Public Class Movie
Public Property Id As Integer
Public Property Title As String
Public Property URL As String
End Class
How do typed objects improve data organization?
Update the code to use the typed `Movie` class in place of the generic `ScrapedData` dictionary:
public class MovieScraper : WebScraper
{
public override void Init()
{
// Initialize scraper settings
License.LicenseKey = "LicenseKey";
this.LoggingLevel = WebScraper.LogLevel.All;
this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
// Request homepage content for scraping
this.Request("https://website.com/", Parse);
}
public override void Parse(Response response)
{
// Iterate over each movie div within the featured movie section
foreach (var div in response.Css("#movie-featured > div"))
{
if (div.Attributes["class"] != "clearfix")
{
var movie = new Movie
{
Id = Convert.ToInt32(div.GetAttribute("data-movie-id"))
};
var link = div.Css("a")[0];
movie.Title = link.TextContentClean;
movie.URL = link.Attributes["href"];
// Scrape and store movie object
Scrape(movie, "Movie.Jsonl");
}
}
}
}
Public Class MovieScraper
Inherits WebScraper
Public Overrides Sub Init()
' Initialize scraper settings
License.LicenseKey = "LicenseKey"
Me.LoggingLevel = WebScraper.LogLevel.All
Me.WorkingDirectory = AppSetting.GetAppRoot() & "\MovieSample\Output\"
' Request homepage content for scraping
Me.Request("https://website.com/", AddressOf Parse)
End Sub
Public Overrides Sub Parse(ByVal response As Response)
' Iterate over each movie div within the featured movie section
For Each div In response.Css("#movie-featured > div")
If div.Attributes("class") <> "clearfix" Then
Dim movie As New Movie With {.Id = Convert.ToInt32(div.GetAttribute("data-movie-id"))}
Dim link = div.Css("a")(0)
movie.Title = link.TextContentClean
movie.URL = link.Attributes("href")
' Scrape and store movie object
Scrape(movie, "Movie.Jsonl")
End If
Next div
End Sub
End Class
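What format does the Scrape method use for typed objects?
What's new?
- We implemented the `Movie` class to hold the scraped data, which provides type safety and better code organization.
- We pass the movie object to the `Scrape` method, which understands our format and saves it as shown in the following image:
The output is automatically serialized to JSON, which makes it easy to import into a database or another application.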
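To use the saved data elsewhere, the output file can be read back into `Movie` objects. The sketch below assumes the file holds one JSON object per line, as the `.Jsonl` extension suggests, and that the property names match the `Movie` class. Both are assumptions to check against a real output file. `MovieFileReader` and `ParseJsonLines` are hypothetical names, and the sample lines are invented stand-ins built from the homepage HTML above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }
}

public static class MovieFileReader
{
    // Hypothetical helper: turns JSON Lines text (one object per line) into Movie objects.
    public static List<Movie> ParseJsonLines(IEnumerable<string> lines)
    {
        var movies = new List<Movie>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            var movie = JsonSerializer.Deserialize<Movie>(line);
            if (movie != null) movies.Add(movie);
        }
        return movies;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample lines standing in for the contents of Movie.Jsonl (assumed layout)
        var sample = new[]
        {
            "{\"Id\":20746,\"Title\":\"King Arthur: Legend of the Sword\",\"URL\":\"https://website.com/film/king-arthur-legend-of-the-sword-20746/\"}",
            "{\"Id\":20724,\"Title\":\"Snatched\",\"URL\":\"https://website.com/film/snatched-20724/\"}"
        };
        foreach (var m in MovieFileReader.ParseJsonLines(sample))
            Console.WriteLine($"{m.Id}: {m.Title}");
    }
}
```

For a real file, the same helper could be fed with `File.ReadLines` pointed at `Movie.Jsonl` in the working directory.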
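How do I scrape the detailed movie pages?
Start scraping the more detailed pages. Multi-page scraping is a common requirement, and IronWebScraper makes it straightforward through its request chaining mechanism.
What additional data can I extract from the detail pages?
A movie page looks like this and contains rich metadata for each movie: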
<div class="mvi-content">
<div class="thumb mvic-thumb"
style="background-image: url(https://img.gocdn.online/2017/04/28/poster/5a08e94ba02118f22dc30f298c603210-guardians-of-the-galaxy-vol-2.jpg);"></div>
<div class="mvic-desc">
<h3>Guardians of the Galaxy Vol. 2</h3>
<div class="desc">
Set to the backdrop of Awesome Mixtape #2, Marvel's Guardians of the Galaxy Vol. 2 continues the team's adventures as they travel throughout the cosmos to help Peter Quill learn more about his true parentage.
</div>
<div class="mvic-info">
<div class="mvici-left">
<p>
<strong>Genre: </strong>
<a href="https://Domain/genre/action/" title="Action">Action</a>,
<a href="https://Domain/genre/adventure/" title="Adventure">Adventure</a>,
<a href="https://Domain/genre/sci-fi/" title="Sci-Fi">Sci-Fi</a>
</p>
<p>
<strong>Actor: </strong>
<a target="_blank" href="https://Domain/actor/chris-pratt" title="Chris Pratt">Chris Pratt</a>,
<a target="_blank" href="https://Domain/actor/-zoe-saldana" title="Zoe Saldana">Zoe Saldana</a>,
<a target="_blank" href="https://Domain/actor/-dave-bautista-" title="Dave Bautista">Dave Bautista</a>
</p>
<p>
<strong>Director: </strong>
<a href="#" title="James Gunn">James Gunn</a>
</p>
<p>
<strong>Country: </strong>
<a href="https://Domain/country/us" title="United States">United States</a>
</p>
</div>
<div class="mvici-right">
<p><strong>Duration:</strong> 136 min</p>
<p><strong>Quality:</strong> <span class="quality">CAM</span></p>
<p><strong>Release:</strong> 2017</p>
<p><strong>IMDb:</strong> 8.3</p>
</div>
<div class="clearfix"></div>
</div>
<div class="clearfix"></div>
</div>
<div class="clearfix"></div>
</div>
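How do I extend my Movie class with additional properties?
Extend the `Movie` class with new properties (such as description, genre, actors, and IMDb score). This example uses only `Description`, `Genre`, and `Actor`. `Genre` and `Actor` are declared as `List<string>` because a movie can have more than one of each.
using System.Collections.Generic;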
public class Movie
{
public int Id { get; set; }
public string Title { get; set; }
public string URL { get; set; }
public string Description { get; set; }
public List<string> Genre { get; set; }
public List<string> Actor { get; set; }
}
Imports System.Collections.Generic
Public Class Movie
Public Property Id As Integer
Public Property Title As String
Public Property URL As String
Public Property Description As String
Public Property Genre As List(Of String)
Public Property Actor As List(Of String)
End Class
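How do I navigate between pages while scraping?
Go to the detail pages to scrape them. IronWebScraper handles thread safety automatically and allows multiple pages to be processed at the same time.
Why use multiple parse functions for different page types?
IronWebScraper lets you add multiple scrape functions to handle pages with different formats. This separation of concerns makes the code easier to maintain and lets it cope with differing page structures. Each parse function can focus on extracting data from one type of page.
How does MetaData help pass objects between parse functions?
The MetaData feature is essential for maintaining state between requests. For more advanced WebScraper features, see our detailed guide: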
public class MovieScraper : WebScraper
{
public override void Init()
{
// Initialize scraper settings
License.LicenseKey = "LicenseKey";
this.LoggingLevel = WebScraper.LogLevel.All;
this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";
// Request homepage content for scraping
this.Request("https://domain/", Parse);
}
public override void Parse(Response response)
{
// Iterate over each movie div within the featured movie section
foreach (var div in response.Css("#movie-featured > div"))
{
if (div.Attributes["class"] != "clearfix")
{
var movie = new Movie
{
Id = Convert.ToInt32(div.GetAttribute("data-movie-id"))
};
var link = div.Css("a")[0];
movie.Title = link.TextContentClean;
movie.URL = link.Attributes["href"];
// Request detailed page
this.Request(movie.URL, ParseDetails, new MetaData() { { "movie", movie } });
}
}
}
public void ParseDetails(Response response)
{
// Retrieve movie object from metadata
var movie = response.MetaData.Get<Movie>("movie");
var div = response.Css("div.mvic-desc")[0];
// Extract description
movie.Description = div.Css("div.desc")[0].TextContentClean;
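// Extract genres (links in the first <p> only; "div > p > a" would also match actors, director and country)
movie.Genre = new List<string>(); // Initialize genre list
foreach(var genre in div.Css("div > p:nth-child(1) > a"))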
{
movie.Genre.Add(genre.TextContentClean);
}
// Extract actors
movie.Actor = new List<string>(); // Initialize actor list
foreach (var actor in div.Css("div > p:nth-child(2) > a"))
{
movie.Actor.Add(actor.TextContentClean);
}
// Scrape and store detailed movie data
Scrape(movie, "Movie.Jsonl");
}
}
Public Class MovieScraper
Inherits WebScraper
Public Overrides Sub Init()
' Initialize scraper settings
License.LicenseKey = "LicenseKey"
Me.LoggingLevel = WebScraper.LogLevel.All
Me.WorkingDirectory = AppSetting.GetAppRoot() & "\MovieSample\Output\"
' Request homepage content for scraping
Me.Request("https://domain/", AddressOf Parse)
End Sub
Public Overrides Sub Parse(ByVal response As Response)
' Iterate over each movie div within the featured movie section
For Each div In response.Css("#movie-featured > div")
If div.Attributes("class") <> "clearfix" Then
Dim movie As New Movie With {.Id = Convert.ToInt32(div.GetAttribute("data-movie-id"))}
Dim link = div.Css("a")(0)
movie.Title = link.TextContentClean
movie.URL = link.Attributes("href")
' Request detailed page
Me.Request(movie.URL, AddressOf ParseDetails, New MetaData() From {
{ "movie", movie }
})
End If
Next div
End Sub
Public Sub ParseDetails(ByVal response As Response)
' Retrieve movie object from metadata
Dim movie = response.MetaData.Get(Of Movie)("movie")
Dim div = response.Css("div.mvic-desc")(0)
' Extract description
movie.Description = div.Css("div.desc")(0).TextContentClean
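' Extract genres (links in the first <p> only; "div > p > a" would also match actors, director and country)
movie.Genre = New List(Of String)() ' Initialize genre list
For Each genre In div.Css("div > p:nth-child(1) > a")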
movie.Genre.Add(genre.TextContentClean)
Next genre
' Extract actors
movie.Actor = New List(Of String)() ' Initialize actor list
For Each actor In div.Css("div > p:nth-child(2) > a")
movie.Actor.Add(actor.TextContentClean)
Next actor
' Scrape and store detailed movie data
Scrape(movie, "Movie.Jsonl")
End Sub
End Class
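What are the key features of this multi-page scraping approach?
What's new?
- Added a scrape function (for example `ParseDetails`) to scrape the detail pages, similar to the technique used when scraping a shopping website.
- Moved the `Scrape` call that generates the file into the new function, so data is saved only after all of it has been collected.
- Used the IronWebScraper feature `MetaData` to pass the movie object to the new scrape function and keep the object's state across requests.
- Scraped the page and saved the movie object's data to the file with complete information.
For more on the available methods and properties, see the API reference. IronWebScraper provides a robust framework for extracting structured data from websites, which makes it a valuable tool for data collection and analysis projects.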
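Frequently Asked Questions
How do I extract movie titles from HTML with C#?
IronWebScraper provides CSS selector methods for extracting movie titles from HTML. Use the response.Css() method with a suitable selector (for example '.movie-item h2') to target the title elements, then read the plain text through the TextContentClean property.
What is the best way to move between multiple movie pages?
IronWebScraper handles page navigation through the Request() method. Extract the pagination links with CSS selectors, then call Request() for each URL to scrape data across multiple pages and build a complete movie dataset.
How do I save scraped movie data in a structured format?
Use IronWebScraper's Scrape() method to save data in JSON format. Create an anonymous object or a typed class with movie properties such as title, URL, and rating, then pass it to Scrape() together with a file name. The data is serialized and saved automatically.
Which CSS selectors should I use to extract movie information?
IronWebScraper supports standard CSS selectors. For a movie site, use selectors such as '.movie-item' for the container, 'h2' for the title, and 'a[href]' for links, plus specific class names to identify ratings or genres. The Css() method returns a collection you can iterate over.
How do I handle movie quality labels such as "CAM" in the scraped data?
IronWebScraper lets you extract and process quality indicators by targeting the specific HTML elements. Use a CSS selector to locate the quality badge or text, then include it as a property on your scraped data object so the movie record is complete.
Can I configure logging for my movie scraping operation?
Yes. IronWebScraper has built-in logging. Set the LoggingLevel property to LogLevel.All in the Init() method to track all scraping activity, errors, and progress, which helps with debugging and monitoring your movie data extraction.
How do I set the working directory for scraped data correctly?
IronWebScraper lets you set the WorkingDirectory property in the Init() method. Specify a path such as 'C:\MovieData\Output\' where the scraped movie data files will be stored. This keeps output files in one place and your data organized.
How do I inherit from the WebScraper class correctly?
Create a new class that inherits from IronWebScraper's WebScraper base class. Override the Init() method for configuration and the Parse() method to implement the data extraction logic. This object-oriented approach makes your movie scraper reusable and easy to maintain.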






