Scraping an Online Movie Website Using C# and IronWebScraper

Darrius Serrant, updated June 10, 2025

Let's start with another example from a real-world website. This time we'll scrape a movie website.

Let's add a new class and name it `MovieScraper`.

Now let's look at the website we're going to scrape. Here is part of the HTML from the site's home page:

```html
<div id="movie-featured" class="movies-list movies-list-full tab-pane in fade active">
  <div data-movie-id="20746" class="ml-item">
    <a href="https://website.com/film/king-arthur-legend-of-the-sword-20746/">
      <span class="mli-quality">CAM</span>
      <img data-original="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg"
           class="lazy thumb mli-thumb"
           alt="King Arthur: Legend of the Sword"
           src="https://img.gocdn.online/2017/05/16/poster/2116d6719c710eabe83b377463230fbe-king-arthur-legend-of-the-sword.jpg"
           style="display: inline-block;">
      <span class="mli-info"><h2>King Arthur: Legend of the Sword</h2></span>
    </a>
  </div>
  <div data-movie-id="20724" class="ml-item">
    <a href="https://website.com/film/snatched-20724/">
      <span class="mli-quality">CAM</span>
      <img data-original="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg"
           class="lazy thumb mli-thumb"
           alt="Snatched"
           src="https://img.gocdn.online/2017/05/16/poster/5ef66403dc331009bdb5aa37cfe819ba-snatched.jpg"
           style="display: inline-block;">
      <span class="mli-info"><h2>Snatched</h2></span>
    </a>
  </div>
</div>
```

Each item gives us a movie ID, a title, and a link to the movie's details page.

Let's start scraping this data:

```csharp
using System;
using IronWebScraper;

public class MovieScraper : WebScraper
{
    public override void Init()
    {
        // Initialize scraper settings
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        // AppSetting.GetAppRoot() is assumed to be a helper from the sample project that returns the app's root folder
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";

        // Request homepage content for scraping
        this.Request("https://website.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Iterate over each movie div within the featured movie section
        foreach (var div in response.Css("#movie-featured > div"))
        {
            // Skip layout-only "clearfix" divs
            if (div.Attributes["class"] != "clearfix")
            {
                var movieId = Convert.ToInt32(div.GetAttribute("data-movie-id"));
                var link = div.Css("a")[0];
                // Read the title from the <h2>; the link's full text also includes the "CAM" quality label
                var movieTitle = link.Css("h2")[0].TextContentClean;

                // Scrape and store movie data as key-value pairs
                Scrape(new ScrapedData()
                {
                    { "MovieId", movieId },
                    { "MovieTitle", movieTitle }
                }, "Movie.Jsonl");
            }
        }
    }
}
```

What's new in this code?

The `WorkingDirectory` property sets the main working directory for all scraped data and its related files.

Let's go further. What if we want typed objects to hold the scraped data in a structured form?
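Before answering that, it helps to picture what the key-value version writes to `Movie.Jsonl`. The `.Jsonl` extension suggests the JSON Lines format (one JSON object per line), and the sketch below assumes exactly that. It uses `System.Text.Json` as a stand-in serializer; `JsonLinesSketch` and its methods are hypothetical and not part of IronWebScraper, so the library's real property order, naming, and escaping may differ.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, NOT part of IronWebScraper: approximates the JSON Lines
// records that a key-value Scrape(new ScrapedData { ... }, "Movie.Jsonl") call might append.
public static class JsonLinesSketch
{
    // Serializes one key-value record as a single compact line of JSON.
    public static string ToJsonLine(IDictionary<string, object> record)
    {
        return JsonSerializer.Serialize(record);
    }

    // Joins several records into JSON Lines text: one record per line.
    public static string ToJsonLines(IEnumerable<IDictionary<string, object>> records)
    {
        var lines = new List<string>();
        foreach (var record in records)
        {
            lines.Add(ToJsonLine(record));
        }
        return string.Join("\n", lines);
    }
}
```

Under these assumptions, the "Snatched" item from the sample HTML becomes the line `{"MovieId":20724,"MovieTitle":"Snatched"}`. Each dictionary key turns into a JSON property, so the keys are only as consistent as the strings typed at each `Scrape` call; a typed class removes that source of drift.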
Let's implement a `Movie` class to hold the structured data:

```csharp
public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }
}
```

Now we'll update the code:

```csharp
using System;
using IronWebScraper;

public class MovieScraper : WebScraper
{
    public override void Init()
    {
        // Initialize scraper settings
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";

        // Request homepage content for scraping
        this.Request("https://website.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Iterate over each movie div within the featured movie section
        foreach (var div in response.Css("#movie-featured > div"))
        {
            if (div.Attributes["class"] != "clearfix")
            {
                var movie = new Movie
                {
                    Id = Convert.ToInt32(div.GetAttribute("data-movie-id"))
                };
                var link = div.Css("a")[0];
                // Title comes from the <h2> so the "CAM" label is not included
                movie.Title = link.Css("h2")[0].TextContentClean;
                movie.URL = link.Attributes["href"];

                // Scrape and store the movie object
                Scrape(movie, "Movie.Jsonl");
            }
        }
    }
}
```

What's new?

We implemented a `Movie` class to hold the scraped data. We pass the movie object to the `Scrape` method, which understands its structure and saves it in that defined form, as shown in the figure.

Now let's scrape the more detailed pages. The movie page is shown in the figure below; here is its HTML:

```html
<div class="mvi-content">
  <div class="thumb mvic-thumb" style="background-image: url(https://img.gocdn.online/2017/04/28/poster/5a08e94ba02118f22dc30f298c603210-guardians-of-the-galaxy-vol-2.jpg);"></div>
  <div class="mvic-desc">
    <h3>Guardians of the Galaxy Vol. 2</h3>
    <div class="desc">
      Set to the backdrop of Awesome Mixtape #2, Marvel's Guardians of the Galaxy Vol. 2 continues the team's adventures as they travel throughout the cosmos to help Peter Quill learn more about his true parentage.
    </div>
    <div class="mvic-info">
      <div class="mvici-left">
        <p>
          <strong>Genre: </strong>
          <a href="https://Domain/genre/action/" title="Action">Action</a>,
          <a href="https://Domain/genre/adventure/" title="Adventure">Adventure</a>,
          <a href="https://Domain/genre/sci-fi/" title="Sci-Fi">Sci-Fi</a>
        </p>
        <p>
          <strong>Actor: </strong>
          <a target="_blank" href="https://Domain/actor/chris-pratt" title="Chris Pratt">Chris Pratt</a>,
          <a target="_blank" href="https://Domain/actor/-zoe-saldana" title="Zoe Saldana">Zoe Saldana</a>,
          <a target="_blank" href="https://Domain/actor/-dave-bautista-" title="Dave Bautista">Dave Bautista</a>
        </p>
        <p>
          <strong>Director: </strong>
          <a href="#" title="James Gunn">James Gunn</a>
        </p>
        <p>
          <strong>Country: </strong>
          <a href="https://Domain/country/us" title="United States">United States</a>
        </p>
      </div>
      <div class="mvici-right">
        <p><strong>Duration:</strong> 136 min</p>
        <p><strong>Quality:</strong> <span class="quality">CAM</span></p>
        <p><strong>Release:</strong> 2017</p>
        <p><strong>IMDb:</strong> 8.3</p>
      </div>
      <div class="clearfix"></div>
    </div>
    <div class="clearfix"></div>
  </div>
  <div class="clearfix"></div>
</div>
```

We could extend the `Movie` class with every new property on this page (Description, Genre, Actor, Director, Country, Duration, IMDb score), but in this example we'll only use Description, Genre, and Actor:

```csharp
using System.Collections.Generic;

public class Movie
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string URL { get; set; }
    public string Description { get; set; }
    public List<string> Genre { get; set; }
    public List<string> Actor { get; set; }
}
```

Next, we'll follow each link to its details page and scrape it. IronWebScraper lets you add more scrape functions to handle different page formats, as we can see here:

```csharp
using System;
using System.Collections.Generic;
using IronWebScraper;

public class MovieScraper : WebScraper
{
    public override void Init()
    {
        // Initialize scraper settings
        License.LicenseKey = "LicenseKey";
        this.LoggingLevel = WebScraper.LogLevel.All;
        this.WorkingDirectory = AppSetting.GetAppRoot() + @"\MovieSample\Output\";

        // Request homepage content for scraping
        this.Request("https://website.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Iterate over each movie div within the featured movie section
        foreach (var div in response.Css("#movie-featured > div"))
        {
            if (div.Attributes["class"] != "clearfix")
            {
                var movie = new Movie
                {
                    Id = Convert.ToInt32(div.GetAttribute("data-movie-id"))
                };
                var link = div.Css("a")[0];
                movie.Title = link.Css("h2")[0].TextContentClean;
                movie.URL = link.Attributes["href"];

                // Request the details page, passing the partially filled movie along as metadata
                this.Request(movie.URL, ParseDetails, new MetaData() { { "movie", movie } });
            }
        }
    }

    public void ParseDetails(Response response)
    {
        // Retrieve the movie object from the request metadata
        var movie = response.MetaData.Get<Movie>("movie");
        var div = response.Css("div.mvic-desc")[0];

        // Extract description
        movie.Description = div.Css("div.desc")[0].TextContentClean;

        // Extract genres: only the links in the first <p> ("Genre:") of the left column;
        // a bare "div > p > a" would also pick up actor, director, and country links
        movie.Genre = new List<string>();
        foreach (var genre in div.Css("div.mvici-left > p:nth-child(1) > a"))
        {
            movie.Genre.Add(genre.TextContentClean);
        }

        // Extract actors: the links in the second <p> ("Actor:") of the left column
        movie.Actor = new List<string>();
        foreach (var actor in div.Css("div.mvici-left > p:nth-child(2) > a"))
        {
            movie.Actor.Add(actor.TextContentClean);
        }

        // Scrape and store the completed movie object
        Scrape(movie, "Movie.Jsonl");
    }
}
```

What's new?

- We can add scrape functions (such as `ParseDetails`) to scrape details pages.
- We moved the `Scrape` call that writes the file into the new function.
- We used IronWebScraper's `MetaData` feature to pass our movie object to the new scrape function.
- We scraped the details page and saved the completed movie object to the file.

Frequently Asked Questions

How do I scrape data from an online movie website?

You can use IronWebScraper. Start by creating a `MovieScraper` class, configure the scraper settings, and request the home page content.

What is the role of the `Movie` class in web scraping?

In this tutorial, the `Movie` class structures the scraped data and stores it as objects with properties such as Id, Title, URL, Description, Genre, and Actor, which keeps data handling organized.

How do I navigate to and extract detailed movie information?

With IronWebScraper you can write your own scrape function, such as `ParseDetails`, that processes each detailed movie page and extracts additional information like the description, genres, and actors.

What role does the `MetaData` feature play in web scraping?

IronWebScraper's `MetaData` feature passes data between scrape functions, for example sending the movie object to `ParseDetails` for further processing.

How do I handle different page formats while scraping?

With IronWebScraper you can write a separate scrape function for each page format and extract different kinds of data from each.

How do I extract movie IDs and titles with IronWebScraper?

Iterate over each movie div in the featured movie section and read its `data-movie-id` attribute and the title text.

Why does logging matter in the scraper settings?

IronWebScraper's `LoggingLevel` property sets how verbose the log output is, which helps you monitor and troubleshoot the scraping process.

How does the working directory work in a web scraping project?

In IronWebScraper, the working directory is where all scraped data and related files are stored, which keeps data management in one place.

Can IronWebScraper be used to automate data extraction tasks?
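Yes. IronWebScraper is designed to automate data extraction, letting you build classes and methods that systematically scrape and store data from web pages.

Darrius Serrant
Full Stack Software Engineer (WebOps)

Darrius Serrant holds a Bachelor's degree in Computer Science from the University of Miami and works as a Full Stack WebOps Marketing Engineer at Iron Software. Drawn to coding from a young age, he saw computing as both mysterious and accessible, making it the perfect medium for creativity and problem-solving. At Iron Software, Darrius enjoys creating new things and simplifying complex concepts to make them easier to understand. As one of our resident developers, he has also volunteered to teach students, sharing his expertise with the next generation. For Darrius, his work is fulfilling because it is valued and has a real impact.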