Extract Text from DOCX Using IronWord

Updated: January 10, 2026

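IronWord's ExtractText() method extracts text from DOCX files at three levels: the whole document, a specific paragraph, or a table cell. It gives C# document-processing and data-analysis code a simple API.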

Quickstart: Extract Text from DOCX

1. Install the IronWord NuGet package: Install-Package IronWord
2. Create or load a WordDocument: WordDocument doc = new WordDocument("document.docx");
3. Extract all text: string text = doc.ExtractText();
4. Extract from a specific paragraph: string para = doc.Paragraphs[0].ExtractText();
5. Extract from a table cell: string cell = doc.Tables[0].Rows[0].Cells[0].ExtractText();

Install IronWord with the NuGet Package Manager:

PM > Install-Package IronWord

Copy and run this code:

using IronWord;

// Quick example: Extract all text from DOCX
WordDocument doc = new WordDocument("sample.docx");
string allText = doc.ExtractText();
Console.WriteLine(allText);

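Extracting text from DOCX files is a common requirement in document processing and data analysis. IronWord provides a direct way to read and extract text content from existing DOCX files, giving you programmatic access to paragraphs, tables, and other text elements.

This tutorial covers the ExtractText() method in detail and shows how to access text from various document elements. Whether you are building a document indexing system, a content management solution, or a data extraction pipeline, you need to know how to extract text from Word documents efficiently.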

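## How to Extract Text from a DOCX Document

1. Download the C# library for extracting text from DOCX
2. Create a new Word document or load an existing one
3. Access and extract text content with ExtractText
4. Process or export the extracted text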

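How do I extract all text from a DOCX document?

The ExtractText() method retrieves the text content of an entire Word document. This example creates a new document, adds text, extracts it with ExtractText(), and prints it to the console, which covers the basic extraction workflow.

The extracted text preserves the document's logical reading order. The method processes headings, paragraphs, lists, and other text elements in sequence, which makes it a good fit for content analysis and search indexing.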

using IronWord;

// Instantiate a new DOCX file
WordDocument doc = new WordDocument();

// Add text
doc.AddText("Hello, World!");

// Print extracted text from the document to the console
Console.WriteLine(doc.ExtractText());

Imports IronWord

' Instantiate a new DOCX file
Dim doc As New WordDocument()

' Add text
doc.AddText("Hello, World!")

' Print extracted text from the document to the console
Console.WriteLine(doc.ExtractText())

[Screenshots: the extracted text and the resulting console output]
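Once the text is extracted, content analysis is ordinary string processing. The sketch below counts word frequencies, a common first step toward a search index. It is a minimal illustration, not part of IronWord: the sample string is made up and stands in for the value returned by doc.ExtractText(), and the CountWords helper is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the string returned by doc.ExtractText(); the sample text is made up.
string extracted = "Hello, World! Hello again, world.";

// Hypothetical helper: count how often each word occurs, ignoring case and punctuation.
Dictionary<string, int> CountWords(string text)
{
    // Treat every character that is not a letter or digit as a separator.
    char[] separators = text.Where(c => !char.IsLetterOrDigit(c)).Distinct().ToArray();
    return text
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .GroupBy(word => word.ToLowerInvariant())
        .ToDictionary(group => group.Key, group => group.Count());
}

Dictionary<string, int> counts = CountWords(extracted);

// List the words from most to least frequent.
foreach (var pair in counts.OrderByDescending(p => p.Value).ThenBy(p => p.Key))
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

Splitting on non-alphanumeric characters is enough for space-delimited languages; text without word delimiters would need a proper tokenizer.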

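How do I extract text from specific paragraphs?

For finer control, you can extract text from specific paragraphs instead of the whole document. The Paragraphs collection lets you target and process any paragraph you need. This granular approach is useful for documents with structured content or when certain sections must be handled independently.

This example extracts text from the first and last paragraphs, combines them, and saves the result to a .txt file. Document summarization tools often use this technique to pull out a document's introduction and conclusion.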

using IronWord;
using System.IO;
using System.Linq; // required for Paragraphs.Last()

// Load an existing DOCX file
WordDocument doc = new WordDocument("document.docx");

// Extract text and assign variables
string firstParagraph = doc.Paragraphs[0].ExtractText();
string lastParagraph = doc.Paragraphs.Last().ExtractText();

// Combine the texts
string newText = firstParagraph + " " + lastParagraph;

// Export the combined text as a new .txt file
File.WriteAllText("output.txt", newText);

Imports IronWord
Imports System.IO
Imports System.Linq ' required for Paragraphs.Last()

' Load an existing DOCX file
Dim doc As New WordDocument("document.docx")

' Extract text and assign variables
Dim firstParagraph As String = doc.Paragraphs(0).ExtractText()
Dim lastParagraph As String = doc.Paragraphs.Last().ExtractText()

' Combine the texts
Dim newText As String = firstParagraph & " " & lastParagraph

' Export the combined text as a new .txt file
File.WriteAllText("output.txt", newText)

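Extracting specific paragraphs becomes especially useful when combined with document analysis. For example, you might select key paragraphs by formatting, position, or content pattern. Selective extraction of this kind reduces processing time and keeps the focus on the most relevant content.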

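[Screenshots: the text extracted from the first paragraph, the text extracted from the last paragraph, and the combined text in the output file]

The extraction keeps the text content and drops formatting information, which makes the output suitable for plain-text processing.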
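Selecting paragraphs by content pattern can be sketched as a filter over the extracted paragraph strings. The example below is an illustration under stated assumptions: the sample list is made up and stands in for doc.Paragraphs.Select(p => p.ExtractText()), and FindParagraphs is a hypothetical helper, not an IronWord method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for doc.Paragraphs.Select(p => p.ExtractText()); the sample strings are made up.
List<string> paragraphTexts = new List<string>
{
    "Introduction: scope of the quarterly report.",
    "",
    "Revenue grew in every region.",
    "Conclusion: revenue targets were met."
};

// Hypothetical helper: keep non-empty paragraphs that contain the keyword
// (case-insensitive), together with their original zero-based index.
IEnumerable<(int Index, string Text)> FindParagraphs(IEnumerable<string> texts, string keyword)
{
    return texts
        .Select((text, index) => (Index: index, Text: text))
        .Where(p => !string.IsNullOrWhiteSpace(p.Text)
                 && p.Text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0);
}

List<(int Index, string Text)> matches = FindParagraphs(paragraphTexts, "revenue").ToList();

foreach (var match in matches)
{
    Console.WriteLine($"[{match.Index}] {match.Text}");
}
```

Keeping the original index alongside the text makes it possible to go back to the matching paragraph in the document later.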

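How do I extract data from tables in a DOCX?

Tables often hold structured data that needs to be extracted for processing or analysis. IronWord gives you access to table data by navigating rows and cells. This example loads a document containing a table of API statistics and extracts one cell value from the fourth column, using Rows[2].Cells[3]. Because indexing is zero-based, Rows[2] is the table's third row, which is the second data row if the first row holds column headers.

Table extraction is essential for data migration projects, report generation, and automated data collection workflows. When working with table data, keep the zero-based indexing in mind: the first table is Tables[0], the first row is Rows[0], and so on.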

using IronWord;

// Load the API statistics document
WordDocument apiStatsDoc = new WordDocument("api-statistics.docx");

// Extract text from the 1st table, 3rd row (index 2), 4th column (index 3)
string extractedValue = apiStatsDoc.Tables[0].Rows[2].Cells[3].ExtractText();

// Print extracted value
Console.WriteLine($"Target success rate: {extractedValue}");

Imports IronWord

' Load the API statistics document
Dim apiStatsDoc As New WordDocument("api-statistics.docx")

' Extract text from the 1st table, 3rd row (index 2), 4th column (index 3)
Dim extractedValue As String = apiStatsDoc.Tables(0).Rows(2).Cells(3).ExtractText()

' Print extracted value
Console.WriteLine($"Target success rate: {extractedValue}")

[Screenshots: the source table and the value read from the table cell]
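For data migration it is often more useful to export a whole table than a single cell. The sketch below turns rows of cell text into CSV lines. It is a minimal illustration under assumptions: the nested list is made-up sample data standing in for cell text gathered with table.Rows.Select(row => row.Cells.Select(cell => cell.ExtractText())), and EscapeCsv and ToCsvLines are hypothetical helpers, not IronWord methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for one table's cell texts, e.g. gathered with
// table.Rows.Select(row => row.Cells.Select(cell => cell.ExtractText())); values are made up.
List<List<string>> tableCells = new List<List<string>>
{
    new List<string> { "Endpoint", "Requests", "Success rate" },
    new List<string> { "/users", "1,200", "99.1%" },
    new List<string> { "/orders", "800", "98.4%" }
};

// Hypothetical helper: quote a field when it contains a comma, quote, or line break.
string EscapeCsv(string field)
{
    bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
}

// Hypothetical helper: turn rows of cell text into CSV lines, one line per table row.
List<string> ToCsvLines(IEnumerable<IEnumerable<string>> rows)
{
    return rows.Select(row => string.Join(",", row.Select(EscapeCsv))).ToList();
}

List<string> csvLines = ToCsvLines(tableCells);
csvLines.ForEach(Console.WriteLine);
```

Quoting matters here because extracted cell text can itself contain commas, as the "1,200" value does.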

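Advanced text extraction scenarios

When working with complex documents, you may need to combine several extraction techniques. The following example extracts text from multiple elements and processes each kind differently: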

using IronWord;
using System.Text;
using System.Linq;

// Load a complex document
WordDocument complexDoc = new WordDocument("report.docx");

// Create a StringBuilder for efficient string concatenation
StringBuilder extractedContent = new StringBuilder();

// Extract and process headers (assuming they're in the first few paragraphs)
var headers = complexDoc.Paragraphs
    .Take(3)
    .Select(p => p.ExtractText())
    .Where(text => !string.IsNullOrWhiteSpace(text));

foreach (var header in headers)
{
    extractedContent.AppendLine($"HEADER: {header}");
}

// Extract table summaries
foreach (var table in complexDoc.Tables)
{
    // Get first cell as table header/identifier
    string tableIdentifier = table.Rows[0].Cells[0].ExtractText();
    extractedContent.AppendLine($"\nTABLE: {tableIdentifier}");

    // Extract key metrics (last row often contains totals)
    if (table.Rows.Count > 1)
    {
        var lastRow = table.Rows.Last();
        var totals = lastRow.Cells.Select(cell => cell.ExtractText());
        extractedContent.AppendLine($"Totals: {string.Join(", ", totals)}");
    }
}

// Save the structured extraction
System.IO.File.WriteAllText("structured-extract.txt", extractedContent.ToString());

Imports IronWord
Imports System.Text
Imports System.Linq

' Load a complex document
Dim complexDoc As New WordDocument("report.docx")

' Create a StringBuilder for efficient string concatenation
Dim extractedContent As New StringBuilder()

' Extract and process headers (assuming they're in the first few paragraphs)
Dim headers = complexDoc.Paragraphs _
    .Take(3) _
    .Select(Function(p) p.ExtractText()) _
    .Where(Function(text) Not String.IsNullOrWhiteSpace(text))

For Each header In headers
    extractedContent.AppendLine($"HEADER: {header}")
Next

' Extract table summaries
For Each table In complexDoc.Tables
    ' Get first cell as table header/identifier
    Dim tableIdentifier As String = table.Rows(0).Cells(0).ExtractText()
    extractedContent.AppendLine(vbCrLf & $"TABLE: {tableIdentifier}")

    ' Extract key metrics (last row often contains totals)
    If table.Rows.Count > 1 Then
        Dim lastRow = table.Rows.Last()
        Dim totals = lastRow.Cells.Select(Function(cell) cell.ExtractText())
        extractedContent.AppendLine($"Totals: {String.Join(", ", totals)}")
    End If
Next

' Save the structured extraction
System.IO.File.WriteAllText("structured-extract.txt", extractedContent.ToString())

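This advanced example shows how to combine different document elements into a structured extraction. The approach is useful for generating document summaries, building indexes, or preparing data for further processing.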

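Best practices for text extraction

When implementing text extraction in production applications, consider these best practices:

1. Error handling: Wrap extraction code in try-catch blocks to handle documents that may be corrupted or have an unexpected structure.

2. Performance optimization: For large documents or batch processing, consider extracting only the parts you need instead of the whole document.

3. Character encoding: When saving extracted text, pay attention to character encoding, especially for documents containing special characters or multiple languages.

4. Memory management: When processing many documents, dispose of WordDocument objects properly to prevent memory leaks.

Remember that text extraction preserves the logical reading order but removes formatting. If you need to retain formatting information, consider using additional IronWord features or storing the metadata separately. For production deployments, review the changelog to stay current with the latest features and improvements.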
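The error-handling and encoding practices can be sketched as a small batch loop. This is an illustration under assumptions, not IronWord's own API: the extractText delegate is a hypothetical stand-in for path => new WordDocument(path).ExtractText() that fails on one file to simulate a corrupt document, ExtractAll is a hypothetical helper, and the file names are made up. The exception types IronWord actually throws are not documented here, so the sketch catches Exception broadly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical extractor standing in for: path => new WordDocument(path).ExtractText()
// It fails for one file to simulate a corrupt document.
Func<string, string> extractText = path =>
    path.Contains("corrupt")
        ? throw new InvalidDataException($"Cannot read {path}")
        : $"Text of {path}";

// Hypothetical helper: extract each file independently so one bad document
// does not stop the batch; failures are recorded instead of rethrown.
Dictionary<string, string> ExtractAll(
    IEnumerable<string> paths, Func<string, string> extractor, List<string> failures)
{
    var results = new Dictionary<string, string>();
    foreach (string path in paths)
    {
        try
        {
            results[path] = extractor(path);
        }
        catch (Exception ex)
        {
            // Assumption: the concrete exception types are unknown, so catch broadly.
            failures.Add($"{path}: {ex.Message}");
        }
    }
    return results;
}

var failed = new List<string>();
var extractedTexts = ExtractAll(new[] { "a.docx", "corrupt.docx", "b.docx" }, extractText, failed);

// Write with an explicit encoding so non-ASCII characters survive the round trip.
string outputPath = Path.Combine(Path.GetTempPath(), "extracted-a.txt");
File.WriteAllText(outputPath, extractedTexts["a.docx"],
    new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

Console.WriteLine($"Extracted {extractedTexts.Count} file(s), {failed.Count} failure(s).");
```

Passing the extractor as a delegate keeps the batch logic testable without real DOCX files; in an application you would pass a lambda that loads the document and calls ExtractText().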

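Summary

IronWord's ExtractText() method offers a flexible way to extract text from DOCX files. Whether you need the whole document, specific paragraphs, or table data, the API provides a direct method for the job. Combined with proper error handling and the optimization strategies above, these techniques let you build robust document processing applications that handle a range of text extraction scenarios.

For more advanced use cases and other features, see the rest of the IronWord documentation.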

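Frequently Asked Questions

How do I extract all text from a Word document in C#?

Use IronWord's ExtractText() method on a WordDocument object. Load your DOCX file with WordDocument doc = new WordDocument("document.docx"); then call string text = doc.ExtractText(); to retrieve all text content in the document.

Can I extract text from specific paragraphs instead of the whole document?

Yes. IronWord lets you extract text from specific paragraphs through the Paragraphs collection. Use doc.Paragraphs[index].ExtractText() to target individual paragraphs for finer-grained extraction.

How do I extract text from tables in a DOCX file?

IronWord exposes table text through the Tables collection. Use doc.Tables[0].Rows[0].Cells[0].ExtractText() to access a specific cell and retrieve the text content of any table cell in the document.

What order does the extracted text follow when using ExtractText()?

ExtractText() preserves the document's logical reading order, processing headings, paragraphs, lists, and other text elements in sequence, which suits content analysis and search indexing.

What are the basic steps to start extracting text from DOCX files?

First install IronWord via NuGet (Install-Package IronWord), then create or load a WordDocument, and finally call ExtractText() to extract text from the whole document, a specific paragraph, or a table cell as needed.

Is text extraction suitable for building document indexing systems?

Yes. IronWord's text extraction is well suited to building document indexing systems, content management solutions, and data extraction pipelines, because it provides programmatic access to the content of Word documents.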

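Ahmad Sohail

Full Stack Developer

Ahmad is a full stack developer with a strong foundation in C#, Python, and web technologies.

Before joining the Iron Software team, Ahmad worked on automation projects and API integrations, focusing on improving performance and developer experience.

In his free time, he enjoys experimenting with UI/UX ideas, contributing to open-source tools, and occasionally diving into technical writing and documentation to make complex topics easier to understand.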
