Extract Text from DOCX with IronWord

Ahmad Sohail

Updated: June 3, 2026


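IronWord's ExtractText() method lets you extract text from DOCX files at several levels (the whole document, specific paragraphs, or individual table cells), providing a simple API for document processing and data analysis tasks in C#.

Quick Start: Extract Text from DOCX

Install with NuGet Package Manager: https://www.nuget.org/packages/IronWord
PM > Install-Package IronWord

Copy and run this code snippet.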

using System;
using IronWord;

// Quick example: Extract all text from DOCX
WordDocument doc = new WordDocument("sample.docx");
string allText = doc.ExtractText();
Console.WriteLine(allText);


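Minimal Workflow (5 steps)

Install the IronWord C# library
Load an existing Word document with new WordDocument()
Call ExtractText() on the document to retrieve all of its text
Use the Paragraphs collection to extract text from specific paragraphs
Process or export the extracted text content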
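How step 5 looks depends on your application. As one illustration, here is a minimal sketch that tidies a string returned by ExtractText() before it is exported: it unifies line endings, trims trailing spaces, and collapses runs of blank lines. NormalizeExtractedText is a hypothetical helper, not part of IronWord, and the sketch does not assume which line separators ExtractText() actually emits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an IronWord API): tidy extracted text before export.
static string NormalizeExtractedText(string raw)
{
    // Convert Windows (\r\n) and classic Mac (\r) line endings to \n
    string unified = raw.Replace("\r\n", "\n").Replace('\r', '\n');

    // Trim trailing whitespace from every line
    string trimmed = string.Join("\n", unified.Split('\n').Select(line => line.TrimEnd()));

    // Collapse three or more consecutive line breaks into a single blank line
    string collapsed = Regex.Replace(trimmed, @"\n{3,}", "\n\n");

    // Remove leading and trailing whitespace from the whole text
    return collapsed.Trim();
}

// Sample input standing in for the result of doc.ExtractText()
string sampleText = "Title  \r\n\r\n\r\n\r\nFirst paragraph.\rSecond paragraph.   ";
Console.WriteLine(NormalizeExtractedText(sampleText));
```

The normalized string can then be written to a file or passed to any downstream step.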

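How do I extract all text from a DOCX file?

The ExtractText() method retrieves the text content of an entire Word document. In this example, we create a new document, add text to it, extract the text with ExtractText(), and print it to the console. This demonstrates the core text extraction workflow.

The extracted text preserves the document's logical reading order. The method processes headings, paragraphs, lists, and other text elements in sequence, which makes it well suited to content analysis and search indexing.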

C#:

using System;
using IronWord;

// Instantiate a new DOCX file
WordDocument doc = new WordDocument();

// Add text
doc.AddText("Hello, World!");

// Print extracted text from the document to the console
Console.WriteLine(doc.ExtractText());

VB.NET:

Imports System
Imports IronWord

' Instantiate a new DOCX file
Dim doc As New WordDocument()

' Add text
doc.AddText("Hello, World!")

' Print extracted text from the document to the console
Console.WriteLine(doc.ExtractText())


[Screenshots: the extracted text and the expected console output]
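Because ExtractText() returns plain text in reading order, its output can feed a search index directly. The following is a minimal sketch, not an IronWord feature: BuildIndex and the sample file names are hypothetical, and the input strings stand in for values returned by doc.ExtractText(). It builds a simple inverted index that maps each word, ignoring case, to the documents containing it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not an IronWord API): map each word to the documents containing it.
static Dictionary<string, SortedSet<string>> BuildIndex(IDictionary<string, string> textsByDocument)
{
    // Case-insensitive keys, so "Hello" and "hello" share one entry
    var result = new Dictionary<string, SortedSet<string>>(StringComparer.OrdinalIgnoreCase);

    foreach (var (docName, text) in textsByDocument)
    {
        // Treat each run of Unicode letters or digits as one word
        foreach (Match match in Regex.Matches(text, @"[\p{L}\p{N}]+"))
        {
            if (!result.TryGetValue(match.Value, out var docs))
            {
                docs = new SortedSet<string>(StringComparer.Ordinal);
                result[match.Value] = docs;
            }
            docs.Add(docName);
        }
    }
    return result;
}

// Sample strings standing in for doc.ExtractText() results from two files
var texts = new Dictionary<string, string>
{
    ["sample.docx"] = "Hello, World!",
    ["report.docx"] = "Quarterly report. Hello again."
};
var searchIndex = BuildIndex(texts);
Console.WriteLine(string.Join(", ", searchIndex["hello"])); // report.docx, sample.docx
```

Splitting on runs of letters and digits works for languages that separate words with spaces; text such as Chinese, which has no spaces between words, would need a proper word segmenter instead.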

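How do I extract text from specific paragraphs?

For finer control, you can extract text from specific paragraphs instead of the whole document. The Paragraphs collection lets you target and process only the paragraphs you need. This granular approach is useful for documents with structured content, or when specific sections must be handled separately.

In this example, we extract text from the first and last paragraphs, combine them, and save the result to a .txt file. This technique is common in document summarization tools, where you may want to pull out a document's introduction and conclusion. The Paragraphs collection gives you direct access to these individual document elements.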

C#:

using System.IO;
using System.Linq;
using IronWord;

// Load an existing DOCX file
WordDocument doc = new WordDocument("document.docx");

// Extract text and assign variables
string firstParagraph = doc.Paragraphs[0].ExtractText();
string lastParagraph = doc.Paragraphs.Last().ExtractText();

// Combine the texts
string newText = firstParagraph + " " + lastParagraph;

// Export the combined text as a new .txt file
File.WriteAllText("output.txt", newText);

VB.NET:

Imports System.IO
Imports System.Linq
Imports IronWord

' Load an existing DOCX file
Dim doc As New WordDocument("document.docx")

' Extract text and assign variables
Dim firstParagraph As String = doc.Paragraphs(0).ExtractText()
Dim lastParagraph As String = doc.Paragraphs.Last().ExtractText()

' Combine the texts
Dim newText As String = firstParagraph & " " & lastParagraph

' Export the combined text as a new .txt file
File.WriteAllText("output.txt", newText)


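Extracting specific paragraphs becomes especially powerful when combined with document analysis requirements. For example, you might select key paragraphs based on their formatting, position, or content patterns. Selective extraction like this shortens processing time and keeps the focus on the most relevant content.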
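A minimal sketch of selecting by position and by content pattern, assuming the paragraph texts have already been collected, for example with doc.Paragraphs.Select(p => p.ExtractText()).ToList(). SelectKeyParagraphs and the sample paragraphs are hypothetical; selecting by formatting would need IronWord's formatting properties, which this sketch does not use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an IronWord API): keep the first and last non-empty
// paragraphs plus any paragraph in between that matches a content pattern.
static List<string> SelectKeyParagraphs(IReadOnlyList<string> paragraphTexts, Regex pattern)
{
    // Empty paragraphs are often just spacing, so ignore them
    var nonEmpty = paragraphTexts.Where(t => !string.IsNullOrWhiteSpace(t)).ToList();
    var selected = new List<string>();
    if (nonEmpty.Count == 0) return selected;

    selected.Add(nonEmpty[0]); // position: introduction

    // Content pattern: check inner paragraphs only, so first/last are not added twice
    for (int i = 1; i < nonEmpty.Count - 1; i++)
    {
        if (pattern.IsMatch(nonEmpty[i])) selected.Add(nonEmpty[i]);
    }

    if (nonEmpty.Count > 1) selected.Add(nonEmpty[nonEmpty.Count - 1]); // position: conclusion
    return selected;
}

// Sample paragraph texts standing in for values returned by ExtractText()
var sampleParagraphs = new List<string>
{
    "Introduction to the quarterly results.",
    "",
    "Revenue grew in all regions.",
    "Key finding: response times dropped.",
    "Conclusion: the rollout succeeded."
};
var keyFinding = new Regex(@"^Key finding:", RegexOptions.IgnoreCase);
foreach (string paragraph in SelectKeyParagraphs(sampleParagraphs, keyFinding))
{
    Console.WriteLine(paragraph);
}
```

Unlike indexing Paragraphs[0] directly, this version skips blank paragraphs and returns an empty list for a document with no text.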

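[Screenshots: text extracted from the first paragraph, text extracted from the last paragraph, and the combined text in the output file]

Note how extraction keeps the text content while dropping formatting information, which makes the output suitable for plain-text processing.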

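How do I extract data from tables in a DOCX?

Tables often contain structured data that needs to be extracted for processing or analysis. IronWord gives you access to table data through each table's rows and cells. In this example, we load a document containing an API statistics table and extract the value of one cell: the fourth column of the third row (Rows[2].Cells[3] of Tables[0]).

Table extraction is essential for data migration projects, report generation, and automated data collection workflows. When working with tables, remember that every index is zero-based: the first table is Tables[0], its first row is Rows[0], and the first cell of a row is Cells[0]. This gives you a predictable access pattern.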

C#:

using System;
using IronWord;
using IronWord.Models;

// Load the API statistics document
WordDocument apiStatsDoc = new WordDocument("api-statistics.docx");

// Extract text from the 1st table, 4th column and 3rd row
string extractedValue = ((TableCell)apiStatsDoc.Tables[0].Rows[2].Cells[3]).ExtractText();

// Print extracted value
Console.WriteLine($"Target success rate: {extractedValue}");

VB.NET:

Imports System
Imports IronWord
Imports IronWord.Models

' Load the API statistics document
Dim apiStatsDoc As New WordDocument("api-statistics.docx")

' Extract text from the 1st table, 4th column and 3rd row
Dim extractedValue As String = CType(apiStatsDoc.Tables(0).Rows(2).Cells(3), TableCell).ExtractText()

' Print extracted value
Console.WriteLine($"Target success rate: {extractedValue}")


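The code shows how to reach a table cell through the Tables, Rows, and Cells collection properties. Note that the item returned from the Cells collection is cast to TableCell before ExtractText() is called: ((TableCell)cell).ExtractText(). The TableCell type requires adding using IronWord.Models; to your using directives.

[Screenshots: the source table and the value retrieved from the table cell]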
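For data migration, a whole table often has to leave the document in a machine-readable form. A minimal sketch, assuming the cell texts have already been collected row by row, for example with table.Rows.Select(r => r.Cells.Select(c => ((TableCell)c).ExtractText()).ToList()): ToCsv is a hypothetical helper that writes them as CSV with RFC 4180-style quoting, so cell values containing commas or quotes survive intact. The sample cell values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an IronWord API): convert extracted cell texts to CSV.
static string ToCsv(IEnumerable<IReadOnlyList<string>> rows)
{
    // RFC 4180-style escaping: quote fields containing commas, quotes, or line
    // breaks, and double any embedded quote characters
    static string Escape(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    // One CSV line per table row, separated by CRLF as RFC 4180 specifies
    return string.Join("\r\n", rows.Select(row => string.Join(",", row.Select(f => Escape(f)))));
}

// Sample cell texts standing in for values read with ((TableCell)cell).ExtractText()
var tableRows = new List<IReadOnlyList<string>>
{
    new[] { "Endpoint", "Calls", "Success rate" },
    new[] { "/orders", "1,200", "99.5%" },
    new[] { "/search \"beta\"", "300", "100%" }
};
Console.WriteLine(ToCsv(tableRows));
```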

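Advanced text extraction scenarios

When working with complex documents, you may need to combine several extraction techniques. The following example extracts text from multiple elements and processes each kind differently: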

C#:

using IronWord;
using IronWord.Models;
using System.Text;
using System.Linq;

// Load a complex document
WordDocument complexDoc = new WordDocument("report.docx");

// Create a StringBuilder for efficient string concatenation
StringBuilder extractedContent = new StringBuilder();

// Extract and process headers (assuming they're in the first few paragraphs)
var headers = complexDoc.Paragraphs
    .Take(3)
    .Select(p => p.ExtractText())
    .Where(text => !string.IsNullOrWhiteSpace(text));

foreach (var header in headers)
{
    extractedContent.AppendLine($"HEADER: {header}");
}

// Extract table summaries
foreach (var table in complexDoc.Tables)
{
    // Get first cell as table header/identifier
    string tableIdentifier = ((TableCell)table.Rows[0].Cells[0]).ExtractText();
    extractedContent.AppendLine($"\nTABLE: {tableIdentifier}");
    
    // Extract key metrics (last row often contains totals)
    if (table.Rows.Count > 1)
    {
        var lastRow = table.Rows.Last();
        var totals = lastRow.Cells.Select(cell => ((TableCell)cell).ExtractText());
        extractedContent.AppendLine($"Totals: {string.Join(", ", totals)}");
    }
}

// Save the structured extraction
System.IO.File.WriteAllText("structured-extract.txt", extractedContent.ToString());

VB.NET:

Imports IronWord
Imports IronWord.Models
Imports System.Text
Imports System.Linq

' Load a complex document
Dim complexDoc As New WordDocument("report.docx")

' Create a StringBuilder for efficient string concatenation
Dim extractedContent As New StringBuilder()

' Extract and process headers (assuming they're in the first few paragraphs)
Dim headers = complexDoc.Paragraphs _
    .Take(3) _
    .Select(Function(p) p.ExtractText()) _
    .Where(Function(text) Not String.IsNullOrWhiteSpace(text))

For Each header In headers
    extractedContent.AppendLine($"HEADER: {header}")
Next

' Extract table summaries
For Each table In complexDoc.Tables
    ' Get first cell as table header/identifier
    Dim tableIdentifier As String = DirectCast(table.Rows(0).Cells(0), TableCell).ExtractText()
    extractedContent.AppendLine(vbCrLf & $"TABLE: {tableIdentifier}")
    
    ' Extract key metrics (last row often contains totals)
    If table.Rows.Count > 1 Then
        Dim lastRow = table.Rows.Last()
        Dim totals = lastRow.Cells.Select(Function(cell) DirectCast(cell, TableCell).ExtractText())
        extractedContent.AppendLine($"Totals: {String.Join(", ", totals)}")
    End If
Next

' Save the structured extraction
System.IO.File.WriteAllText("structured-extract.txt", extractedContent.ToString())


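This advanced example builds a structured extraction by combining different document elements. The approach is particularly useful for generating document summaries, building indexes, or preparing data for further processing. Note that it relies on two heuristics stated in its comments: headings are assumed to sit in the first three paragraphs, and a table's last row is assumed to hold totals. Adjust both to match your documents.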
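If the structured extraction feeds other programs rather than a human reader, JSON can be easier to consume than free-form text. The sketch below assumes the headings and per-table totals have already been gathered with ExtractText() as in the advanced example; ToStructuredJson and the JSON shape are hypothetical choices, serialized with the standard System.Text.Json library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not an IronWord API): serialize a structured extraction as JSON.
static string ToStructuredJson(List<string> headings, Dictionary<string, List<string>> tableTotals)
{
    // Anonymous type defines the assumed shape: { "Headings": [...], "Tables": { id: [...] } }
    var payload = new { Headings = headings, Tables = tableTotals };
    return JsonSerializer.Serialize(payload, new JsonSerializerOptions { WriteIndented = true });
}

// Sample values standing in for headings and table totals gathered with ExtractText()
var sampleHeadings = new List<string> { "Quarterly API Report", "Summary" };
var sampleTotals = new Dictionary<string, List<string>>
{
    // Key: table identifier (first cell); value: cells of the last row
    ["Endpoint"] = new List<string> { "Total", "1,500", "99.6%" }
};
string json = ToStructuredJson(sampleHeadings, sampleTotals);
Console.WriteLine(json);
```

Keying tables by their first cell means two tables with the same identifier would overwrite each other, so a list of objects may suit some documents better. System.Text.Json also escapes non-ASCII characters by default; setting JsonSerializerOptions.Encoder to JavaScriptEncoder.UnsafeRelaxedJsonEscaping (from System.Text.Encodings.Web) keeps accented or Chinese text readable in the file.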

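Text extraction best practices

When you implement text extraction in production applications, consider the following best practices:

Error handling: Wrap extraction code in try-catch blocks to handle documents that are corrupted or unusually structured.
Performance: For large documents or batch processing, extract only the parts you need instead of the entire document.
Character encoding: Specify the character encoding when saving extracted text, especially for documents that contain special characters or multiple languages.
Memory management: When processing many documents, release each WordDocument as soon as you are done with it (for example, dispose it if the class implements IDisposable in your IronWord version) to avoid memory leaks.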
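The sketch below combines the error-handling and encoding advice: each file is processed in its own try/catch so one bad document does not stop the batch, and output is written with an explicit UTF-8 encoding. ExtractAll is a hypothetical helper, the extract delegate stands in for a call such as path => new WordDocument(path).ExtractText(), and the catch-all exists only because this sketch does not assume which exception types IronWord throws; narrow it once you know them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not an IronWord API): extract many files, isolating failures
// and writing each result as UTF-8 text.
static List<string> ExtractAll(IEnumerable<string> inputPaths, string outputDir, Func<string, string> extract)
{
    var failures = new List<string>();
    // UTF-8 with a byte-order mark helps some Windows tools, such as Excel, recognize the encoding
    var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
    Directory.CreateDirectory(outputDir);

    foreach (string inputPath in inputPaths)
    {
        try
        {
            string text = extract(inputPath);
            string target = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(inputPath) + ".txt");
            File.WriteAllText(target, text, utf8WithBom);
        }
        catch (Exception ex) // catch-all: the library's exception types are not assumed here
        {
            // Record the failure and continue with the remaining files
            failures.Add($"{inputPath}: {ex.Message}");
        }
    }
    return failures;
}

// The delegate simulates one readable file and one corrupted file
string outDir = Path.Combine(Path.GetTempPath(), "extract-demo");
List<string> errors = ExtractAll(
    new[] { "ok.docx", "broken.docx" },
    outDir,
    inputFile => inputFile == "broken.docx"
        ? throw new InvalidDataException("corrupt file")
        : "Grüße, 你好");
Console.WriteLine(string.Join(Environment.NewLine, errors)); // broken.docx: corrupt file
```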

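Remember that text extraction preserves the logical reading order but removes formatting. If you need to keep formatting information, consider using other IronWord features or storing that metadata separately. For production deployments, check the changelog to stay up to date with the latest features and improvements.

Summary

IronWord's ExtractText() method offers a powerful, flexible way to extract text from DOCX files. Whether you need the whole document, specific paragraphs, or table data, the API provides a direct way to get it. By combining these techniques with proper error handling and optimization strategies, you can build robust document processing applications that handle a wide range of text extraction scenarios.

For more advanced scenarios and additional features, see the extensions and the rest of the documentation.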

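Frequently Asked Questions

How do I extract all text from a Word document in C#?

Use IronWord's ExtractText() method on a WordDocument object. Load your DOCX file with WordDocument doc = new WordDocument("document.docx");, then call string text = doc.ExtractText(); to retrieve all of the document's text content.

Can I extract text from specific paragraphs instead of the whole document?

Yes. IronWord lets you extract text from specific paragraphs through the Paragraphs collection. Use doc.Paragraphs[index].ExtractText() to target an individual paragraph for more granular extraction.

How do I extract text from tables in a DOCX file?

IronWord enables table text extraction through the Tables collection. Access a specific cell with ((TableCell)doc.Tables[0].Rows[0].Cells[0]).ExtractText() (the TableCell cast requires using IronWord.Models;) to retrieve the text of any table cell in your document.

In what order is the text returned by ExtractText()?

IronWord's ExtractText() method preserves the document's logical reading order, processing headings, paragraphs, lists, and other text elements in sequence, which makes it well suited to content analysis and search indexing.

What are the basic steps to start extracting text from DOCX files?

First install IronWord via NuGet (Install-Package IronWord), then create or load a WordDocument, and finally use the ExtractText() method to retrieve text from the whole document, specific paragraphs, or table cells.

Is text extraction suitable for building document indexing systems?

Yes. IronWord's text extraction works well for building document indexing systems, content management solutions, and data extraction pipelines, giving you efficient programmatic access to Word document content.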

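Ahmad Sohail

Full-Stack Developer

Ahmad is a full-stack developer with a solid foundation in C#, Python, and web technologies. He is deeply interested in building scalable software solutions and enjoys exploring how design and functionality come together in real-world applications.

Before joining Iron Software, Ahmad worked on automation projects and API integrations, focusing on improving performance and developer experience.

In his free time, he experiments with UI/UX ideas, contributes to open-source tools, and occasionally writes technical documentation and articles that simplify complex topics.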

