Extract Text

Extracting a large volume of text from documents can be slow and tedious, especially when the documents contain tables and many paragraphs. IronWord's ExtractText method saves that effort. It lets developers extract either all of the text in a document or only a specific portion of it, eliminating the need for additional loops and simplifying access to the Text property.

In this example, we'll showcase several ways to use the ExtractText method and boost your efficiency when retrieving text from documents.

Useful Ways to Extract Text from a Docx

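    using System;
    using System.Linq; // LINQ, for Last()
    using IronWord;

    // Load an existing Word document
    WordDocument doc = new WordDocument("multi-paragraph.docx");

    // Print the whole document's text, then the first and the last paragraph
    Console.WriteLine(doc.ExtractText());
    Console.WriteLine(doc.Paragraphs[0].ExtractText());
    Console.WriteLine(doc.Paragraphs.Last().ExtractText());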

Extract Text

Extracting text from a Word document with the IronWord library is straightforward. We start by importing the library and creating a WordDocument instance, which loads an existing document containing several paragraphs. We then call the ExtractText method on the document and print its entire text to the console.
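As a small extension of this step, the sketch below writes the extracted text to a file instead of the console. It assumes that ExtractText returns a string, as its use with Console.WriteLine suggests. The output file name document-text.txt is a hypothetical placeholder, not something IronWord defines.

```csharp
using System;
using System.IO;
using IronWord;

// Input file name comes from the example above; the output path is hypothetical.
const string inputPath = "multi-paragraph.docx";
const string outputPath = "document-text.txt";

// Stop early with a clear message if the document is missing.
if (!File.Exists(inputPath))
{
    Console.Error.WriteLine($"File not found: {inputPath}");
    return;
}

WordDocument doc = new WordDocument(inputPath);

// Assumption: ExtractText() returns the document's text as a string.
string text = doc.ExtractText();

// Save the text with the standard .NET file API.
File.WriteAllText(outputPath, text);
Console.WriteLine($"Wrote {text.Length} characters to {outputPath}");
```

Checking for the file first keeps a missing-document failure separate from any error raised while the document is being parsed.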

Extract Specific Text

The example above extracts the entire document's text, but the IronWord library also gives you control over what is extracted. If you only want specific portions or paragraphs, use the Paragraphs property of the WordDocument, which returns the document's paragraphs as a generic list. You can work with this list as your requirements dictate, either by index, as shown above with doc.Paragraphs[0], or with the built-in methods available for C# collections.
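To illustrate the collection methods mentioned above, here is a minimal sketch that applies standard LINQ operators to Paragraphs. It relies only on the Paragraphs property and ExtractText method shown earlier, and assumes that Paragraphs can be enumerated like any generic list and that ExtractText returns a string. The search term "invoice" is a hypothetical value chosen for illustration.

```csharp
using System;
using System.Linq;
using IronWord;

WordDocument doc = new WordDocument("multi-paragraph.docx");

// Print the text of the first three paragraphs
// (fewer if the document has fewer than three).
foreach (var paragraph in doc.Paragraphs.Take(3))
{
    Console.WriteLine(paragraph.ExtractText());
}

// Collect the text of every paragraph that mentions a keyword.
// "invoice" is a hypothetical search term used only for illustration.
var matches = doc.Paragraphs
    .Select(p => p.ExtractText())
    .Where(t => t != null &&
                t.IndexOf("invoice", StringComparison.OrdinalIgnoreCase) >= 0)
    .ToList();

Console.WriteLine($"{matches.Count} paragraph(s) mention the keyword.");
```

Take and Where come from System.Linq rather than from IronWord, so the same pattern should carry over to other operators such as Skip or Reverse.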

Accessing index 0 of Paragraphs returns only the document's first paragraph, so calling ExtractText on it extracts that paragraph's text alone, which we print to the console. We then call Last on Paragraphs to return the document's last paragraph and extract only its text.
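If Paragraphs behaves like a standard generic list, both doc.Paragraphs[0] and Last() throw an exception when the document has no paragraphs. The sketch below shows one defensive variant using the LINQ methods FirstOrDefault and LastOrDefault. It assumes the paragraph type is a reference type, so that an empty list yields null.

```csharp
using System;
using System.Linq;
using IronWord;

WordDocument doc = new WordDocument("multi-paragraph.docx");

// FirstOrDefault/LastOrDefault return the default value (null for
// reference types) instead of throwing when the sequence is empty.
var first = doc.Paragraphs.FirstOrDefault();
var last = doc.Paragraphs.LastOrDefault();

if (first == null || last == null)
{
    Console.WriteLine("The document contains no paragraphs.");
}
else
{
    Console.WriteLine(first.ExtractText());
    Console.WriteLine(last.ExtractText());
}
```

For a document with a single paragraph, first and last refer to the same paragraph, so its text is printed twice.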