Exporting Images of OCR Elements

This example shows how IronTesseract can extract the image and coordinates for every character, word, line, or paragraph of text in any OCR document.

// Import necessary libraries
using System;   // Console
using IronOcr;

// Instantiate the IronTesseract class
var Ocr = new IronTesseract();

// Load a document/image for OCR processing. Ensure the file path is correct and the image is accessible.
using (var Input = new OcrInput(@"path-to-your-document-or-image"))
{
    // Perform OCR on the input image and retrieve results
    var Result = Ocr.Read(Input);

    // Iterate through each page in the OCR result
    foreach (var Page in Result.Pages)
    {
        // Iterate through each paragraph in the page
        foreach (var Paragraph in Page.Paragraphs)
        {
            Console.WriteLine($"\n\nParagraph: '{Paragraph.Text}'\n");

            // Iterate through each line in the paragraph
            foreach (var Line in Paragraph.Lines)
            {
                Console.WriteLine($"  Line: '{Line.Text}'");

                // Iterate through each word in the line
                foreach (var Word in Line.Words)
                {
                    Console.WriteLine($"    Word: '{Word.Text}'");

                    // Iterate through each character in the word
                    foreach (var Character in Word.Characters)
                    {
                        // Output the character, its position in the document, and its bounding rectangle
                        Console.WriteLine($"      Character: '{Character.Text}', Position: {Character.Position}, Bounds: {Character.Bounds}");
                    }
                }
            }
        }
    }
}

' Visual Basic (VB.NET) version of the same example
' Import necessary libraries
Imports System
Imports Microsoft.VisualBasic
Imports IronOcr

' Instantiate the IronTesseract class
Dim Ocr = New IronTesseract()

' Load a document/image for OCR processing. Ensure the file path is correct and the image is accessible.
Using Input = New OcrInput("path-to-your-document-or-image")
	' Perform OCR on the input image and retrieve results
	Dim Result = Ocr.Read(Input)

	' Iterate through each page in the OCR result
	For Each Page In Result.Pages
		' Iterate through each paragraph in the page
		For Each Paragraph In Page.Paragraphs
			Console.WriteLine(vbLf & vbLf & $"Paragraph: '{Paragraph.Text}'" & vbLf)

			' Iterate through each line in the paragraph
			For Each Line In Paragraph.Lines
				Console.WriteLine($"  Line: '{Line.Text}'")

				' Iterate through each word in the line
				For Each Word In Line.Words
					Console.WriteLine($"    Word: '{Word.Text}'")

					' Iterate through each character in the word
					For Each Character In Word.Characters
						' Output the character, its position in the document, and its bounding rectangle
						Console.WriteLine($"      Character: '{Character.Text}', Position: {Character.Position}, Bounds: {Character.Bounds}")
					Next Character
				Next Word
			Next Line
		Next Paragraph
	Next Page
End Using

Explanation

  • An instance of IronTesseract is created; it performs the OCR (Optical Character Recognition) on the provided document or image.
  • The OcrInput object is initialized with the path to the target document or image. Replace the placeholder path with the path to a real file.
  • The Read method processes the input and returns the recognized text and related data, which the example stores in Result.
  • The nested loops walk the result hierarchy in order: pages, then paragraphs, lines, words, and characters.
  • Paragraphs, lines, and words are printed as text; for each character the example also prints its position and bounding rectangle.
  • The output therefore shows both what was recognized and where each character sits within the document's structure.
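
The bounding rectangles mentioned above nest in the same way as the text: a word's region is the smallest rectangle that contains the rectangles of its characters. The following is a minimal sketch of that relationship only. It does not call IronOcr; it uses System.Drawing.Rectangle from the .NET base library as a stand-in for OCR bounds, and the coordinates, the characterBoxes list, and the Enclose helper are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Hypothetical character boxes (x, y, width, height in pixels).
// These values are made up; real values would come from the OCR result.
var characterBoxes = new List<Rectangle>
{
    new Rectangle(10, 20, 8, 12),
    new Rectangle(19, 18, 4, 14),
};

// Hypothetical helper: returns the smallest rectangle that contains
// every input box. Throws InvalidOperationException on an empty sequence.
Rectangle Enclose(IEnumerable<Rectangle> boxes) =>
    boxes.Aggregate(Rectangle.Union);

Rectangle wordBox = Enclose(characterBoxes);

// Prints: Word bounds: X=10, Y=18, Width=13, Height=14
Console.WriteLine(
    $"Word bounds: X={wordBox.X}, Y={wordBox.Y}, Width={wordBox.Width}, Height={wordBox.Height}");
```

The same folding step could be applied one level up, combining word rectangles into a line rectangle, under the assumption that each level's bounds are expressed in the same page coordinate system.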

Ensure that you have the IronOcr library installed and properly referenced in your project to compile and run this code successfully.