Exporting Images of OCR Elements
This example shows how IronTesseract can extract the image and coordinates for every character, word, line, or paragraph of text in any OCR document.
// Import necessary libraries
using System;
using IronOcr;

// Instantiate the IronTesseract class
var Ocr = new IronTesseract();

// Load a document/image for OCR processing. Ensure the file path is correct and the image is accessible.
using (var Input = new OcrInput(@"path-to-your-document-or-image"))
{
    // Perform OCR on the input image and retrieve results
    var Result = Ocr.Read(Input);

    // Iterate through each page in the OCR result
    foreach (var Page in Result.Pages)
    {
        // Iterate through each paragraph in the page
        foreach (var Paragraph in Page.Paragraphs)
        {
            Console.WriteLine($"\n\nParagraph: '{Paragraph.Text}'\n");

            // Iterate through each line in the paragraph
            foreach (var Line in Paragraph.Lines)
            {
                Console.WriteLine($" Line: '{Line.Text}'");

                // Iterate through each word in the line
                foreach (var Word in Line.Words)
                {
                    Console.WriteLine($" Word: '{Word.Text}'");

                    // Iterate through each character in the word
                    foreach (var Character in Word.Characters)
                    {
                        // Output the character, its position in the document, and its bounding rectangle
                        Console.WriteLine($" Character: '{Character.Text}', Position: {Character.Position}, Bounds: {Character.Bounds}");
                    }
                }
            }
        }
    }
}
' Import necessary libraries
Imports System
Imports Microsoft.VisualBasic
Imports IronOcr

' Instantiate the IronTesseract class
Dim Ocr = New IronTesseract()

' Load a document/image for OCR processing. Ensure the file path is correct and the image is accessible.
Using Input = New OcrInput("path-to-your-document-or-image")
    ' Perform OCR on the input image and retrieve results
    Dim Result = Ocr.Read(Input)

    ' Iterate through each page in the OCR result
    For Each Page In Result.Pages
        ' Iterate through each paragraph in the page
        For Each Paragraph In Page.Paragraphs
            Console.WriteLine(vbLf & vbLf & $"Paragraph: '{Paragraph.Text}'" & vbLf)

            ' Iterate through each line in the paragraph
            For Each Line In Paragraph.Lines
                Console.WriteLine($" Line: '{Line.Text}'")

                ' Iterate through each word in the line
                For Each Word In Line.Words
                    Console.WriteLine($" Word: '{Word.Text}'")

                    ' Iterate through each character in the word
                    For Each Character In Word.Characters
                        ' Output the character, its position in the document, and its bounding rectangle
                        Console.WriteLine($" Character: '{Character.Text}', Position: {Character.Position}, Bounds: {Character.Bounds}")
                    Next Character
                Next Word
            Next Line
        Next Paragraph
    Next Page
End Using
Explanation
- IronTesseract performs OCR (Optical Character Recognition) on the provided document or image.
- An instance of IronTesseract is created to manage the OCR operations.
- The OcrInput object is initialized with the path to the target document or image; adjust this path to point to a valid file.
- The Read method processes the document and stores the extracted text and related data in Result.
- The code iterates through pages, paragraphs, lines, words, and characters to extract and display the text and additional data, such as text position and bounding box information.
- This traversal yields a detailed OCR analysis that includes the position and structure of the recognized text.
Ensure that you have the IronOcr library installed and properly referenced in your project to compile and run this code successfully.
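The listing above prints text and coordinates but does not write any images to disk. The following is a minimal sketch of how word images might be exported. It assumes that each result element exposes a ToBitmap method that takes the original OcrInput, and that the returned bitmap offers a SaveAs method (as IronSoftware.Drawing's AnyBitmap does); older IronOcr releases may return System.Drawing.Bitmap, which uses Save instead. The output folder and file names are hypothetical, so check the API reference for your installed version before relying on this.

```csharp
using System.IO;
using IronOcr;

var Ocr = new IronTesseract();

using (var Input = new OcrInput(@"path-to-your-document-or-image"))
{
    var Result = Ocr.Read(Input);

    // Hypothetical output folder for the cropped word images
    var OutputFolder = "ocr-words";
    Directory.CreateDirectory(OutputFolder);

    int WordIndex = 0;
    foreach (var Page in Result.Pages)
    {
        foreach (var Paragraph in Page.Paragraphs)
        {
            foreach (var Line in Paragraph.Lines)
            {
                foreach (var Word in Line.Words)
                {
                    // Assumption: ToBitmap(OcrInput) crops this word's region from the source image
                    using (var WordImage = Word.ToBitmap(Input))
                    {
                        // Assumption: SaveAs exists on the returned bitmap type;
                        // use Save(...) if your version returns System.Drawing.Bitmap
                        WordImage.SaveAs(Path.Combine(OutputFolder, $"word-{WordIndex}.png"));
                    }
                    WordIndex++;
                }
            }
        }
    }
}
```

The same pattern should apply at any other level of the hierarchy (page, paragraph, line, or character) by calling the equivalent method on that element inside the corresponding loop.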