OcrResult Class
IronOCR returns an advanced result object for each page it scans using Tesseract 5. It contains location data, images, text, statistical confidence, alternative symbol choices, font names, font sizes, decoration, font weights, and position for each of the following:
- Page
- Paragraph
- Line of Text
- Word
- Individual Character
- Barcode
Here is an example of how you might retrieve and work with these data points using C# with IronOCR:
// Console lives in System; Count() comes from System.Linq
using System;
using System.Linq;
// Import the IronOCR library
using IronOcr;

class OCRExample
{
    static void Main()
    {
        // Create a new instance of the IronTesseract engine
        var ocrEngine = new IronTesseract();
        // Specify the file path of the scanned document
        var input = new OcrInput(@"path_to_your_image_file.jpg");
        // Perform OCR on the input image
        OcrResult result = ocrEngine.Read(input);

        // Report the number of pages detected
        // (Count() works whether Pages is exposed as an array or a list)
        Console.WriteLine($"Detected {result.Pages.Count()} page(s)");

        // Iterate through each page
        foreach (var page in result.Pages)
        {
            // Output the page text
            Console.WriteLine($"Page Text: {page.Text}");

            // Iterate through each paragraph in the page
            foreach (var paragraph in page.Paragraphs)
            {
                Console.WriteLine($"Paragraph Text: {paragraph.Text}");

                // Iterate through each line in the paragraph
                foreach (var line in paragraph.Lines)
                {
                    Console.WriteLine($"Line Text: {line.Text}");

                    // Iterate through each word in the line
                    foreach (var word in line.Words)
                    {
                        Console.WriteLine($"Word Text: {word.Text}");

                        // Iterate through each character in the word
                        foreach (var character in word.Characters)
                        {
                            Console.WriteLine($"Character Text: {character.Text}");
                        }
                    }
                }
            }

            // Output the value of each barcode detected on the page
            foreach (var barcode in page.Barcodes)
            {
                Console.WriteLine($"Barcode Value: {barcode.Value}");
            }
        }
    }
}
Explanation:
- IronTesseract engine: Runs the OCR process on the supplied image input.
- OcrInput: Represents the image file to be processed. Specify the path to your image file.
- Read method: Processes the image and returns an OcrResult containing all extracted data.
- Iteration structure: The example uses nested loops to descend from pages through paragraphs, lines, and words to individual characters, and iterates separately over each page's barcodes, giving access to every element's text and properties.
- Console output: The program writes each text element to the console. Replace these calls with whatever processing you need.
This structured approach enables a detailed exploration and utilization of the various data points retrieved by IronOCR.
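Beyond printing text, the statistical confidence mentioned above can be used to flag words that may need manual review. The following is a minimal sketch, not a definitive implementation. It assumes that each word exposes a numeric Confidence property alongside Text; check both the property name and its scale (0 to 1 or 0 to 100) against the IronOCR API reference. The ConfidenceReport helper, the BelowThreshold method, and the threshold value are hypothetical names and numbers introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using IronOcr;

static class ConfidenceReport
{
    // Hypothetical helper: keeps only the items whose confidence is
    // strictly below the given threshold.
    public static List<(string Text, double Confidence)> BelowThreshold(
        IEnumerable<(string Text, double Confidence)> items, double threshold)
    {
        return items.Where(item => item.Confidence < threshold).ToList();
    }
}

class LowConfidenceExample
{
    static void Main()
    {
        var ocrEngine = new IronTesseract();
        using var input = new OcrInput(@"path_to_your_image_file.jpg");
        OcrResult result = ocrEngine.Read(input);

        // Flatten pages -> paragraphs -> lines -> words into (text, confidence) pairs.
        // Assumption: word.Confidence exists and is numeric.
        var words = result.Pages
            .SelectMany(page => page.Paragraphs)
            .SelectMany(paragraph => paragraph.Lines)
            .SelectMany(line => line.Words)
            .Select(word => (Text: word.Text, Confidence: (double)word.Confidence));

        // Assumption: the threshold is on the same scale as Confidence.
        // Adjust it once you have confirmed that scale.
        const double threshold = 0.8;

        foreach (var (text, confidence) in ConfidenceReport.BelowThreshold(words, threshold))
        {
            Console.WriteLine($"Review \"{text}\" (confidence {confidence})");
        }
    }
}
```

Keeping the filter in a separate helper that takes plain (text, confidence) pairs means the threshold logic can be exercised without running OCR, and the same helper could be applied at the line or character level if those elements expose the same property.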