How to Extract Read Results

by Chaknith Bin

The read or OCR result encompasses a wealth of information pertaining to detected paragraphs, lines, words, and individual characters. For each of these elements, the result provides a comprehensive set of details.

For each element, it provides the text content, precise X and Y coordinates, dimensions (width and height), text direction (Left to Right or Top to Bottom), and location in a CropRectangle object.


C# NuGet Library for OCR

Install with NuGet

Install-Package IronOcr
or
Java PDF JAR

Download DLL

Download DLL

Manually install into your project

Data in OcrResult

The result value doesn't only contain the extracted text but also provides information about pages, paragraphs, lines, words, characters, and barcodes discovered in the PDF and image document by IronOcr. You can access this information from the returned OcrResult object using the Read method.

:path=/static-assets/ocr/content-code-examples/how-to/read-results-output-information.cs
using IronOcr;
using System;
using static IronOcr.OcrResult;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Add image
using var imageInput = new OcrImageInput("sample.jpg");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Retrieve list of detected paragraphs
Paragraph[] paragraphs = ocrResult.Paragraphs;

// Output information to console
Console.WriteLine($"Text: {paragraphs[0].Text}");
Console.WriteLine($"X: {paragraphs[0].X}");
Console.WriteLine($"Y: {paragraphs[0].Y}");
Console.WriteLine($"Width: {paragraphs[0].Width}");
Console.WriteLine($"Height: {paragraphs[0].Height}");
Console.WriteLine($"Text direction: {paragraphs[0].TextDirection}");
Imports IronOcr
Imports System
Imports IronOcr.OcrResult

' Instantiate IronTesseract
Private ocrTesseract As New IronTesseract()

' Add image
Private imageInput = New OcrImageInput("sample.jpg")
' Perform OCR
Private ocrResult As OcrResult = ocrTesseract.Read(imageInput)

' Retrieve list of detected paragraphs
Private paragraphs() As Paragraph = ocrResult.Paragraphs

' Output information to console
Console.WriteLine($"Text: {paragraphs(0).Text}")
Console.WriteLine($"X: {paragraphs(0).X}")
Console.WriteLine($"Y: {paragraphs(0).Y}")
Console.WriteLine($"Width: {paragraphs(0).Width}")
Console.WriteLine($"Height: {paragraphs(0).Height}")
Console.WriteLine($"Text direction: {paragraphs(0).TextDirection}")
VB   C#
Data in OcrResult

For each part of the text, like paragraphs, lines, words, and individual characters, we provide the following information:

  • Text: The actual text as a string.
  • X: The position from the left edge of the page in pixels.
  • Y: The position from the top edge of the page in pixels.
  • Width: The width in pixels.
  • Height: The height in pixels.
  • Text Direction: The direction in which the text was read, like 'Left to Right' or 'Top to Bottom.'
  • Location: A rectangle showing where this text is on the page in pixels.

Paragraph, Line, Word, and Character Comparison

Below is the comparison of the detected paragraphs, lines, words, and characters.

Highlight paragraph
Highlight line
Highlight word
Highlight character

Barcode and QR Code

That’s correct! IronOcr can read barcodes and QR codes. While the feature may not be as robust as IronBarcode, IronOcr does provide support for common barcode types. To enable barcode detection, set the Configuration.ReadBarCodes property to true.

Additionally, valuable information can be extracted from the detected barcode, including its format, value, coordinates (x, y), height, width, and location as IronSoftware.Drawing.Rectangle object. This Rectangle class in IronDrawing allows for precise positioning on the document.

:path=/static-assets/ocr/content-code-examples/how-to/read-results-barcodes.cs
using IronOcr;
using System;
using static IronOcr.OcrResult;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable barcodes detection
ocrTesseract.Configuration.ReadBarCodes = true;

// Add image
using OcrInput ocrInput = new OcrInput();
ocrInput.LoadPdf("sample.pdf");

// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(ocrInput);

// Output information to console
foreach(var barcode in ocrResult.Barcodes)
{
    Console.WriteLine("Format = " + barcode.Format);
    Console.WriteLine("Value = " + barcode.Value);
    Console.WriteLine("X = " + barcode.X);
    Console.WriteLine("Y = " + barcode.Y);
}
Console.WriteLine(ocrResult.Text);
Imports IronOcr
Imports System
Imports IronOcr.OcrResult

' Instantiate IronTesseract
Private ocrTesseract As New IronTesseract()

' Enable barcodes detection
ocrTesseract.Configuration.ReadBarCodes = True

' Add image
Using ocrInput As New OcrInput()
	ocrInput.LoadPdf("sample.pdf")
	
	' Perform OCR
	Dim ocrResult As OcrResult = ocrTesseract.Read(ocrInput)
	
	' Output information to console
	For Each barcode In ocrResult.Barcodes
		Console.WriteLine("Format = " & barcode.Format)
		Console.WriteLine("Value = " & barcode.Value)
		Console.WriteLine("X = " & barcode.X)
		Console.WriteLine("Y = " & barcode.Y)
	Next barcode
	Console.WriteLine(ocrResult.Text)
End Using
VB   C#

Output

Detect barcodes

Chaknith Bin

Software Engineer

Chaknith is the Sherlock Holmes of developers. It first occurred to him he might have a future in software engineering, when he was doing code challenges for fun. His focus is on IronXL and IronBarcode, but he takes pride in helping customers with every product. Chaknith leverages his knowledge from talking directly with customers, to help further improve the products themselves. His anecdotal feedback goes beyond Jira tickets and supports product development, documentation and marketing, to improve customer’s overall experience.When he isn’t in the office, he can be found learning about machine learning, coding and hiking.