How to Extract Read Results

by Chaknith Bin

The read or OCR result encompasses a wealth of information pertaining to detected paragraphs, lines, words, and individual characters. For each of these elements, the result provides a comprehensive set of details.

For each element, it provides the text content, precise X and Y coordinates, dimensions (width and height), text direction (Left to Right or Top to Bottom), and location in a CropRectangle object.


C# NuGet Library for OCR

Install with NuGet

Install-Package IronOcr
or
C# OCR DLL

Download DLL

Download DLL

Manually install into your project

Data in OcrResult

The result value doesn't only contain the extracted text but also provides information about pages, paragraphs, lines, words, characters, and barcodes discovered in the PDF and image document by IronOcr. You can access this information from the returned OcrResult object using the Read method.

:path=/static-assets/ocr/content-code-examples/how-to/read-results-output-information.cs
using IronOcr;
using System;
using static IronOcr.OcrResult;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Add image
using var imageInput = new OcrImageInput("sample.jpg");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Retrieve list of detected paragraphs
Paragraph[] paragraphs = ocrResult.Paragraphs;

// Output information to console
Console.WriteLine($"Text: {paragraphs[0].Text}");
Console.WriteLine($"X: {paragraphs[0].X}");
Console.WriteLine($"Y: {paragraphs[0].Y}");
Console.WriteLine($"Width: {paragraphs[0].Width}");
Console.WriteLine($"Height: {paragraphs[0].Height}");
Console.WriteLine($"Text direction: {paragraphs[0].TextDirection}");
Imports IronOcr
Imports System
Imports IronOcr.OcrResult

' Instantiate IronTesseract
Private ocrTesseract As New IronTesseract()

' Add image
Private imageInput = New OcrImageInput("sample.jpg")
' Perform OCR
Private ocrResult As OcrResult = ocrTesseract.Read(imageInput)

' Retrieve list of detected paragraphs
Private paragraphs() As Paragraph = ocrResult.Paragraphs

' Output information to console
Console.WriteLine($"Text: {paragraphs(0).Text}")
Console.WriteLine($"X: {paragraphs(0).X}")
Console.WriteLine($"Y: {paragraphs(0).Y}")
Console.WriteLine($"Width: {paragraphs(0).Width}")
Console.WriteLine($"Height: {paragraphs(0).Height}")
Console.WriteLine($"Text direction: {paragraphs(0).TextDirection}")
VB   C#
Data in OcrResult

For each part of the text, like paragraphs, lines, words, and individual characters, we provide the following information:

  • Text: The actual text as a string.
  • X: The position from the left edge of the page in pixels.
  • Y: The position from the top edge of the page in pixels.
  • Width: The width in pixels.
  • Height: The height in pixels.
  • Text Direction: The direction in which the text was read, like 'Left to Right' or 'Top to Bottom.'
  • Location: A rectangle showing where this text is on the page in pixels.

Paragraph, Line, Word, and Character Comparison

Below is the comparison of the detected paragraphs, lines, words, and characters.

Highlight paragraph
Highlight line
Highlight word
Highlight character

Chaknith Bin

Software Engineer

Chaknith is the Sherlock Holmes of developers. It first occurred to him he might have a future in software engineering, when he was doing code challenges for fun. His focus is on IronXL and IronBarcode, but he takes pride in helping customers with every product. Chaknith leverages his knowledge from talking directly with customers, to help further improve the products themselves. His anecdotal feedback goes beyond Jira tickets and supports product development, documentation and marketing, to improve customer’s overall experience.When he isn’t in the office, he can be found learning about machine learning, coding and hiking.