How to Extract Read Results
The read or OCR result encompasses a wealth of information pertaining to detected paragraphs, lines, words, and individual characters. For each of these elements, the result provides a comprehensive set of details.
For each element, it provides the text content, precise X and Y coordinates, dimensions (width and height), text direction (Left to Right or Top to Bottom), and location in a CropRectangle object.
How to Extract Read Results
- Download a C# library for accessing read results
- Prepare the target image and PDF document
- Use the
Read
method to perform OCR on the imported document - Access the X, Y, width, height, and text direction of the result
- Check the detected paragraphs, lines, words, and character comparisons
Install with NuGet
Install-Package IronOcr
Download DLL
Manually install into your project
Data in OcrResult
The result value doesn't only contain the extracted text but also provides information about pages, paragraphs, lines, words, characters, and barcodes discovered in the PDF and image document by IronOcr. You can access this information from the returned OcrResult object using the Read
method.
:path=/static-assets/ocr/content-code-examples/how-to/read-results-output-information.cs
using IronOcr;
using System;
using static IronOcr.OcrResult;
// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();
// Add image
using var imageInput = new OcrImageInput("sample.jpg");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Retrieve list of detected paragraphs
Paragraph[] paragraphs = ocrResult.Paragraphs;
// Output information to console
Console.WriteLine($"Text: {paragraphs[0].Text}");
Console.WriteLine($"X: {paragraphs[0].X}");
Console.WriteLine($"Y: {paragraphs[0].Y}");
Console.WriteLine($"Width: {paragraphs[0].Width}");
Console.WriteLine($"Height: {paragraphs[0].Height}");
Console.WriteLine($"Text direction: {paragraphs[0].TextDirection}");
Imports IronOcr
Imports System
Imports IronOcr.OcrResult
' Instantiate IronTesseract
Private ocrTesseract As New IronTesseract()
' Add image
Private imageInput = New OcrImageInput("sample.jpg")
' Perform OCR
Private ocrResult As OcrResult = ocrTesseract.Read(imageInput)
' Retrieve list of detected paragraphs
Private paragraphs() As Paragraph = ocrResult.Paragraphs
' Output information to console
Console.WriteLine($"Text: {paragraphs(0).Text}")
Console.WriteLine($"X: {paragraphs(0).X}")
Console.WriteLine($"Y: {paragraphs(0).Y}")
Console.WriteLine($"Width: {paragraphs(0).Width}")
Console.WriteLine($"Height: {paragraphs(0).Height}")
Console.WriteLine($"Text direction: {paragraphs(0).TextDirection}")
For each part of the text, like paragraphs, lines, words, and individual characters, we provide the following information:
- Text: The actual text as a string.
- X: The position from the left edge of the page in pixels.
- Y: The position from the top edge of the page in pixels.
- Width: The width in pixels.
- Height: The height in pixels.
- Text Direction: The direction in which the text was read, like 'Left to Right' or 'Top to Bottom.'
- Location: A rectangle showing where this text is on the page in pixels.
Paragraph, Line, Word, and Character Comparison
Below is the comparison of the detected paragraphs, lines, words, and characters.
Paragraph | Line |
Word | Character |