Word and Character OCR Data in C# (Coordinates, Confidence, Bounding Boxes)

After running OCR on a document, the extracted text alone is often not enough. To locate specific values on a page, exclude low-quality detections, or reconstruct the natural reading order on multi-column layouts, you need per-word coordinates, page numbers, region indices, and confidence scores.

The Words and Characters collections on AdvancedOcrResultBase expose this data. ReadDocumentAdvanced() (for layout-aware documents) and ReadPhoto() (for camera input) both return results that carry these collections, at the same granularity as the standard OcrResult.Words collection.

This guide walks through five common patterns: iterating word data, reconstructing reading order, filtering by confidence, working at the character level, and cropping the source image from a bounding box.

Start a free 30-day trial to test these collections in your pipeline.

Install with NuGet

PM > Install-Package IronOcr

Check out IronOCR on NuGet for quick installation. You can also download the DLL or Windows installer.

Quickstart: Read Word and Character Data from an OCR Result

Call ReadDocumentAdvanced (or ReadPhoto) and iterate result.Words to get every recognized word with its coordinates, page number, and confidence score in a few lines.

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr
  2. Copy and run this code snippet.

    var result = new IronTesseract().ReadDocumentAdvanced(new OcrInput("scan.png"));
    foreach (var word in result.Words)
        Console.WriteLine($"{word.Text} @ ({word.X},{word.Y}) conf:{word.RegionConfidence:P0}");
  3. Deploy to test on your live environment

    Start using IronOCR in your project today with a free trial


How Do You Iterate Words with Coordinates and Confidence?

The Words collection returns every detected word across every page. Each entry is an AdvancedWord (entries in Characters are AdvancedCharacter; both types inherit from AdvancedOcrElement) and exposes the text, pixel coordinates, dimensions, the page it belongs to, the region index identifying which detected text block contains it, and a confidence score for that region.

:path=/static-assets/ocr/content-code-examples/how-to/read-document-advanced-iterate-words.cs
using IronOcr;
using System.Linq; // needed for First() below

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("receipt.png");

var result = ocr.ReadDocumentAdvanced(input);

foreach (var word in result.Words)
{
    Console.WriteLine(
        $"Page {word.PageNumber} | " +
        $"Region {word.RegionIndex} | " +
        $"'{word.Text}' | " +
        $"Position: ({word.X}, {word.Y}) | " +
        $"Size: {word.Width}x{word.Height} | " +
        $"Confidence: {word.RegionConfidence:P1}"
    );
}

// ToString() override for diagnostic logging
Console.WriteLine(result.Words.First().ToString());
Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("receipt.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    For Each word In result.Words
        Console.WriteLine(
            $"Page {word.PageNumber} | " &
            $"Region {word.RegionIndex} | " &
            $"'{word.Text}' | " &
            $"Position: ({word.X}, {word.Y}) | " &
            $"Size: {word.Width}x{word.Height} | " &
            $"Confidence: {word.RegionConfidence:P1}"
        )
    Next

    ' ToString() override for diagnostic logging
    ' (kept inside the Using block, where result is still in scope)
    Console.WriteLine(result.Words.First().ToString())
End Using

Tip: PageNumber is 1-based: page one is 1, not 0. This differs from most .NET collections, which use zero-based indexing. RegionIndex follows the standard 0-based convention.

To pass coordinates to drawing or cropping APIs, use the BoundingBox property. It bundles position and size into a single IronSoftware.Drawing.Rectangle.
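As an illustration of what the coordinates make possible, the sketch below finds the value printed to the right of a label such as "Total". It does not call IronOCR: WordBox and SpatialLookup.FindValueRightOf are hypothetical names introduced here, and the sample data is made up. In a real pipeline you would project each entry of result.Words into a WordBox from its Text, X, Y, Width, and Height.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data standing in for result.Words.
var words = new List<WordBox>
{
    new("Subtotal", 40, 500, 120, 24),
    new("90.00",   420, 502,  80, 24),
    new("Total",    40, 540,  80, 24),
    new("99.00",   420, 541,  80, 24),
};

var value = SpatialLookup.FindValueRightOf(words, "Total");
Console.WriteLine(value?.Text ?? "(not found)"); // prints "99.00"

// Hypothetical stand-in for a word entry: text plus its pixel box.
public record WordBox(string Text, int X, int Y, int Width, int Height);

public static class SpatialLookup
{
    // Returns the nearest word to the right of the first word matching 'label'
    // whose vertical span overlaps the label's, or null when nothing qualifies.
    public static WordBox? FindValueRightOf(IEnumerable<WordBox> words, string label)
    {
        var list = words.ToList();

        // OCR often attaches a trailing colon to labels, so ignore it when matching.
        var anchor = list.FirstOrDefault(w =>
            string.Equals(w.Text.TrimEnd(':'), label, StringComparison.OrdinalIgnoreCase));
        if (anchor is null) return null;

        return list
            .Where(w => !ReferenceEquals(w, anchor))
            .Where(w => w.X >= anchor.X + anchor.Width)      // starts right of the label
            .Where(w => w.Y < anchor.Y + anchor.Height &&
                        anchor.Y < w.Y + w.Height)           // vertical spans overlap
            .OrderBy(w => w.X)                               // nearest first
            .FirstOrDefault();
    }
}
```

The vertical-overlap test is a design choice rather than a requirement: it tolerates small differences in Y between a label and its value, which exact Y matching would miss.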

How Do You Reconstruct Reading Order?

On multi-column layouts, the Words collection iteration order does not match the visual reading order on the page. Words are grouped by detected region, so columns and table cells can be returned out of sequence.

To rebuild a natural top-to-bottom, left-to-right order, sort the collection by Y coordinate first, then by X within each line. A small Y tolerance groups words sitting on the same baseline.

:path=/static-assets/ocr/content-code-examples/how-to/read-document-advanced-reading-order.cs
using IronOcr;
using System.Linq;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("multi-column-doc.png");

var result = ocr.ReadDocumentAdvanced(input);

int targetPage = 1;
int lineThreshold = 10; // pixel tolerance for grouping same-line words

// Sort by line (Y), then left-to-right (X)
var pageWords = result.Words
    .Where(w => w.PageNumber == targetPage)
    .OrderBy(w => w.Y / lineThreshold)
    .ThenBy(w => w.X)
    .ToList();

foreach (var word in pageWords)
{
    Console.Write($"{word.Text} ");
}
Console.WriteLine();
Imports IronOcr
Imports System.Linq

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("multi-column-doc.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    Dim targetPage As Integer = 1
    Dim lineThreshold As Integer = 10 ' pixel tolerance for grouping same-line words

    ' Sort by line (Y), then left-to-right (X)
    Dim pageWords = result.Words _
        .Where(Function(w) w.PageNumber = targetPage) _
        .OrderBy(Function(w) w.Y \ lineThreshold) _
        .ThenBy(Function(w) w.X) _
        .ToList()

    For Each word In pageWords
        Console.Write($"{word.Text} ")
    Next
    Console.WriteLine()
End Using

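Tune lineThreshold to match your document: 10–15 pixels works for standard 12pt text at 300 DPI (a font size of about 50 pixels), and larger headings or handwritten input call for a wider tolerance. Because the integer division sorts words into fixed bands, two words on the same line can land in adjacent bands when their Y values straddle a multiple of the threshold, so spot-check the output on real pages. The sort also reads straight across the page; where the engine detects each column or table cell as its own region, filter or group by RegionIndex first to read one region at a time.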
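One alternative to fixed Y bands is to cluster words into lines by comparing each word with the first word of the line currently being built. The sketch below is a minimal version of that idea under stated assumptions: WordBox and LineGrouping.GroupIntoLines are hypothetical names, the data is invented, and nothing here is part of the IronOCR API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up data: "world" and "Hello" share a visual line, but their Y values
// (9 and 11) fall on either side of a multiple of 10.
var words = new List<WordBox>
{
    new("world", 120,  9, 60, 20),
    new("Hello",  40, 11, 60, 20),
    new("line",  130, 52, 40, 20),
    new("Second", 40, 50, 80, 20),
};

foreach (var line in LineGrouping.GroupIntoLines(words, tolerance: 10))
    Console.WriteLine(string.Join(" ", line.Select(w => w.Text)));
// Hello world
// Second line

// Hypothetical stand-in for a word entry: text plus its pixel box.
public record WordBox(string Text, int X, int Y, int Width, int Height);

public static class LineGrouping
{
    // Groups words into lines: a word joins the open line when its Y is within
    // 'tolerance' pixels of that line's first word; otherwise it starts a new line.
    public static List<List<WordBox>> GroupIntoLines(IEnumerable<WordBox> words, int tolerance)
    {
        var lines = new List<List<WordBox>>();

        foreach (var word in words.OrderBy(w => w.Y).ThenBy(w => w.X))
        {
            var current = lines.LastOrDefault();
            if (current != null && word.Y - current[0].Y <= tolerance)
                current.Add(word);
            else
                lines.Add(new List<WordBox> { word });
        }

        // Within each line, restore left-to-right order.
        foreach (var line in lines)
            line.Sort((a, b) => a.X.CompareTo(b.X));

        return lines;
    }
}
```

With the same data, sorting by Y / 10 would place "world" (band 0) before "Hello" (band 1). For multi-column pages, one option is to run the grouping once per RegionIndex so that each column is read in turn.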

How Do You Filter Low-Confidence Words?

To exclude low-quality detections before they reach your database, search index, or downstream extraction, filter the collection by RegionConfidence. The score ranges from 0.0 to 1.0, with higher values indicating greater confidence in the detected text.

:path=/static-assets/ocr/content-code-examples/how-to/read-document-advanced-confidence-filter.cs
using IronOcr;
using System.Linq;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("noisy-scan.png");

var result = ocr.ReadDocumentAdvanced(input);

double threshold = 0.75;

var highConfidenceWords = result.Words
    .Where(w => w.RegionConfidence >= threshold)
    .ToList();

var lowConfidenceWords = result.Words
    .Where(w => w.RegionConfidence < threshold)
    .ToList();

Console.WriteLine($"Accepted: {highConfidenceWords.Count} words");
Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words");

// Log rejected words for manual review
foreach (var word in lowConfidenceWords)
{
    Console.WriteLine(
        $"  LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
    );
}
Imports IronOcr
Imports System.Linq

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("noisy-scan.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    Dim threshold As Double = 0.75

    Dim highConfidenceWords = result.Words _
        .Where(Function(w) w.RegionConfidence >= threshold) _
        .ToList()

    Dim lowConfidenceWords = result.Words _
        .Where(Function(w) w.RegionConfidence < threshold) _
        .ToList()

    Console.WriteLine($"Accepted: {highConfidenceWords.Count} words")
    Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words")

    ' Log rejected words for manual review
    For Each word In lowConfidenceWords
        Console.WriteLine(
            $"  LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
        )
    Next
End Using

For scans with mixed quality (clear print in some areas, degraded sections elsewhere), this prevents low-confidence output from reaching downstream systems. To raise confidence scores at the source, the image preprocessing filters (Deskew, DeNoise, Binarize) improve quality before the threshold is applied.
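Since the score is reported per region, it can be more convenient to queue whole regions for manual review than to log individual words. The sketch below groups words by page and region and lists the regions that fall under the threshold. ScoredWord, RegionSummary, and ConfidenceReport are hypothetical names with invented data; the record fields simply mirror the PageNumber, RegionIndex, and RegionConfidence properties used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up data standing in for result.Words.
var words = new List<ScoredWord>
{
    new("Invoice", 1, 0, 0.96),
    new("#1042",   1, 0, 0.96),
    new("T0tal",   1, 1, 0.58),
    new("9g.00",   1, 1, 0.58),
};

foreach (var region in ConfidenceReport.RegionsBelow(words, threshold: 0.75))
    Console.WriteLine($"Page {region.Page}, region {region.Region}: \"{region.Text}\"");

// Hypothetical stand-ins: a word with its page, region, and region score.
public record ScoredWord(string Text, int PageNumber, int RegionIndex, double RegionConfidence);
public record RegionSummary(int Page, int Region, double Confidence, string Text);

public static class ConfidenceReport
{
    // Returns one summary per (page, region) whose lowest score is under 'threshold'.
    public static List<RegionSummary> RegionsBelow(IEnumerable<ScoredWord> words, double threshold) =>
        words
            .GroupBy(w => (w.PageNumber, w.RegionIndex))
            .Select(g => new RegionSummary(
                g.Key.PageNumber,
                g.Key.RegionIndex,
                g.Min(w => w.RegionConfidence),            // lowest score seen in the region
                string.Join(" ", g.Select(w => w.Text))))  // region text in collection order
            .Where(r => r.Confidence < threshold)
            .OrderBy(r => r.Page).ThenBy(r => r.Region)
            .ToList();
}
```

With this sample data, only region 1 on page 1 is flagged; region 0 stays out of the review queue.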

How Do You Iterate at the Character Level?

For OCR verification overlays, character-level diffing against ground truth, or precise spatial analysis on form fields, use the Characters collection. It mirrors Words but resolves down to individual characters.

:path=/static-assets/ocr/content-code-examples/how-to/read-document-advanced-characters.cs
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("form-field.png");

var result = ocr.ReadDocumentAdvanced(input);

foreach (var ch in result.Characters)
{
    Console.WriteLine(
        $"'{ch.Text}' | " +
        $"Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | " +
        $"Page {ch.PageNumber}"
    );
}

// ToString() override provides diagnostic-friendly output
Console.WriteLine(result.Characters.First().ToString());
Imports IronOcr

Dim ocr = New IronTesseract()
Using input = New OcrInput()
    input.LoadImage("form-field.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    For Each ch In result.Characters
        Console.WriteLine($"'{ch.Text}' | Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | Page {ch.PageNumber}")
    Next

    ' ToString() override provides diagnostic-friendly output
    Console.WriteLine(result.Characters.First().ToString())
End Using

Please note: Both Words and Characters are computed lazily and cached. The first access triggers the computation; subsequent accesses return the cached result, so iterating a second time costs nothing.
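For the character-level diffing against ground truth mentioned above, a common measure is the edit (Levenshtein) distance between the recognized string and the expected one, divided by the length of the expected string. The sketch below is a generic implementation and not an IronOCR feature; CharDiff is a hypothetical helper, and the recognized string would typically be built by joining the Text of each entry in Characters.

```csharp
using System;

// One substitution ('0' for 'o') across five expected characters.
Console.WriteLine(CharDiff.EditDistance("T0tal", "Total")); // prints 1
Console.WriteLine(CharDiff.ErrorRate("T0tal", "Total") <= 0.25
    ? "within tolerance"
    : "needs review");                                      // prints "within tolerance"

public static class CharDiff
{
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn 'recognized' into 'truth'.
    public static int EditDistance(string recognized, string truth)
    {
        var prev = new int[truth.Length + 1];
        var curr = new int[truth.Length + 1];
        for (int j = 0; j <= truth.Length; j++) prev[j] = j;

        for (int i = 1; i <= recognized.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= truth.Length; j++)
            {
                int cost = recognized[i - 1] == truth[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                    // match or substitution
            }
            (prev, curr) = (curr, prev); // reuse the two row buffers
        }
        return prev[truth.Length];
    }

    // Character error rate: edit distance relative to the ground-truth length.
    public static double ErrorRate(string recognized, string truth) =>
        truth.Length == 0
            ? (recognized.Length == 0 ? 0.0 : 1.0)
            : (double)EditDistance(recognized, truth) / truth.Length;
}
```

The 0.25 tolerance in the usage lines is an arbitrary illustration; pick a value that matches how strict your verification needs to be.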

How Do You Crop the Original Image Using a BoundingBox?

To extract the visual region of a word for verification, annotation, or building labeled training data, pass the BoundingBox property to AnyBitmap.CropRegion(). As long as no preprocessing filter has changed the image dimensions, the bounding box maps directly to the word's position in the source image.

:path=/static-assets/ocr/content-code-examples/how-to/read-document-advanced-crop-boundingbox.cs
using IronOcr;
using IronSoftware.Drawing;
using System.Linq; // needed for FirstOrDefault() below

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("invoice.png");

var result = ocr.ReadDocumentAdvanced(input);

// Load the original image for cropping; dispose it when the program ends
using var originalImage = AnyBitmap.FromFile("invoice.png");

// Find a specific word and crop its region
var targetWord = result.Words.FirstOrDefault(w => w.Text == "Total");
if (targetWord != null)
{
    Rectangle cropRect = targetWord.BoundingBox;
    AnyBitmap croppedRegion = originalImage.CropRegion(cropRect);
    croppedRegion.SaveAs("total-region.png");

    Console.WriteLine(
        $"Cropped '{targetWord.Text}' from " +
        $"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
    );
}
Imports IronOcr
Imports IronSoftware.Drawing

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("invoice.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    ' Load the original image for cropping; dispose it when done
    Using originalImage = AnyBitmap.FromFile("invoice.png")

        ' Find a specific word and crop its region
        Dim targetWord = result.Words.FirstOrDefault(Function(w) w.Text = "Total")
        If targetWord IsNot Nothing Then
            Dim cropRect As Rectangle = targetWord.BoundingBox
            Dim croppedRegion As AnyBitmap = originalImage.CropRegion(cropRect)
            croppedRegion.SaveAs("total-region.png")

            Console.WriteLine(
                $"Cropped '{targetWord.Text}' from " &
                $"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
            )
        End If
    End Using
End Using

This pattern scales to bulk operations: iterate every word, crop each box, and export a labeled dataset for custom font training or downstream ML pipelines. Coordinates reflect the post-preprocessing image; if filters like EnhanceResolution changed the dimensions, the bounding box matches the processed image, not the original on disk.
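If a filter has resized the image and you still want to crop from the original file, the box has to be mapped back first. The sketch below assumes the only change was a uniform resize (no rotation, cropping, or deskewing) and that you know both sets of dimensions. Box and BoxMapping.ToOriginal are hypothetical names, and how you obtain the processed dimensions is left open because it depends on your pipeline.

```csharp
using System;

// Made-up example: the processed image is twice the size of the original.
var processedBox = new Box(840, 1080, 160, 48);
var mapped = BoxMapping.ToOriginal(processedBox,
    processedWidth: 2000, processedHeight: 3000,
    originalWidth: 1000, originalHeight: 1500);

Console.WriteLine(mapped); // Box { X = 420, Y = 540, Width = 80, Height = 24 }

// Hypothetical stand-in for a pixel rectangle.
public record Box(int X, int Y, int Width, int Height);

public static class BoxMapping
{
    // Maps a box from processed-image pixels to original-image pixels,
    // assuming the processed image is a plain resize of the original.
    public static Box ToOriginal(Box box,
        int processedWidth, int processedHeight,
        int originalWidth, int originalHeight)
    {
        double sx = (double)originalWidth / processedWidth;
        double sy = (double)originalHeight / processedHeight;

        // Round outward so the mapped box never cuts into the glyphs.
        int left   = (int)Math.Floor(box.X * sx);
        int top    = (int)Math.Floor(box.Y * sy);
        int right  = (int)Math.Ceiling((box.X + box.Width) * sx);
        int bottom = (int)Math.Ceiling((box.Y + box.Height) * sy);

        // Clamp to the original image so a crop call cannot go out of bounds.
        left   = Math.Clamp(left, 0, originalWidth);
        top    = Math.Clamp(top, 0, originalHeight);
        right  = Math.Clamp(right, left, originalWidth);
        bottom = Math.Clamp(bottom, top, originalHeight);

        return new Box(left, top, right - left, bottom - top);
    }
}
```

Filters that rotate or crop the page break the uniform-resize assumption; in that case, cropping from the processed image is the safer route.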

Next Steps

The advanced pipeline provides the same spatial detail as IronTesseract.Read(), with additional layout intelligence on top.

Start your free 30-day trial or view licensing options.

Frequently Asked Questions

What is OCR and why is it important?

OCR, or Optical Character Recognition, is a technology that converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data. OCR is important because it automates data extraction, reduces manual data entry, and makes information easily accessible and editable.

How does IronOCR enhance the OCR process?

IronOCR enhances the OCR process by providing accurate and high-speed text recognition capabilities. It supports multiple languages and includes features like image pre-processing to improve text recognition accuracy.

Can IronOCR handle multi-page documents?

Yes, IronOCR can process multi-page documents efficiently, extracting text from each page and allowing users to work with the entire document as a cohesive unit.

What file formats does IronOCR support?

IronOCR supports a wide range of file formats including PDF, TIFF, JPEG, PNG, and BMP, allowing flexibility in the types of documents it can process.

Is IronOCR suitable for recognizing text in low-quality images?

Yes, IronOCR includes advanced image pre-processing features that enhance the quality of low-resolution or poor-quality images, increasing the accuracy of text recognition.

Does IronOCR support multiple languages?

IronOCR supports multiple languages, making it a versatile tool for global applications that require text recognition in different languages.

Can IronOCR be integrated into existing applications?

IronOCR is designed to be easily integrated into existing applications using C#, allowing developers to add OCR functionality to their software with minimal effort.

What are the benefits of using IronOCR for document management?

Using IronOCR for document management streamlines the workflow by converting scanned documents into searchable and editable text, reducing the need for manual data entry and improving document accessibility.

How can IronOCR improve data accuracy?

IronOCR improves data accuracy through its advanced recognition algorithms and image correction features, ensuring that the text extraction process is both reliable and precise.

Is there a free trial available for IronOCR?

Yes, Iron Software offers a free trial of IronOCR, allowing users to test its features and capabilities before making a purchase decision.

Darrius Serrant
Full Stack Software Engineer (WebOps)

Darrius Serrant holds a Bachelor’s degree in Computer Science from the University of Miami and works as a Full Stack WebOps Marketing Engineer at Iron Software. Drawn to coding from a young age, he saw computing as both mysterious and accessible, making it the perfect medium for creativity ...

Nuget Downloads 5,879,654 | Version: 2026.5 just released