Word and Character OCR Data in C# (Coordinates, Confidence, Bounding Boxes)

After running OCR on a document, the extracted text alone is often not enough. To locate specific values on a page, exclude low-quality detections, or reconstruct the natural reading order on multi-column layouts, you need per-word coordinates, page numbers, region indices, and confidence scores.

The Words and Characters collections on AdvancedOcrResultBase expose this data. Both ReadDocumentAdvanced() (for layout-aware documents) and ReadPhoto() (for camera input) return results that carry these collections, at the same granularity as the standard OcrResult.Words collection.

This guide walks through five common patterns: iterating word data, reconstructing reading order, filtering by confidence, working at the character level, and cropping the source image from a bounding box.

Start a free 30-day trial to test these collections in your pipeline.

Install with NuGet

PM > Install-Package IronOcr

Check out IronOCR on NuGet for quick installation. You can also download the DLL or Windows installer.

Quickstart: Read Word and Character Data from an OCR Result

Call ReadDocumentAdvanced (or ReadPhoto) and iterate result.Words to get every recognized word with its coordinates, page number, and confidence score in a few lines.

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr
  2. Copy and run this code snippet.

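    using IronOcr;

    // Run the layout-aware pipeline, then print each word with its position and region confidence
    var result = new IronTesseract().ReadDocumentAdvanced(new OcrInput("scan.png"));
    foreach (var word in result.Words)
        Console.WriteLine($"{word.Text} @ ({word.X},{word.Y}) conf:{word.RegionConfidence:P0}");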
  3. Deploy to test on your live environment

    Start using IronOCR in your project today with a free trial



How Do You Iterate Words with Coordinates and Confidence?

The Words collection returns every detected word across every page. Each entry is an AdvancedWord (entries in Characters are AdvancedCharacter; both types inherit from AdvancedOcrElement) and exposes the text, pixel coordinates, dimensions, the page it belongs to, the region index identifying which detected text block contains it, and a confidence score for that region.

using IronOcr;
using System.Linq; // needed for First() below

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("receipt.png");

var result = ocr.ReadDocumentAdvanced(input);

foreach (var word in result.Words)
{
    Console.WriteLine(
        $"Page {word.PageNumber} | " +
        $"Region {word.RegionIndex} | " +
        $"'{word.Text}' | " +
        $"Position: ({word.X}, {word.Y}) | " +
        $"Size: {word.Width}x{word.Height} | " +
        $"Confidence: {word.RegionConfidence:P1}"
    );
}

// ToString() override for diagnostic logging
Console.WriteLine(result.Words.First().ToString());

' VB.NET version of the example above
Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("receipt.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    For Each word In result.Words
        Console.WriteLine(
            $"Page {word.PageNumber} | " &
            $"Region {word.RegionIndex} | " &
            $"'{word.Text}' | " &
            $"Position: ({word.X}, {word.Y}) | " &
            $"Size: {word.Width}x{word.Height} | " &
            $"Confidence: {word.RegionConfidence:P1}"
        )
    Next

    ' ToString() override for diagnostic logging (kept inside the Using block so result is in scope)
    Console.WriteLine(result.Words.First().ToString())
End Using

Tip: PageNumber is 1-based: page one is 1, not 0. This differs from most .NET collections, which use zero-based indexing. RegionIndex follows the standard 0-based convention.

To pass coordinates to drawing or cropping APIs, use the BoundingBox property. It bundles position and size into a single IronSoftware.Drawing.Rectangle.
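A common use of these coordinates is finding the value printed next to a label, such as the amount to the right of "Total". The sketch below is a minimal illustration under stated assumptions, not part of the IronOCR API: WordBox is a hypothetical stand-in record that carries the fields described above (Text, X, Y, Width, Height, PageNumber), and FindValueRightOf is an illustrative helper name. In a real pipeline you would run the same query over result.Words.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data standing in for OCR output (values are made up for illustration).
var words = new List<WordBox>
{
    new("Subtotal", 40, 500, 120, 30, 1),
    new("90.00",   400, 502,  80, 30, 1),
    new("Total",    40, 560,  80, 30, 1),
    new("99.00",   400, 558,  80, 30, 1),
};

var value = LabelLookup.FindValueRightOf(words, "Total");
Console.WriteLine(value?.Text ?? "(not found)");

// Hypothetical stand-in for a recognized word; not an IronOCR type.
public record WordBox(string Text, int X, int Y, int Width, int Height, int PageNumber);

public static class LabelLookup
{
    // Returns the nearest word to the right of the label, on the same page,
    // whose vertical centre falls inside the label's vertical extent.
    // Returns null when the label or a matching value is not found.
    public static WordBox? FindValueRightOf(IEnumerable<WordBox> words, string label)
    {
        var all = words.ToList();
        var anchor = all.FirstOrDefault(w => w.Text == label);
        if (anchor is null) return null;

        int anchorRight = anchor.X + anchor.Width;
        return all
            .Where(w => w.PageNumber == anchor.PageNumber)
            .Where(w => w.X >= anchorRight)
            .Where(w =>
            {
                int centreY = w.Y + w.Height / 2;
                return centreY >= anchor.Y && centreY <= anchor.Y + anchor.Height;
            })
            .OrderBy(w => w.X)
            .FirstOrDefault();
    }
}
```

Matching on the vertical centre rather than on an exact Y value tolerates the few pixels of baseline jitter that OCR boxes on the same line usually show.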

How Do You Reconstruct Reading Order?

On multi-column layouts, the Words collection iteration order does not match the visual reading order on the page. Words are grouped by detected region, so columns and table cells can be returned out of sequence.

To rebuild a natural top-to-bottom, left-to-right order, sort the collection by Y coordinate first, then by X within each line. A small Y tolerance groups words sitting on the same baseline.

using IronOcr;
using System.Linq;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("multi-column-doc.png");

var result = ocr.ReadDocumentAdvanced(input);

int targetPage = 1;
int lineThreshold = 10; // pixel tolerance for grouping same-line words

// Sort by line (Y), then left-to-right (X)
var pageWords = result.Words
    .Where(w => w.PageNumber == targetPage)
    .OrderBy(w => w.Y / lineThreshold)
    .ThenBy(w => w.X)
    .ToList();

foreach (var word in pageWords)
{
    Console.Write($"{word.Text} ");
}
Console.WriteLine();

' VB.NET version of the example above
Imports IronOcr
Imports System.Linq

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("multi-column-doc.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    Dim targetPage As Integer = 1
    Dim lineThreshold As Integer = 10 ' pixel tolerance for grouping same-line words

    ' Sort by line (Y), then left-to-right (X)
    Dim pageWords = result.Words _
        .Where(Function(w) w.PageNumber = targetPage) _
        .OrderBy(Function(w) w.Y \ lineThreshold) _
        .ThenBy(Function(w) w.X) _
        .ToList()

    For Each word In pageWords
        Console.Write($"{word.Text} ")
    Next
    Console.WriteLine()
End Using

Tune lineThreshold to match your document: 10–15 pixels works for standard 12pt text at 300 DPI. Larger headings or handwritten input call for a wider tolerance. This pattern is especially useful on multi-column pages and inside table cells, where the engine detects each column or cell as its own region.
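One caveat of bucketing with Y / lineThreshold is that two words on the same visual line can land in different buckets when they straddle a boundary (with a threshold of 10, Y = 9 falls in bucket 0 and Y = 11 in bucket 1). The sketch below shows one possible alternative: walk the words top to bottom and start a new line only when a word sits more than the tolerance below the first word of the current line. WordBox and GroupIntoLines are hypothetical names used for illustration, not IronOCR types, and the input is assumed to be already filtered to a single page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data standing in for one page of OCR output (made-up coordinates).
var words = new List<WordBox>
{
    new("world", 120, 11),
    new("Hello",  20,  9),
    new("line",  130, 62),
    new("Second", 20, 60),
};

foreach (var line in LineGrouping.GroupIntoLines(words, tolerance: 10))
    Console.WriteLine(string.Join(" ", line.Select(w => w.Text)));

// Hypothetical stand-in for a recognized word; not an IronOCR type.
public record WordBox(string Text, int X, int Y);

public static class LineGrouping
{
    // Groups words into lines: a word joins the current line while its Y is
    // within `tolerance` pixels of the first word of that line.
    public static List<List<WordBox>> GroupIntoLines(IEnumerable<WordBox> words, int tolerance)
    {
        var lines = new List<List<WordBox>>();
        int lineTop = 0;

        foreach (var word in words.OrderBy(w => w.Y).ThenBy(w => w.X))
        {
            if (lines.Count == 0 || word.Y - lineTop > tolerance)
            {
                lines.Add(new List<WordBox>());
                lineTop = word.Y;
            }
            lines[^1].Add(word);
        }

        // Within each line, order words left to right.
        return lines.Select(l => l.OrderBy(w => w.X).ToList()).ToList();
    }
}
```

Returning lines as separate lists also makes it easy to emit a line break between them instead of one long run of words.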

How Do You Filter Low-Confidence Words?

To exclude low-quality detections before they reach your database, search index, or downstream extraction, filter the collection by RegionConfidence. The score ranges from 0.0 to 1.0, with higher values indicating greater confidence in the detected text.

using IronOcr;
using System.Linq;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("noisy-scan.png");

var result = ocr.ReadDocumentAdvanced(input);

double threshold = 0.75;

var highConfidenceWords = result.Words
    .Where(w => w.RegionConfidence >= threshold)
    .ToList();

var lowConfidenceWords = result.Words
    .Where(w => w.RegionConfidence < threshold)
    .ToList();

Console.WriteLine($"Accepted: {highConfidenceWords.Count} words");
Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words");

// Log rejected words for manual review
foreach (var word in lowConfidenceWords)
{
    Console.WriteLine(
        $"  LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
    );
}

' VB.NET version of the example above
Imports IronOcr
Imports System.Linq

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("noisy-scan.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    Dim threshold As Double = 0.75

    Dim highConfidenceWords = result.Words _
        .Where(Function(w) w.RegionConfidence >= threshold) _
        .ToList()

    Dim lowConfidenceWords = result.Words _
        .Where(Function(w) w.RegionConfidence < threshold) _
        .ToList()

    Console.WriteLine($"Accepted: {highConfidenceWords.Count} words")
    Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words")

    ' Log rejected words for manual review
    For Each word In lowConfidenceWords
        Console.WriteLine(
            $"  LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
        )
    Next
End Using

For scans with mixed quality (clear print in some areas, degraded sections elsewhere), this prevents low-confidence output from reaching downstream systems. To raise confidence scores at the source, the image preprocessing filters (Deskew, DeNoise, Binarize) improve quality before the threshold is applied.
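Because the score describes a region rather than a single word, it can be more practical to review whole regions than individual words. The following sketch groups words by page and region and lists the regions that fall below a threshold. WordBox, RegionSummary, and FindRegionsBelow are hypothetical names for illustration only; the record simply mirrors the PageNumber, RegionIndex, and RegionConfidence fields described in this guide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data standing in for OCR output (made-up values).
var words = new List<WordBox>
{
    new("Invoice", 1, 0, 0.96),
    new("#1042",   1, 0, 0.96),
    new("sm0dged", 1, 1, 0.41),
    new("t3xt",    1, 1, 0.41),
};

foreach (var r in RegionReview.FindRegionsBelow(words, 0.75))
    Console.WriteLine($"Page {r.PageNumber}, region {r.RegionIndex}: {r.Confidence:P0} -> \"{r.Text}\"");

// Hypothetical stand-ins; not IronOCR types.
public record WordBox(string Text, int PageNumber, int RegionIndex, double RegionConfidence);
public record RegionSummary(int PageNumber, int RegionIndex, double Confidence, string Text);

public static class RegionReview
{
    // Collapses words into one summary per (page, region) and keeps only
    // the regions whose confidence is below the threshold.
    public static List<RegionSummary> FindRegionsBelow(IEnumerable<WordBox> words, double threshold) =>
        words
            .GroupBy(w => (w.PageNumber, w.RegionIndex))
            .Select(g => new RegionSummary(
                g.Key.PageNumber,
                g.Key.RegionIndex,
                g.Min(w => w.RegionConfidence),          // defensive: take the lowest score seen
                string.Join(" ", g.Select(w => w.Text))))
            .Where(r => r.Confidence < threshold)
            .OrderBy(r => r.PageNumber)
            .ThenBy(r => r.RegionIndex)
            .ToList();
}
```

A reviewer then sees one entry per problem region with its full text, rather than a long list of rejected words.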

How Do You Iterate at the Character Level?

For OCR verification overlays, character-level diffing against ground truth, or precise spatial analysis on form fields, use the Characters collection. It mirrors Words but resolves down to individual characters.

using IronOcr;
using System.Linq; // needed for First() below

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("form-field.png");

var result = ocr.ReadDocumentAdvanced(input);

foreach (var ch in result.Characters)
{
    Console.WriteLine(
        $"'{ch.Text}' | " +
        $"Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | " +
        $"Page {ch.PageNumber}"
    );
}

// ToString() override provides diagnostic-friendly output
Console.WriteLine(result.Characters.First().ToString());

' VB.NET version of the example above
Imports IronOcr

Dim ocr = New IronTesseract()
Using input = New OcrInput()
    input.LoadImage("form-field.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    For Each ch In result.Characters
        Console.WriteLine($"'{ch.Text}' | Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | Page {ch.PageNumber}")
    Next

    ' ToString() override provides diagnostic-friendly output
    Console.WriteLine(result.Characters.First().ToString())
End Using

Please note: Both Words and Characters are computed lazily and cached. The first access triggers the computation; subsequent accesses return the cached result, so iterating a second time costs nothing.
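As one way to picture character-level diffing against ground truth, the sketch below compares recognized characters with an expected string position by position and reports each mismatch together with its box. It is deliberately naive: it assumes the OCR output and the expected text have the same number of characters, so it does not handle inserted or dropped characters. CharBox, Mismatch, and FindMismatches are hypothetical names, not IronOCR types.

```csharp
using System;
using System.Collections.Generic;

// Sample data standing in for result.Characters (made-up coordinates).
var chars = new List<CharBox>
{
    new("A", 10, 20, 12, 18),
    new("8", 24, 20, 12, 18),   // misread: ground truth is "B"
    new("C", 38, 20, 12, 18),
};

foreach (var m in CharDiff.FindMismatches(chars, "ABC"))
    Console.WriteLine(
        $"Index {m.Index}: expected '{m.Expected}', got '{m.Actual.Text}' at ({m.Actual.X}, {m.Actual.Y})");

// Hypothetical stand-ins; not IronOCR types.
public record CharBox(string Text, int X, int Y, int Width, int Height);
public record Mismatch(int Index, char Expected, CharBox Actual);

public static class CharDiff
{
    // Position-by-position comparison over the shared length of both inputs.
    public static List<Mismatch> FindMismatches(IReadOnlyList<CharBox> recognized, string expected)
    {
        var mismatches = new List<Mismatch>();
        int count = Math.Min(recognized.Count, expected.Length);

        for (int i = 0; i < count; i++)
        {
            if (recognized[i].Text != expected[i].ToString())
                mismatches.Add(new Mismatch(i, expected[i], recognized[i]));
        }
        return mismatches;
    }
}
```

For real documents, where characters can be missing or duplicated, an edit-distance alignment would be a better fit than this index-based comparison.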

How Do You Crop the Original Image Using a BoundingBox?

To extract the visual region of a word for verification, annotation, or building labeled training data, pass the BoundingBox property to AnyBitmap.Clone(), which returns a copy of the image cropped to that rectangle. The bounding box maps directly to the word's position in the source image.

using IronOcr;
using IronSoftware.Drawing;
using System.Linq; // needed for FirstOrDefault() below

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("invoice.png");

var result = ocr.ReadDocumentAdvanced(input);

// Load the original image for cropping
var originalImage = AnyBitmap.FromFile("invoice.png");

// Find a specific word and crop its region
var targetWord = result.Words.FirstOrDefault(w => w.Text == "Total");
if (targetWord != null)
{
    Rectangle cropRect = targetWord.BoundingBox;
    AnyBitmap croppedRegion = originalImage.Clone(cropRect);
    croppedRegion.SaveAs("total-region.png");

    Console.WriteLine(
        $"Cropped '{targetWord.Text}' from " +
        $"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
    );
}

' VB.NET version of the example above
Imports IronOcr
Imports IronSoftware.Drawing

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("invoice.png")

    Dim result = ocr.ReadDocumentAdvanced(input)

    ' Load the original image for cropping
    Dim originalImage = AnyBitmap.FromFile("invoice.png")

    ' Find a specific word and crop its region
    Dim targetWord = result.Words.FirstOrDefault(Function(w) w.Text = "Total")
    If targetWord IsNot Nothing Then
        Dim cropRect As Rectangle = targetWord.BoundingBox
        Dim croppedRegion As AnyBitmap = originalImage.Clone(cropRect)
        croppedRegion.SaveAs("total-region.png")

        Console.WriteLine(
            $"Cropped '{targetWord.Text}' from " &
            $"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
        )
    End If
End Using

This pattern scales to bulk operations: iterate every word, crop each box, and export a labeled dataset for custom font training or downstream ML pipelines. Coordinates reflect the post-preprocessing image; if filters like EnhanceResolution changed the dimensions, the bounding box matches the processed image, not the original on disk.
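Two small adjustments often come up before cropping in bulk: adding a margin around each box without running past the image edge, and mapping a box from the processed image back to the original file. The sketch below shows both as plain arithmetic. Box, PadAndClamp, and Scale are hypothetical names (not IronOCR or IronSoftware.Drawing types), and the scaling step assumes preprocessing only resized the image, with no rotation, deskew, or cropping; the scale factor is the original dimension divided by the processed dimension.

```csharp
using System;

// A word near the right edge of a 600x800 image, padded by 4 pixels.
var wordBox = new Box(590, 10, 20, 30);
var padded = BoxMath.PadAndClamp(wordBox, padding: 4, imageWidth: 600, imageHeight: 800);
Console.WriteLine(padded);

// A box found on a 2x upscaled image, mapped back to the original (factor 0.5).
var onOriginal = BoxMath.Scale(new Box(200, 100, 80, 40), scaleX: 0.5, scaleY: 0.5);
Console.WriteLine(onOriginal);

// Hypothetical stand-in for a bounding rectangle.
public record Box(int X, int Y, int Width, int Height);

public static class BoxMath
{
    // Expands the box by `padding` on every side, then clips it to the image.
    public static Box PadAndClamp(Box box, int padding, int imageWidth, int imageHeight)
    {
        int left = Math.Max(0, box.X - padding);
        int top = Math.Max(0, box.Y - padding);
        int right = Math.Min(imageWidth, box.X + box.Width + padding);
        int bottom = Math.Min(imageHeight, box.Y + box.Height + padding);
        return new Box(left, top, Math.Max(0, right - left), Math.Max(0, bottom - top));
    }

    // Maps a box between coordinate spaces that differ only by a resize.
    public static Box Scale(Box box, double scaleX, double scaleY) =>
        new Box(
            (int)Math.Round(box.X * scaleX),
            (int)Math.Round(box.Y * scaleY),
            (int)Math.Round(box.Width * scaleX),
            (int)Math.Round(box.Height * scaleY));
}
```

Clamping before the crop call avoids out-of-bounds rectangles for words that touch the page edge.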

Next Steps

The advanced pipeline provides the same spatial detail as IronTesseract.Read(), with additional layout intelligence on top.

Start your free 30-day trial or view licensing options.

Frequently Asked Questions

What is Advanced OCR in C#?

Advanced OCR in C# refers to the process of using Optical Character Recognition to extract detailed word and character data, including coordinates, confidence levels, and bounding boxes, using IronOCR's advanced pipeline.

How can I access word data using IronOCR?

Call ReadDocumentAdvanced() or ReadPhoto() and iterate the Words collection on the result. Each AdvancedWord entry provides the word's text, position, size, page number, region index, and region confidence score.

What is the significance of bounding boxes in OCR?

Bounding boxes are crucial in OCR as they define the exact location and dimensions of recognized text elements on the scanned image, enabling precise text extraction and image manipulation.

Can I filter OCR results by confidence score?

Yes. Filter the Words collection on RegionConfidence, which ranges from 0.0 to 1.0, so that only text above your chosen threshold is passed on for further processing.

How do I reconstruct the reading order in OCR results?

The Words collection is grouped by detected region, so its iteration order may not match the visual reading order on the page. To reconstruct that order, sort the AdvancedWord objects by Y coordinate, using a small pixel tolerance to keep words on the same line together, and then by X within each line.

Is it possible to crop source images using IronOCR?

Yes. Each recognized word and character exposes a BoundingBox rectangle that you can use to crop the matching region from the source image.

What are AdvancedWord and AdvancedCharacter collections?

AdvancedWord and AdvancedCharacter are the element types held in the Words and Characters collections. Both inherit from AdvancedOcrElement and store detailed information about each recognized word or character, including its coordinates, region confidence, and bounding box.

How does IronOCR handle character recognition?

IronOCR exposes recognized characters through the Characters collection of the advanced pipeline. Each entry provides the character's text, position, size, and page number.

What type of documents can be processed with IronOCR?

IronOCR can process a wide range of document types including PDFs, scanned images, and photos, extracting text data with high accuracy and detail.

Is there a free trial available for IronOCR?

Yes, Iron Software offers a free trial of IronOCR, allowing users to test its features and capabilities before making a purchase decision.

Darrius Serrant
Full Stack Software Engineer (WebOps)

Darrius Serrant holds a Bachelor’s degree in Computer Science from the University of Miami and works as a Full Stack WebOps Marketing Engineer at Iron Software. Drawn to coding from a young age, he saw computing as both mysterious and accessible, making it the perfect medium for creativity ...

Ready to Get Started?
Nuget Downloads 5,954,371 | Version: 2026.6 just released