Word and Character OCR Data in C# (Coordinates, Confidence, Bounding Boxes)
After running OCR on a document, the extracted text alone is often not enough. To locate specific values on a page, exclude low-quality detections, or reconstruct the natural reading order on multi-column layouts, you need per-word coordinates, page numbers, region indices, and confidence scores.
The Words and Characters collections on AdvancedOcrResultBase expose this data. Both ReadDocumentAdvanced(), for layout-aware documents, and ReadPhoto(), for camera input, return results at the same granularity as the standard OcrResult.Words collection.
This guide walks through five common patterns: iterating word data, reconstructing reading order, filtering by confidence, working at the character level, and cropping the source image from a bounding box.
Start a free 30-day trial to test these collections in your pipeline.
Quickstart: Read Word and Character Data from an OCR Result
Call ReadDocumentAdvanced (or ReadPhoto) and iterate result.Words to get every recognized word with its coordinates, page number, and confidence score in a few lines.
- Install IronOCR with NuGet Package Manager:
PM > Install-Package IronOcr
- Copy and run this code snippet:
var result = new IronTesseract().ReadDocumentAdvanced(new OcrInput("scan.png"));
foreach (var word in result.Words)
Console.WriteLine($"{word.Text} @ ({word.X},{word.Y}) conf:{word.RegionConfidence:P0}");
- Deploy to test on your live environment
Minimal Workflow (3 steps)
- Download the C# OCR library from NuGet
- Run advanced OCR with ReadDocumentAdvanced or ReadPhoto on your input
- Iterate result.Words or result.Characters for coordinates, confidence, and bounding boxes
How Do You Iterate Words with Coordinates and Confidence?
The Words collection returns every detected word across every page. Each entry is an AdvancedWord, which (like AdvancedCharacter) inherits from AdvancedOcrElement. It exposes the text, pixel coordinates, dimensions, the page it belongs to, the region index identifying which detected text block contains it, and a confidence score for that region.
using IronOcr;
using System.Linq; // needed for First() below
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("receipt.png");
var result = ocr.ReadDocumentAdvanced(input);
foreach (var word in result.Words)
{
Console.WriteLine(
$"Page {word.PageNumber} | " +
$"Region {word.RegionIndex} | " +
$"'{word.Text}' | " +
$"Position: ({word.X}, {word.Y}) | " +
$"Size: {word.Width}x{word.Height} | " +
$"Confidence: {word.RegionConfidence:P1}"
);
}
// ToString() override for diagnostic logging
Console.WriteLine(result.Words.First().ToString());
Imports IronOcr
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("receipt.png")
Dim result = ocr.ReadDocumentAdvanced(input)
For Each word In result.Words
Console.WriteLine(
$"Page {word.PageNumber} | " &
$"Region {word.RegionIndex} | " &
$"'{word.Text}' | " &
$"Position: ({word.X}, {word.Y}) | " &
$"Size: {word.Width}x{word.Height} | " &
$"Confidence: {word.RegionConfidence:P1}"
)
Next
' ToString() override for diagnostic logging
' (kept inside the Using block, where result is still in scope)
Console.WriteLine(result.Words.First().ToString())
End Using
PageNumber is 1-based: page one is 1, not 0. This differs from most .NET collections, which use zero-based indexing. RegionIndex follows the standard 0-based convention.
To pass coordinates to drawing or cropping APIs, use the BoundingBox property. It bundles position and size into a single IronSoftware.Drawing.Rectangle.
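The two indexing conventions above are easy to mix up when word data is grouped or used to index other collections. The following is a minimal sketch, not IronOCR code: WordInfo, RegionGrouping, DescribeRegions, and PageItem are hypothetical names introduced for illustration, and WordInfo only stands in for the Text, PageNumber, and RegionIndex values read from each word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the fields read from each OCR word.
// This is NOT an IronOCR type.
public record WordInfo(string Text, int PageNumber, int RegionIndex);

public static class RegionGrouping
{
    // Joins the words of each (page, region) pair into one line of text,
    // ordered by page and then by region. Words inside a group keep the
    // order in which they were supplied.
    public static List<string> DescribeRegions(IEnumerable<WordInfo> words)
    {
        return words
            .GroupBy(w => (w.PageNumber, w.RegionIndex))
            .OrderBy(g => g.Key.PageNumber)
            .ThenBy(g => g.Key.RegionIndex)
            .Select(g => $"Page {g.Key.PageNumber}, region {g.Key.RegionIndex}: "
                         + string.Join(" ", g.Select(w => w.Text)))
            .ToList();
    }

    // PageNumber is 1-based, so subtract 1 before indexing a zero-based list.
    public static T PageItem<T>(IReadOnlyList<T> pages, int pageNumber)
    {
        return pages[pageNumber - 1];
    }
}
```

In a real pipeline, the WordInfo values would be projected from result.Words, for example with a Select over Text, PageNumber, and RegionIndex.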
How Do You Reconstruct Reading Order?
On multi-column layouts, the Words collection iteration order does not match the visual reading order on the page. Words are grouped by detected region, so columns and table cells can be returned out of sequence.
To rebuild a natural top-to-bottom, left-to-right order, sort the collection by Y coordinate first, then by X within each line. A small Y tolerance groups words sitting on the same baseline.
using IronOcr;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("multi-column-doc.png");
var result = ocr.ReadDocumentAdvanced(input);
int targetPage = 1;
int lineThreshold = 10; // pixel tolerance for grouping same-line words
// Sort by line (Y), then left-to-right (X)
var pageWords = result.Words
.Where(w => w.PageNumber == targetPage)
.OrderBy(w => w.Y / lineThreshold)
.ThenBy(w => w.X)
.ToList();
foreach (var word in pageWords)
{
Console.Write($"{word.Text} ");
}
Console.WriteLine();
Imports IronOcr
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("multi-column-doc.png")
Dim result = ocr.ReadDocumentAdvanced(input)
Dim targetPage As Integer = 1
Dim lineThreshold As Integer = 10 ' pixel tolerance for grouping same-line words
' Sort by line (Y), then left-to-right (X)
Dim pageWords = result.Words _
.Where(Function(w) w.PageNumber = targetPage) _
.OrderBy(Function(w) w.Y \ lineThreshold) _
.ThenBy(Function(w) w.X) _
.ToList()
For Each word In pageWords
Console.Write($"{word.Text} ")
Next
Console.WriteLine()
End Using
Tune lineThreshold to match your document: 10–15 pixels works for standard 12pt text at 300 DPI. Larger headings or handwritten input call for a wider tolerance. This pattern is especially useful on multi-column pages and inside table cells, where the engine detects each column or cell as its own region.
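One caveat with dividing Y by the threshold: the buckets have fixed boundaries, so with a threshold of 10, words at Y = 9 and Y = 10 land in different buckets even though they are one pixel apart. A possible alternative is to group by the gap to the first word of the current line instead. The sketch below assumes integer pixel coordinates; PlacedWord, LineGrouping, and ToLines are hypothetical names, and PlacedWord is a stand-in for the Text, X, and Y values of each word, not an IronOCR type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a word's text and top-left pixel position.
// This is NOT an IronOCR type.
public record PlacedWord(string Text, int X, int Y);

public static class LineGrouping
{
    // Sorts words by Y, starts a new line whenever a word sits more than
    // `tolerance` pixels below the first word of the current line, then
    // orders each line left to right and joins it with spaces.
    public static List<string> ToLines(IEnumerable<PlacedWord> words, int tolerance)
    {
        var lines = new List<List<PlacedWord>>();
        int lineTop = 0;

        foreach (var word in words.OrderBy(w => w.Y))
        {
            if (lines.Count == 0 || word.Y - lineTop > tolerance)
            {
                lines.Add(new List<PlacedWord>());
                lineTop = word.Y; // Y of the first word on the new line
            }
            lines[lines.Count - 1].Add(word);
        }

        return lines
            .Select(line => string.Join(" ", line.OrderBy(w => w.X).Select(w => w.Text)))
            .ToList();
    }
}
```

Like the sort above, this treats the page as a single column; on multi-column pages it would typically be applied per region, after filtering by RegionIndex.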
How Do You Filter Low-Confidence Words?
To exclude low-quality detections before they reach your database, search index, or downstream extraction, filter the collection by RegionConfidence. The score ranges from 0.0 to 1.0, with higher values indicating greater confidence in the detected text.
using IronOcr;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("noisy-scan.png");
var result = ocr.ReadDocumentAdvanced(input);
double threshold = 0.75;
var highConfidenceWords = result.Words
.Where(w => w.RegionConfidence >= threshold)
.ToList();
var lowConfidenceWords = result.Words
.Where(w => w.RegionConfidence < threshold)
.ToList();
Console.WriteLine($"Accepted: {highConfidenceWords.Count} words");
Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words");
// Log rejected words for manual review
foreach (var word in lowConfidenceWords)
{
Console.WriteLine(
$" LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
);
}
Imports IronOcr
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("noisy-scan.png")
Dim result = ocr.ReadDocumentAdvanced(input)
Dim threshold As Double = 0.75
Dim highConfidenceWords = result.Words _
.Where(Function(w) w.RegionConfidence >= threshold) _
.ToList()
Dim lowConfidenceWords = result.Words _
.Where(Function(w) w.RegionConfidence < threshold) _
.ToList()
Console.WriteLine($"Accepted: {highConfidenceWords.Count} words")
Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words")
' Log rejected words for manual review
For Each word In lowConfidenceWords
Console.WriteLine(
$" LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
)
Next
End Using
For scans with mixed quality (clear print in some areas, degraded sections elsewhere), this prevents low-confidence output from reaching downstream systems. To raise confidence scores at the source, the image preprocessing filters (Deskew, DeNoise, Binarize) improve quality before the threshold is applied.
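For mixed-quality scans, it can also help to know which pages lost the most words to the threshold, so that those pages can be sent for manual review. The following sketch shows one way to compute that. ScoredWord, ConfidenceReport, and both method names are hypothetical, ScoredWord merely stands in for the PageNumber and RegionConfidence values of each word, and the minShare cut-off is an illustrative parameter rather than an IronOCR setting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a word's page number and confidence score.
// This is NOT an IronOCR type.
public record ScoredWord(string Text, int PageNumber, double Confidence);

public static class ConfidenceReport
{
    // For each page, returns the fraction of words whose confidence is
    // at or above the threshold (0.0 = none accepted, 1.0 = all accepted).
    public static Dictionary<int, double> AcceptedShareByPage(
        IEnumerable<ScoredWord> words, double threshold)
    {
        return words
            .GroupBy(w => w.PageNumber)
            .ToDictionary(
                g => g.Key,
                g => g.Count(w => w.Confidence >= threshold) / (double)g.Count());
    }

    // Lists, in ascending order, the pages whose accepted share is below minShare.
    public static List<int> PagesNeedingReview(
        IEnumerable<ScoredWord> words, double threshold, double minShare)
    {
        return AcceptedShareByPage(words, threshold)
            .Where(pair => pair.Value < minShare)
            .Select(pair => pair.Key)
            .OrderBy(page => page)
            .ToList();
    }
}
```

A page on which three of four words pass the threshold has an accepted share of 0.75; whether that is acceptable depends on the downstream use.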
How Do You Iterate at the Character Level?
For OCR verification overlays, character-level diffing against ground truth, or precise spatial analysis on form fields, use the Characters collection. It mirrors Words but resolves down to individual characters.
using IronOcr;
using System.Linq; // needed for First() below
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("form-field.png");
var result = ocr.ReadDocumentAdvanced(input);
foreach (var ch in result.Characters)
{
Console.WriteLine(
$"'{ch.Text}' | " +
$"Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | " +
$"Page {ch.PageNumber}"
);
}
// ToString() override provides diagnostic-friendly output
Console.WriteLine(result.Characters.First().ToString());
Imports IronOcr
Imports System.Linq
Dim ocr = New IronTesseract()
Using input = New OcrInput()
input.LoadImage("form-field.png")
Dim result = ocr.ReadDocumentAdvanced(input)
For Each ch In result.Characters
Console.WriteLine($"'{ch.Text}' | Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | Page {ch.PageNumber}")
Next
' ToString() override provides diagnostic-friendly output
Console.WriteLine(result.Characters.First().ToString())
End Using
Words and Characters are computed lazily and cached. The first access triggers the computation; subsequent accesses return the cached result, so iterating a second time costs nothing.
How Do You Crop the Original Image Using a BoundingBox?
To extract the visual region of a word for verification, annotation, or building labeled training data, pass the BoundingBox property to AnyBitmap.Clone(), as the example below does. When no preprocessing filter has resized the image, the bounding box maps directly to the word's position in the source image.
using IronOcr;
using IronSoftware.Drawing;
using System.Linq; // needed for FirstOrDefault() below
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("invoice.png");
var result = ocr.ReadDocumentAdvanced(input);
// Load the original image for cropping
var originalImage = AnyBitmap.FromFile("invoice.png");
// Find a specific word and crop its region
var targetWord = result.Words.FirstOrDefault(w => w.Text == "Total");
if (targetWord != null)
{
Rectangle cropRect = targetWord.BoundingBox;
AnyBitmap croppedRegion = originalImage.Clone(cropRect);
croppedRegion.SaveAs("total-region.png");
Console.WriteLine(
$"Cropped '{targetWord.Text}' from " +
$"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
);
}
Imports IronOcr
Imports IronSoftware.Drawing
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("invoice.png")
Dim result = ocr.ReadDocumentAdvanced(input)
' Load the original image for cropping
Dim originalImage = AnyBitmap.FromFile("invoice.png")
' Find a specific word and crop its region
Dim targetWord = result.Words.FirstOrDefault(Function(w) w.Text = "Total")
If targetWord IsNot Nothing Then
Dim cropRect As Rectangle = targetWord.BoundingBox
Dim croppedRegion As AnyBitmap = originalImage.Clone(cropRect)
croppedRegion.SaveAs("total-region.png")
Console.WriteLine(
$"Cropped '{targetWord.Text}' from " &
$"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
)
End If
End Using
This pattern scales to bulk operations: iterate every word, crop each box, and export a labeled dataset for custom font training or downstream ML pipelines. Coordinates reflect the post-preprocessing image; if filters like EnhanceResolution changed the dimensions, the bounding box matches the processed image, not the original on disk.
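When preprocessing has only resized the image, a box in processed-image pixels can be scaled back to original-image pixels before cropping the file on disk. The sketch below illustrates the arithmetic under that assumption. Box, BoxMapping, and ToOriginal are hypothetical names, Box is a stand-in rather than IronSoftware.Drawing.Rectangle, and the caller is assumed to know both image sizes. It does not undo rotation, deskewing, or cropping.

```csharp
using System;

// Hypothetical stand-in for a bounding box in pixels.
// This is NOT IronSoftware.Drawing.Rectangle.
public record Box(int X, int Y, int Width, int Height);

public static class BoxMapping
{
    // Scales a box from processed-image pixels to original-image pixels.
    // Edges are rounded outward so the mapped box still covers the word,
    // then clamped so it never extends past the original image.
    public static Box ToOriginal(
        Box box,
        int processedWidth, int processedHeight,
        int originalWidth, int originalHeight)
    {
        double scaleX = (double)originalWidth / processedWidth;
        double scaleY = (double)originalHeight / processedHeight;

        int left = (int)Math.Floor(box.X * scaleX);
        int top = (int)Math.Floor(box.Y * scaleY);
        int right = (int)Math.Ceiling((box.X + box.Width) * scaleX);
        int bottom = (int)Math.Ceiling((box.Y + box.Height) * scaleY);

        left = Math.Clamp(left, 0, originalWidth);
        top = Math.Clamp(top, 0, originalHeight);
        right = Math.Clamp(right, left, originalWidth);
        bottom = Math.Clamp(bottom, top, originalHeight);

        return new Box(left, top, right - left, bottom - top);
    }
}
```

For example, a 400x50 box at (200, 100) on a 2000x3000 processed image corresponds to a 200x25 box at (100, 50) on a 1000x1500 original.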
Next Steps
The advanced pipeline provides the same spatial detail as IronTesseract.Read(), with additional layout intelligence on top. Related topics:
- Table extraction guide: covers the Tables property on ReadDocumentAdvanced for structured cell data.
- Reading OCR results: word data for the standard pipeline.
- Image quality correction: preprocessing filters that raise confidence scores.
- OCR tutorial: end-to-end setup for new users.
Frequently Asked Questions
What is Advanced OCR in C#?
Advanced OCR in C# refers to the process of using Optical Character Recognition to extract detailed word and character data, including coordinates, confidence levels, and bounding boxes, using IronOCR's advanced pipeline.
How can I access word data using IronOCR?
You can access word data in IronOCR by iterating result.Words, the collection of AdvancedWord objects returned by ReadDocumentAdvanced or ReadPhoto. Each entry provides the word's text, position, size, page number, region index, and region confidence score.
What is the significance of bounding boxes in OCR?
Bounding boxes are crucial in OCR as they define the exact location and dimensions of recognized text elements on the scanned image, enabling precise text extraction and image manipulation.
Can I filter OCR results by confidence score?
Yes. Filter the Words collection by RegionConfidence, which ranges from 0.0 to 1.0, so that only words at or above your chosen threshold are passed on for further processing.
How do I reconstruct the reading order in OCR results?
The Words collection is grouped by detected region, so its iteration order does not always match the visual reading order. To reconstruct it, filter the words to one page, sort them by Y coordinate with a small pixel tolerance to group words on the same line, and then sort by X within each line.
Is it possible to crop source images using IronOCR?
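Yes. Load the source image as an AnyBitmap and crop it with the BoundingBox rectangle of a recognized word or character. Note that the coordinates refer to the image after preprocessing, so they may not match the original file if a filter changed its dimensions.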
What are AdvancedWord and AdvancedCharacter collections?
Words and Characters are the collections exposed on AdvancedOcrResultBase. Their elements are AdvancedWord and AdvancedCharacter objects, both derived from AdvancedOcrElement, which store each recognized word or character together with its coordinates, page number, and bounding box.
How does IronOCR handle character recognition?
IronOCR handles character recognition by utilizing an advanced pipeline that analyzes each character's features, providing detailed data such as its position, size, and recognition confidence.
What type of documents can be processed with IronOCR?
IronOCR can process a wide range of document types including PDFs, scanned images, and photos, extracting text data with high accuracy and detail.
Is there a free trial available for IronOCR?
Yes, Iron Software offers a free trial of IronOCR, allowing users to test its features and capabilities before making a purchase decision.

