Word and Character OCR Data in C# (Coordinates, Confidence, Bounding Boxes)
After running OCR on a document, the extracted text alone is often not enough. To locate specific values on a page, exclude low-quality detections, or reconstruct the natural reading order on multi-column layouts, you need per-word coordinates, page numbers, region indices, and confidence scores.
The Words and Characters collections on AdvancedOcrResultBase expose this data. Both ReadDocumentAdvanced() for layout-aware documents and ReadPhoto() for camera input return results with the same word-level granularity as the standard OcrResult.Words collection.
This guide walks through five common patterns: iterating word data, reconstructing reading order, filtering by confidence, working at the character level, and cropping the source image from a bounding box.
Start a free 30-day trial to test these collections in your pipeline.
Quickstart: Read Word and Character Data from an OCR Result
Call ReadDocumentAdvanced (or ReadPhoto) and iterate result.Words to get every recognized word with its coordinates, page number, and confidence score in a few lines.
- Install IronOCR with NuGet Package Manager
PM > Install-Package IronOcr
- Copy and run this code snippet.
var result = new IronTesseract().ReadDocumentAdvanced(new OcrInput("scan.png"));
foreach (var word in result.Words)
    Console.WriteLine($"{word.Text} @ ({word.X},{word.Y}) conf:{word.RegionConfidence:P0}");
- Deploy to test on your live environment
Minimal Workflow (3 steps)
- Download the C# OCR library from NuGet
- Run advanced OCR with ReadDocumentAdvanced or ReadPhoto on your input
- Iterate result.Words or result.Characters for coordinates, confidence, and bounding boxes
How Do You Iterate Words with Coordinates and Confidence?
The Words collection returns every detected word across every page. Each entry is an AdvancedWord (entries in Characters are AdvancedCharacter; both inherit from AdvancedOcrElement) and exposes the text, pixel coordinates, dimensions, the page it belongs to, the region index identifying which detected text block contains it, and a confidence score for that region.
using IronOcr;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("receipt.png");
var result = ocr.ReadDocumentAdvanced(input);
foreach (var word in result.Words)
{
Console.WriteLine(
$"Page {word.PageNumber} | " +
$"Region {word.RegionIndex} | " +
$"'{word.Text}' | " +
$"Position: ({word.X}, {word.Y}) | " +
$"Size: {word.Width}x{word.Height} | " +
$"Confidence: {word.RegionConfidence:P1}"
);
}
// ToString() override for diagnostic logging
Console.WriteLine(result.Words.First().ToString());
Imports IronOcr
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("receipt.png")
Dim result = ocr.ReadDocumentAdvanced(input)
For Each word In result.Words
Console.WriteLine(
$"Page {word.PageNumber} | " &
$"Region {word.RegionIndex} | " &
$"'{word.Text}' | " &
$"Position: ({word.X}, {word.Y}) | " &
$"Size: {word.Width}x{word.Height} | " &
$"Confidence: {word.RegionConfidence:P1}"
)
Next
' ToString() override for diagnostic logging
' (kept inside the Using block, where result is still in scope)
Console.WriteLine(result.Words.First().ToString())
End Using
PageNumber is 1-based: page one is 1, not 0. This differs from most .NET collections, which use zero-based indexing. RegionIndex follows the standard 0-based convention.
To pass coordinates to drawing or cropping APIs, use the BoundingBox property. It bundles position and size into a single IronSoftware.Drawing.Rectangle.
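As a rough illustration of the indexing note above, the sketch below uses a hypothetical tuple list in place of real OCR output. The field names mirror the properties used in this guide, but the tuples and the pageLabels list are assumptions made for the example, not IronOCR types. It shows the 1-based PageNumber being converted to a 0-based list index, and the right and bottom edges being derived from position plus size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for OCR word data; field names mirror this guide,
// but these tuples are NOT the IronOCR types.
var words = new List<(string Text, int PageNumber, int RegionIndex, int X, int Y, int Width, int Height)>
{
    ("Invoice", 1, 0, 40, 30, 120, 24),
    ("Total",   1, 1, 40, 400, 80, 20),
    ("Notes",   2, 0, 40, 30, 90, 22),
};

// Hypothetical per-page data held in an ordinary zero-based list.
var pageLabels = new List<string> { "front", "back" };

foreach (var word in words)
{
    // PageNumber is 1-based, so subtract 1 before indexing a .NET list.
    string label = pageLabels[word.PageNumber - 1];

    // Right and bottom edges follow from position plus size.
    int right = word.X + word.Width;
    int bottom = word.Y + word.Height;

    Console.WriteLine(
        $"{word.Text}: page '{label}', region {word.RegionIndex}, " +
        $"box ({word.X},{word.Y})-({right},{bottom})");
}

// Count words per (page, region) pair.
var perRegion = words
    .GroupBy(w => (w.PageNumber, w.RegionIndex))
    .ToDictionary(g => g.Key, g => g.Count());

Console.WriteLine($"Distinct regions: {perRegion.Count}");
```

Keying the dictionary on the (page, region) pair matters because RegionIndex, as used here, is assumed to restart on each page; if it turns out to be unique across the whole document, the region index alone would be enough.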
How Do You Reconstruct Reading Order?
On multi-column layouts, the Words collection iteration order does not match the visual reading order on the page. Words are grouped by detected region, so columns and table cells can be returned out of sequence.
To rebuild a natural top-to-bottom, left-to-right order, sort the collection by Y coordinate first, then by X within each line. A small Y tolerance groups words sitting on the same baseline.
using IronOcr;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("multi-column-doc.png");
var result = ocr.ReadDocumentAdvanced(input);
int targetPage = 1;
int lineThreshold = 10; // pixel tolerance for grouping same-line words
// Sort by line (Y), then left-to-right (X)
var pageWords = result.Words
.Where(w => w.PageNumber == targetPage)
.OrderBy(w => w.Y / lineThreshold)
.ThenBy(w => w.X)
.ToList();
foreach (var word in pageWords)
{
Console.Write($"{word.Text} ");
}
Console.WriteLine();
Imports IronOcr
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("multi-column-doc.png")
Dim result = ocr.ReadDocumentAdvanced(input)
Dim targetPage As Integer = 1
Dim lineThreshold As Integer = 10 ' pixel tolerance for grouping same-line words
' Sort by line (Y), then left-to-right (X)
Dim pageWords = result.Words _
.Where(Function(w) w.PageNumber = targetPage) _
.OrderBy(Function(w) w.Y \ lineThreshold) _
.ThenBy(Function(w) w.X) _
.ToList()
For Each word In pageWords
Console.Write($"{word.Text} ")
Next
Console.WriteLine()
End Using
Tune lineThreshold to match your document: 10–15 pixels works for standard 12pt text at 300 DPI. Larger headings or handwritten input call for a wider tolerance. This pattern is especially useful on multi-column pages and inside table cells, where the engine detects each column or cell as its own region.
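One limitation of integer-division bucketing is that two words on the same visual line can land in different buckets when their Y values straddle a multiple of the threshold (for example Y = 9 and Y = 11 with a threshold of 10). The sketch below shows one possible alternative: walk the words top to bottom and start a new line only when a word sits more than the tolerance below the first word of the current line. The tuple type and the GroupIntoLines helper are hypothetical names for this example, not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for OCR words (not the IronOCR type).
var words = new List<(string Text, int X, int Y)>
{
    ("world",  90, 11),  // same visual line as "Hello", but 11 / 10 gives bucket 1
    ("Hello",  10, 9),   // 9 / 10 gives bucket 0
    ("line",   70, 42),
    ("Second", 10, 40),
};

// Group words into lines: start a new line whenever a word sits more than
// `tolerance` pixels below the first word of the current line.
List<List<(string Text, int X, int Y)>> GroupIntoLines(
    IEnumerable<(string Text, int X, int Y)> source, int tolerance)
{
    var lines = new List<List<(string Text, int X, int Y)>>();
    foreach (var word in source.OrderBy(w => w.Y))
    {
        var current = lines.LastOrDefault();
        if (current == null || word.Y - current[0].Y > tolerance)
        {
            current = new List<(string Text, int X, int Y)>();
            lines.Add(current);
        }
        current.Add(word);
    }

    // Order each line left to right.
    return lines.Select(l => l.OrderBy(w => w.X).ToList()).ToList();
}

var textLines = GroupIntoLines(words, tolerance: 10)
    .Select(l => string.Join(" ", l.Select(w => w.Text)))
    .ToList();

foreach (var line in textLines)
    Console.WriteLine(line);
```

Note that any purely Y-then-X ordering reads straight across the page, so on a layout with side-by-side columns it may be worth applying the grouping within each RegionIndex rather than across the whole page.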
How Do You Filter Low-Confidence Words?
To exclude low-quality detections before they reach your database, search index, or downstream extraction, filter the collection by RegionConfidence. The score ranges from 0.0 to 1.0, with higher values indicating greater confidence in the detected text.
using IronOcr;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("noisy-scan.png");
var result = ocr.ReadDocumentAdvanced(input);
double threshold = 0.75;
var highConfidenceWords = result.Words
.Where(w => w.RegionConfidence >= threshold)
.ToList();
var lowConfidenceWords = result.Words
.Where(w => w.RegionConfidence < threshold)
.ToList();
Console.WriteLine($"Accepted: {highConfidenceWords.Count} words");
Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words");
// Log rejected words for manual review
foreach (var word in lowConfidenceWords)
{
Console.WriteLine(
$" LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
);
}
Imports IronOcr
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("noisy-scan.png")
Dim result = ocr.ReadDocumentAdvanced(input)
Dim threshold As Double = 0.75
Dim highConfidenceWords = result.Words _
.Where(Function(w) w.RegionConfidence >= threshold) _
.ToList()
Dim lowConfidenceWords = result.Words _
.Where(Function(w) w.RegionConfidence < threshold) _
.ToList()
Console.WriteLine($"Accepted: {highConfidenceWords.Count} words")
Console.WriteLine($"Rejected: {lowConfidenceWords.Count} words")
' Log rejected words for manual review
For Each word In lowConfidenceWords
Console.WriteLine(
$" LOW CONF: '{word.Text}' at ({word.X},{word.Y}) — {word.RegionConfidence:P1}"
)
Next
End Using
For scans with mixed quality (clear print in some areas, degraded sections elsewhere), this prevents low-confidence output from reaching downstream systems. To raise confidence scores at the source, apply the image preprocessing filters (Deskew, DeNoise, Binarize) before running OCR, so the input is cleaner by the time the threshold is applied.
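A single cut-off is not the only option. The sketch below shows one possible variation with two thresholds, so that borderline words go to manual review instead of being discarded. The tuples stand in for OCR words, and the threshold values and the Triage helper are illustrative assumptions rather than recommended settings or IronOCR members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for OCR words with a 0.0 to 1.0 confidence score.
var words = new List<(string Text, double Confidence)>
{
    ("Invoice", 0.97), ("T0tal", 0.62), ("#1042", 0.81), ("~~", 0.20),
};

// Illustrative thresholds; tune them against your own documents.
const double acceptAt = 0.75;
const double reviewAt = 0.50;

// Map a confidence score to one of three bands.
string Triage(double confidence) =>
    confidence >= acceptAt ? "accept" :
    confidence >= reviewAt ? "review" : "reject";

// ToLookup yields an empty sequence for a missing key, so no null checks are needed.
var buckets = words.ToLookup(w => Triage(w.Confidence));

foreach (var name in new[] { "accept", "review", "reject" })
{
    string texts = string.Join(", ", buckets[name].Select(w => w.Text));
    Console.WriteLine($"{name}: {texts}");
}
```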
How Do You Iterate at the Character Level?
For OCR verification overlays, character-level diffing against ground truth, or precise spatial analysis on form fields, use the Characters collection. It mirrors Words but resolves down to individual characters.
using IronOcr;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("form-field.png");
var result = ocr.ReadDocumentAdvanced(input);
foreach (var ch in result.Characters)
{
Console.WriteLine(
$"'{ch.Text}' | " +
$"Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | " +
$"Page {ch.PageNumber}"
);
}
// ToString() override provides diagnostic-friendly output
Console.WriteLine(result.Characters.First().ToString());
Imports IronOcr
Imports System.Linq
Dim ocr = New IronTesseract()
Using input = New OcrInput()
input.LoadImage("form-field.png")
Dim result = ocr.ReadDocumentAdvanced(input)
For Each ch In result.Characters
Console.WriteLine($"'{ch.Text}' | Box: ({ch.X}, {ch.Y}, {ch.Width}, {ch.Height}) | Page {ch.PageNumber}")
Next
' ToString() override provides diagnostic-friendly output
Console.WriteLine(result.Characters.First().ToString())
End Using
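For the ground-truth diffing use case mentioned above, a minimal sketch might look like the following. It uses hypothetical tuples in place of the Characters collection and assumes the recognized and expected sequences have the same length; dropped or inserted characters would need an edit-distance alignment, which is out of scope here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for OCR characters on one form field (not the IronOCR type).
var chars = new List<(string Text, int X, int Y)>
{
    ("A", 10, 5), ("B", 22, 5), ("8", 34, 5), ("1", 46, 5),
};
string expected = "AB31"; // ground-truth value for the field

// Compare position by position and record each mismatch with its coordinates,
// so the location can be highlighted on a verification overlay.
var mismatches = new List<(int Index, string Got, char Want, int X, int Y)>();
for (int i = 0; i < Math.Min(chars.Count, expected.Length); i++)
{
    if (chars[i].Text != expected[i].ToString())
        mismatches.Add((i, chars[i].Text, expected[i], chars[i].X, chars[i].Y));
}

foreach (var m in mismatches)
    Console.WriteLine($"Index {m.Index}: got '{m.Got}', expected '{m.Want}' at ({m.X}, {m.Y})");
```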
Words and Characters are computed lazily and cached. The first access triggers the computation; subsequent accesses return the cached result, so iterating a second time costs nothing.
How Do You Crop the Original Image Using a BoundingBox?
To extract the visual region of a word for verification, annotation, or building labeled training data, pass the BoundingBox property to AnyBitmap.CropRegion(). The bounding box maps directly to the word's position in the source image.
using IronOcr;
using IronSoftware.Drawing;
using System.Linq;
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("invoice.png");
var result = ocr.ReadDocumentAdvanced(input);
// Load the original image for cropping
var originalImage = AnyBitmap.FromFile("invoice.png");
// Find a specific word and crop its region
var targetWord = result.Words.FirstOrDefault(w => w.Text == "Total");
if (targetWord != null)
{
Rectangle cropRect = targetWord.BoundingBox;
AnyBitmap croppedRegion = originalImage.CropRegion(cropRect);
croppedRegion.SaveAs("total-region.png");
Console.WriteLine(
$"Cropped '{targetWord.Text}' from " +
$"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
);
}
Imports IronOcr
Imports IronSoftware.Drawing
Imports System.Linq
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("invoice.png")
Dim result = ocr.ReadDocumentAdvanced(input)
' Load the original image for cropping
Dim originalImage = AnyBitmap.FromFile("invoice.png")
' Find a specific word and crop its region
Dim targetWord = result.Words.FirstOrDefault(Function(w) w.Text = "Total")
If targetWord IsNot Nothing Then
Dim cropRect As Rectangle = targetWord.BoundingBox
Dim croppedRegion As AnyBitmap = originalImage.CropRegion(cropRect)
croppedRegion.SaveAs("total-region.png")
Console.WriteLine(
$"Cropped '{targetWord.Text}' from " &
$"({cropRect.X}, {cropRect.Y}, {cropRect.Width}, {cropRect.Height})"
)
End If
End Using
This pattern scales to bulk operations: iterate every word, crop each box, and export a labeled dataset for custom font training or downstream ML pipelines. Coordinates reflect the post-preprocessing image; if filters like EnhanceResolution changed the dimensions, the bounding box matches the processed image, not the original on disk.
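If you need to crop from the original file after a filter has changed the image dimensions, one possible approach is to scale the box back by the ratio of the two image sizes. The sketch below is plain arithmetic on tuples and makes a strong assumption: preprocessing only rescaled the image (no rotation, deskew, or cropping). ToOriginal and the sample sizes are hypothetical, not IronOCR members.

```csharp
using System;

// Map a box from processed-image coordinates back to the original image.
// Assumes preprocessing only rescaled the image (no rotation or cropping).
(int X, int Y, int Width, int Height) ToOriginal(
    (int X, int Y, int Width, int Height) box,
    (int Width, int Height) processedSize,
    (int Width, int Height) originalSize)
{
    double sx = (double)originalSize.Width / processedSize.Width;
    double sy = (double)originalSize.Height / processedSize.Height;

    // Scale the edges rather than the size, so rounding errors do not accumulate.
    int left = (int)Math.Round(box.X * sx);
    int top = (int)Math.Round(box.Y * sy);
    int right = (int)Math.Round((box.X + box.Width) * sx);
    int bottom = (int)Math.Round((box.Y + box.Height) * sy);

    // Clamp so rounding never pushes the box outside the original image.
    left = Math.Max(0, Math.Min(left, originalSize.Width));
    top = Math.Max(0, Math.Min(top, originalSize.Height));
    right = Math.Max(left, Math.Min(right, originalSize.Width));
    bottom = Math.Max(top, Math.Min(bottom, originalSize.Height));

    return (left, top, right - left, bottom - top);
}

// Hypothetical numbers: the image was upscaled 2x before OCR.
var processedBox = (X: 200, Y: 100, Width: 80, Height: 40);
var original = ToOriginal(processedBox, (2000, 3000), (1000, 1500));

Console.WriteLine($"Box in original image: {original}");
```

The resulting values could then be used to build the rectangle passed to the cropping call, in place of the bounding box reported for the processed image.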
Next Steps
The advanced pipeline provides the same spatial detail as IronTesseract.Read(), with additional layout intelligence on top. Related topics:
- Table extraction guide: covers the Tables property on ReadDocumentAdvanced for structured cell data.
- Reading OCR results: word data for the standard pipeline.
- Image quality correction: preprocessing filters that raise confidence scores.
- OCR tutorial: end-to-end setup for new users.
Frequently Asked Questions
What is OCR and why is it important?
OCR, or Optical Character Recognition, is a technology that converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data. OCR is important because it automates data extraction, reduces manual data entry, and makes information easily accessible and editable.
How does IronOCR enhance the OCR process?
IronOCR enhances the OCR process by providing accurate and high-speed text recognition capabilities. It supports multiple languages and includes features like image pre-processing to improve text recognition accuracy.
Can IronOCR handle multi-page documents?
Yes, IronOCR can process multi-page documents efficiently, extracting text from each page and allowing users to work with the entire document as a cohesive unit.
What file formats does IronOCR support?
IronOCR supports a wide range of file formats including PDF, TIFF, JPEG, PNG, and BMP, allowing flexibility in the types of documents it can process.
Is IronOCR suitable for recognizing text in low-quality images?
Yes, IronOCR includes advanced image pre-processing features that enhance the quality of low-resolution or poor-quality images, increasing the accuracy of text recognition.
Does IronOCR support multiple languages?
IronOCR supports multiple languages, making it a versatile tool for global applications that require text recognition in different languages.
Can IronOCR be integrated into existing applications?
IronOCR is designed to be easily integrated into existing applications using C#, allowing developers to add OCR functionality to their software with minimal effort.
What are the benefits of using IronOCR for document management?
Using IronOCR for document management streamlines the workflow by converting scanned documents into searchable and editable text, reducing the need for manual data entry and improving document accessibility.
How can IronOCR improve data accuracy?
IronOCR improves data accuracy through its advanced recognition algorithms and image correction features, ensuring that the text extraction process is both reliable and precise.
Is there a free trial available for IronOCR?
Yes, Iron Software offers a free trial of IronOCR, allowing users to test its features and capabilities before making a purchase decision.

