
OCR in C# CodeProject Tutorial: Extract Text from Images with IronOCR

Optical character recognition (OCR) in C# lets you extract machine-readable text from scanned documents, image files, and TIFF files inside .NET applications. With IronOCR, a .NET-native OCR library, you install one NuGet package and start reading text from images in a few lines of code, with no external service to call, no separate Tesseract installation to manage, and no per-call API fee.

Start your free trial of IronOCR to follow along with the code samples below.

How Do You Install IronOCR in a .NET Project?

The fastest way to add OCR to a .NET 10 project is through the NuGet Package Manager. Open a terminal in your project directory and run the dotnet CLI command, or use the Package Manager Console inside Visual Studio:

# .NET CLI
dotnet add package IronOcr

# Package Manager Console
Install-Package IronOcr

During installation, NuGet downloads all required assemblies and adds the package reference to your project automatically. IronOCR targets .NET Framework 4.6.2+, .NET Core 3.1+, and .NET 5 through .NET 10, so it works across console apps, ASP.NET Core services, WPF applications, and Azure Functions.

You do not need a license key to test locally; a trial watermark appears on output until a license is applied. Add the using directive and, when you are ready for production, set your key once at startup:

using IronOcr;

// Apply license key before any OCR calls (production only)
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

See the IronOCR licensing page for pricing and activation details.
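Hard-coding the key works, but a common alternative is to read it from configuration at startup so it never lands in source control. The sketch below assumes a hypothetical environment variable named IRONOCR_LICENSE_KEY and a hypothetical helper, ResolveLicenseKey; only the commented-out assignment comes from the IronOCR sample above.

```csharp
#nullable enable
using System;

// Hypothetical variable name; use whatever name your deployment defines.
const string LicenseVariable = "IRONOCR_LICENSE_KEY";

// Returns the trimmed key, or null when the variable is missing or blank.
string? ResolveLicenseKey(string variableName)
{
    string? raw = Environment.GetEnvironmentVariable(variableName);
    return string.IsNullOrWhiteSpace(raw) ? null : raw.Trim();
}

string? key = ResolveLicenseKey(LicenseVariable);
if (key is null)
{
    Console.WriteLine("No license key found; IronOCR will run in trial mode.");
}
else
{
    // Apply the key once, before any OCR call (assignment taken from the sample above):
    // IronOcr.License.LicenseKey = key;
    Console.WriteLine("License key loaded from the environment.");
}
```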

How Do You Extract Text from an Image File?

The core OCR workflow involves three objects: IronTesseract (the engine), OcrInput (the input container), and OcrResult (the output). The sample below reads a PNG and prints the recognised text to the console.

using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage("sample-document.png");

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

[Image 1: Screenshot of the optical character recognition output]

IronTesseract wraps the Tesseract 5 engine with .NET-friendly defaults and automatic model management. OcrInput.LoadImage accepts PNG, JPEG, BMP, GIF, TIFF, and WebP files, so you rarely need to convert formats before passing an image to the engine.

The OcrResult.Text property returns all recognised text as a single string, with line breaks between lines. For richer access (word bounding boxes, confidence scores, per-paragraph text), navigate the result.Pages, result.Paragraphs, result.Words, and result.Characters collections.

Key properties worth knowing:

  • result.Pages[0].Text -- text from a single page
  • result.Words[n].Text and result.Words[n].Confidence -- per-word accuracy (0.0 to 1.0)
  • result.Pages[0].Paragraphs -- paragraph segmentation for structured extraction

You can also call ocr.ReadAsync(input) to keep the UI thread free in desktop or web applications.
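Per-word confidence values make it easy to flag text for human review. The sketch below mirrors result.Words with plain (Text, Confidence) tuples so it runs without IronOCR; the 0.0 to 1.0 scale follows the list above, while the 0.80 threshold and the FlagLowConfidence name are assumptions you should tune and check against the scale your IronOCR version actually reports.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for result.Words: each entry pairs recognised text with its confidence.
var words = new List<(string Text, double Confidence)>
{
    ("Invoice", 0.97),
    ("N0.", 0.42),
    ("12345", 0.91),
    ("Tota1", 0.55),
};

// Returns the text of every word whose confidence falls below the threshold.
List<string> FlagLowConfidence(IEnumerable<(string Text, double Confidence)> items, double threshold) =>
    items.Where(w => w.Confidence < threshold)
         .Select(w => w.Text)
         .ToList();

foreach (string word in FlagLowConfidence(words, 0.80))
{
    Console.WriteLine($"Review: {word}");
}
```

In a real application the tuples would be built from result.Words, selecting each word's Text and Confidence.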

How Do You Process Scanned Documents and TIFF Files?

Multi-page TIFF files are common in document scanning workflows. IronOCR handles them with LoadImageFrames, which lets you choose exactly which frames (pages) to process -- useful when you only need a subset of a large archive.
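Users usually describe the pages they want in 1-based terms such as "1-3,7", while the sample below passes what appear to be zero-based frame indices ({ 0, 1, 2 }). A small helper can translate between the two. The range syntax and the ParsePageRanges name are hypothetical, so confirm the indexing convention against the LoadImageFrames documentation before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Converts a 1-based page list such as "1-3,7" into sorted, distinct 0-based frame indices.
int[] ParsePageRanges(string spec)
{
    var indices = new SortedSet<int>();
    foreach (string part in spec.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
    {
        string[] bounds = part.Split('-');
        if (bounds.Length > 2)
            throw new FormatException($"Invalid page range: '{part}'");

        int first = int.Parse(bounds[0]);
        int last = bounds.Length == 2 ? int.Parse(bounds[1]) : first;
        if (first < 1 || last < first)
            throw new FormatException($"Invalid page range: '{part}'");

        for (int page = first; page <= last; page++)
            indices.Add(page - 1); // 1-based page number -> 0-based frame index
    }
    return indices.ToArray();
}

int[] pageIndices = ParsePageRanges("1-3,7");
Console.WriteLine(string.Join(", ", pageIndices));
```

The resulting array can be passed to LoadImageFrames in place of the hard-coded indices.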

using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
int[] pageIndices = { 0, 1, 2 };
input.LoadImageFrames("scanned-documents.tiff", pageIndices);

// Correct skew and remove noise before reading
input.Deskew();
input.DeNoise();

OcrResult result = ocr.Read(input);

foreach (var page in result.Pages)
{
    Console.WriteLine($"Page {page.PageNumber}:");
    Console.WriteLine(page.Text);
}

[Image 2: OCR output from a multi-page TIFF file]

Deskew rotates the image to correct any tilt introduced by flatbed scanners. DeNoise removes speckles and JPEG artifacts that confuse the Tesseract engine. Together, these two preprocessing filters significantly improve recognition accuracy on low-quality scans.

Additional OcrInput filters available for difficult source material:

  • input.Sharpen() -- increases edge contrast for blurry images
  • input.Binarize() -- converts to black-and-white for fax-quality documents
  • input.Scale(200) -- upscales small images for better character separation
  • input.Rotate(90) -- corrects rotated document orientations

See the IronOCR image filters guide for a full list of preprocessing options and when to apply them.
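Choosing filters is largely a matter of matching symptoms to the list above. The sketch below encodes that mapping as a hypothetical PlanFilters function that returns the filter calls to apply as strings; its inputs are values you would determine yourself, the ordering is an assumption, and the 300 DPI target comes from Tesseract's own image-quality guidance rather than from IronOCR. Assuming Scale takes a percentage (as the 200 in the example suggests), a 150 DPI scan works out to Scale(200).

```csharp
using System;
using System.Collections.Generic;

// Tesseract's image-quality guidance recommends at least 300 DPI.
const int TargetDpi = 300;

// Maps the scan problems described above to OcrInput filter calls and returns them
// as strings so the plan can be logged or reviewed. The ordering is an assumption.
List<string> PlanFilters(int sourceDpi, bool isBlurry, bool isFaxQuality, int rotateBy)
{
    if (sourceDpi <= 0)
        throw new ArgumentOutOfRangeException(nameof(sourceDpi));

    var plan = new List<string>();
    if (rotateBy % 360 != 0)
        plan.Add($"Rotate({rotateBy})");
    if (sourceDpi < TargetDpi)
    {
        // Upscale percentage that brings the image to the target resolution, rounded up.
        int percent = (int)Math.Ceiling(TargetDpi * 100.0 / sourceDpi);
        plan.Add($"Scale({percent})");
    }
    if (isBlurry)
        plan.Add("Sharpen()");
    if (isFaxQuality)
        plan.Add("Binarize()");

    // The multi-page TIFF example applies these two filters to every scan.
    plan.Add("Deskew()");
    plan.Add("DeNoise()");
    return plan;
}

// A 150 DPI fax page that needs a 90-degree rotation.
var faxPlan = PlanFilters(150, isBlurry: false, isFaxQuality: true, rotateBy: 90);
Console.WriteLine(string.Join(" -> ", faxPlan));
```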

How Do You Configure Language Support for OCR?

By default, IronOCR reads English text. To process documents in other languages, install the matching language NuGet package and set the Language property on the IronTesseract instance.

dotnet add package IronOcr.Languages.German
dotnet add package IronOcr.Languages.French
dotnet add package IronOcr.Languages.Japanese

Then configure the engine and, for bilingual documents, add a secondary language:

using IronOcr;
using IronOcr.Languages;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.German;

// For bilingual documents (e.g. Canadian forms, EU directives)
ocr.AddSecondaryLanguage(OcrLanguage.French);

using var input = new OcrInput();
input.LoadImage("german-invoice.png");

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

IronOCR supports over 125 languages, each distributed as a separate lightweight NuGet package. This keeps your production binary small, because only the language data your application actually needs is included. When you call AddSecondaryLanguage, the engine uses both the primary and the secondary language models during recognition.
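Because each language ships as its own package following the IronOcr.Languages.<Name> pattern shown above, a setup script can generate the install commands from a single list. This is a sketch only: the BuildLanguageInstallCommands name is hypothetical, it assumes every package name follows that pattern exactly, and it skips English because the text above describes English as the default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds one "dotnet add package" command per non-English language, without duplicates.
List<string> BuildLanguageInstallCommands(IEnumerable<string> languages) =>
    languages.Select(l => l.Trim())
             .Where(l => l.Length > 0 && !l.Equals("English", StringComparison.OrdinalIgnoreCase))
             .Distinct(StringComparer.OrdinalIgnoreCase)
             .Select(l => $"dotnet add package IronOcr.Languages.{l}")
             .ToList();

foreach (string command in BuildLanguageInstallCommands(new[] { "German", "French", "english", "German" }))
{
    Console.WriteLine(command);
}
```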

How Do You Handle OCR Errors and Improve Recognition Results?

Production applications need error handling around the OCR pipeline. Image quality issues, missing files, or unsupported formats can cause exceptions. Wrapping the call in a try/catch block gives you a clean recovery path.

using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;

try
{
    using var input = new OcrInput();
    input.LoadImage("document.png");
    input.DeNoise();
    input.Deskew();

    OcrResult result = ocr.Read(input);

    if (!string.IsNullOrWhiteSpace(result.Text))
    {
        Console.WriteLine("Recognised text:");
        Console.WriteLine(result.Text);
    }
    else
    {
        Console.WriteLine("No text was detected in the image.");
    }
}
catch (Exception ex)
{
    Console.WriteLine($"OCR error: {ex.Message}");
}
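The try/catch block above only reports problems after the engine has been invoked. A cheap pre-flight check can reject missing files and unsupported extensions first, using the formats listed earlier for LoadImage. The ValidateImagePath helper and its extension list are assumptions of this sketch, and an extension does not guarantee a file's contents, so keep the exception handler as well.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Extensions for the formats the tutorial lists for LoadImage: PNG, JPEG, BMP, GIF, TIFF, WebP.
var supportedExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff", ".webp"
};

// Returns null when the file looks usable, otherwise a human-readable reason.
string? ValidateImagePath(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        return "No path was supplied.";
    if (!supportedExtensions.Contains(Path.GetExtension(path)))
        return $"Unsupported file type: {Path.GetExtension(path)}";
    if (!File.Exists(path))
        return $"File not found: {path}";
    return null;
}

string? problem = ValidateImagePath("document.png");
Console.WriteLine(problem ?? "Input looks valid; safe to call ocr.Read.");
```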

A few additional settings that help when accuracy is lower than expected:

  • ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto -- lets Tesseract choose between single-column, multi-column, and single-word layouts automatically
  • ocr.Configuration.ReadBarCodes = false -- disables barcode detection if you are processing text-only documents and want faster throughput
  • ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5 -- ensures the Tesseract 5 engine is used

For structured forms where fields appear at predictable positions, use region-based OCR to read only the areas that matter:

using IronOcr;
using IronSoftware.Drawing;

var ocr = new IronTesseract();

using var input = new OcrInput();
var region = new CropRectangle(x: 50, y: 200, width: 600, height: 100);
input.LoadImage("form.png", region);

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

Limiting recognition to a crop rectangle can reduce processing time by up to 90 percent on large images, because the engine analyses only the pixels inside the rectangle. This technique is well suited to invoice number extraction, form field reading, and ID document scanning. More details are available in the region OCR how-to guide.
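The savings come from how few pixels a typical field occupies. The sketch below clamps a crop rectangle to the image bounds and computes the fraction of the page it covers. The helper names are hypothetical, the 2480 x 3508 page size (A4 scanned at 300 DPI) is an illustrative assumption, and real processing time will not scale perfectly with pixel count.

```csharp
using System;

// Clamps a crop rectangle so it lies entirely inside an image of the given size.
(int X, int Y, int Width, int Height) ClampToImage(
    (int X, int Y, int Width, int Height) crop, int imageWidth, int imageHeight)
{
    int x = Math.Clamp(crop.X, 0, imageWidth);
    int y = Math.Clamp(crop.Y, 0, imageHeight);
    int right = Math.Clamp(crop.X + crop.Width, x, imageWidth);
    int bottom = Math.Clamp(crop.Y + crop.Height, y, imageHeight);
    return (x, y, right - x, bottom - y);
}

// Fraction of the page's pixels that fall inside the crop.
double CoveredFraction((int X, int Y, int Width, int Height) crop, int imageWidth, int imageHeight) =>
    (double)crop.Width * crop.Height / ((double)imageWidth * imageHeight);

// The same 600 x 100 region as the form example, on an A4 page at 300 DPI.
var region = ClampToImage((50, 200, 600, 100), 2480, 3508);
Console.WriteLine($"Region covers {CoveredFraction(region, 2480, 3508):P2} of the page.");
```

The clamped values can then be passed to new CropRectangle(...) as in the form example.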

How Do You Create a Searchable PDF from Recognised Text?

Converting scanned image archives into searchable PDF files is one of the highest-value OCR use cases. The resulting file preserves the original visual appearance while embedding an invisible text layer that PDF viewers, search engines, and screen readers can index.

using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.Title = "Quarterly Report Q1 2026";
input.LoadImage("page1.png");
input.LoadImage("page2.png");
input.LoadImage("page3.png");

OcrResult result = ocr.Read(input);
result.SaveAsSearchablePdf("searchable-output.pdf");

Console.WriteLine("Searchable PDF created.");
Console.WriteLine($"Pages processed: {result.Pages.Count}");

[Image 3: Searchable PDF created from the input images]

SaveAsSearchablePdf writes a PDF/A-compatible file where each recognised word is placed at the exact pixel coordinates of the original image. Adobe Acrobat, Preview on macOS, and Foxit Reader all support full-text search in these files immediately after generation.
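With more than a handful of pages you will usually enumerate image files rather than list them by hand, and since pages are added in the order you call LoadImage (as the three-page example implies), load order becomes page order. A plain alphabetical sort puts page10.png before page2.png, so the sketch below sorts by the first number in each file name. The helper names and the pageN.png naming scheme are assumptions carried over from the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Sorts file names by the first run of digits in each name (page2 before page10),
// falling back to a case-insensitive ordinal comparison when the numbers tie.
List<string> SortPagesNaturally(IEnumerable<string> paths) =>
    paths.OrderBy(p => PageNumberOf(p))
         .ThenBy(p => p, StringComparer.OrdinalIgnoreCase)
         .ToList();

// Extracts the first integer in the file name; files without one sort last.
long PageNumberOf(string path)
{
    Match m = Regex.Match(Path.GetFileNameWithoutExtension(path), @"\d+");
    return m.Success && long.TryParse(m.Value, out long n) ? n : long.MaxValue;
}

var files = new[] { "page10.png", "page2.png", "page1.png", "cover.png" };
foreach (string file in SortPagesNaturally(files))
{
    Console.WriteLine(file); // each would be passed to input.LoadImage(file) in this order
}
```

In practice the input list might come from Directory.GetFiles(folder, "*.png").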

For web-based document viewers or downstream NLP pipelines, use result.SaveAsHocrFile("output.hocr") instead. hOCR is an open, HTML-based standard that encodes per-word bounding boxes alongside the text, enabling client-side highlight-on-search and word-level accessibility annotations.
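In hOCR, each word sits in an element with class ocrx_word and a title attribute such as "bbox 36 92 118 124; x_wconf 95", where bbox lists x0 y0 x1 y1. That makes the format easy to consume with an ordinary XML parser. The sketch below parses a small hand-written fragment with System.Xml.Linq; the ReadHocrWords name is hypothetical, and it assumes IronOCR's export is well-formed XML following these hOCR conventions, which is worth checking against a real output file.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Returns each ocrx_word's text and bounding box (x0, y0, x1, y1) from an hOCR document.
List<(string Word, int X0, int Y0, int X1, int Y1)> ReadHocrWords(string hocr)
{
    var words = new List<(string Word, int X0, int Y0, int X1, int Y1)>();
    foreach (XElement element in XDocument.Parse(hocr).Descendants())
    {
        // Unprefixed attributes carry no namespace, so this works whether or not
        // the document declares the XHTML namespace on its elements.
        string[] classes = ((string?)element.Attribute("class") ?? "")
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (!classes.Contains("ocrx_word"))
            continue;

        // title holds semicolon-separated properties; bbox gives x0 y0 x1 y1 in pixels.
        Match bbox = Regex.Match((string?)element.Attribute("title") ?? "", @"bbox (\d+) (\d+) (\d+) (\d+)");
        if (!bbox.Success)
            continue;

        words.Add((element.Value.Trim(),
                   int.Parse(bbox.Groups[1].Value), int.Parse(bbox.Groups[2].Value),
                   int.Parse(bbox.Groups[3].Value), int.Parse(bbox.Groups[4].Value)));
    }
    return words;
}

string sample =
    "<html><body><p class='ocr_par'>" +
    "<span class='ocrx_word' title='bbox 36 92 118 124; x_wconf 95'>Invoice</span> " +
    "<span class='ocrx_word' title='bbox 130 92 210 124; x_wconf 88'>#1042</span>" +
    "</p></body></html>";

foreach (var w in ReadHocrWords(sample))
{
    Console.WriteLine($"{w.Word} at ({w.X0},{w.Y0})-({w.X1},{w.Y1})");
}
```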

Additional output formats available from OcrResult:

  • result.SaveAsHocrFile("output.hocr") -- hOCR XML with positional data
  • result.ToXDocument() -- LINQ-queryable XDocument for programmatic processing
  • result.Pages[0].Text -- plain text per page for streaming pipelines

For applications that already use IronPDF, you can feed OcrResult output into PDF generation workflows, combining OCR extraction with PDF editing in a single .NET process.

How Do You Read Barcodes Alongside Text?

IronOCR can read barcodes and QR codes embedded in the same image as printed text, eliminating the need to run a separate barcode library. Enable the feature with one configuration property:

using IronOcr;

var ocr = new IronTesseract();
ocr.Configuration.ReadBarCodes = true;

using var input = new OcrInput();
input.LoadImage("shipping-label.png");

OcrResult result = ocr.Read(input);

Console.WriteLine("Text:");
Console.WriteLine(result.Text);

Console.WriteLine("Barcodes:");
foreach (var barcode in result.Barcodes)
{
    Console.WriteLine($"  {barcode.Format}: {barcode.Value}");
}

Supported barcode formats include Code 128, Code 39, EAN-13, EAN-8, UPC-A, UPC-E, PDF417, Data Matrix, and QR Code. Full details are in the IronOCR barcode reading guide.

This capability is particularly useful in logistics, healthcare, and retail applications where shipping labels, patient wristbands, and product tags carry both human-readable text and machine-readable barcodes.
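When a decoded value feeds a downstream system, it is cheap to double-check formats that carry their own checksum. EAN-13, one of the formats listed above, ends in a check digit computed from the first twelve digits with alternating weights of 1 and 3. The sketch below validates that digit; the IsValidEan13 name is hypothetical, and since many decoders already reject a bad checksum, treat it as a defensive sanity check rather than a required step.

```csharp
using System;
using System.Linq;

// Returns true when a 13-digit string has a valid EAN-13 check digit.
bool IsValidEan13(string value)
{
    if (value.Length != 13 || !value.All(c => c >= '0' && c <= '9'))
        return false;

    int sum = 0;
    for (int i = 0; i < 12; i++)
    {
        int digit = value[i] - '0';
        sum += (i % 2 == 0) ? digit : digit * 3; // weights 1, 3, 1, 3, ... from the left
    }
    int expectedCheck = (10 - sum % 10) % 10;
    return value[12] - '0' == expectedCheck;
}

Console.WriteLine(IsValidEan13("4006381333931")); // prints True
Console.WriteLine(IsValidEan13("4006381333932")); // prints False (last digit altered)
```

Inside the barcode loop above, you could call IsValidEan13(barcode.Value) for EAN-13 results; how barcode.Format names that symbology depends on your IronOCR version, so check it before filtering on it.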

How Do You Compare IronOCR with Other .NET OCR Options?

Developers evaluating OCR libraries for .NET typically consider IronOCR, Tesseract.NET, and cloud services such as Google Cloud Vision or Azure Computer Vision. The table below summarises the key differences:

Comparison of .NET OCR options across key developer criteria

Criterion | IronOCR | Tesseract.NET | Azure Computer Vision
--- | --- | --- | ---
Deployment | On-premise or cloud, no external calls | On-premise | Cloud-only, requires internet
Installation | Single NuGet package | Multiple packages + native binaries | SDK + Azure subscription
Language packs | 125+ via NuGet packages | Manual tessdata download | Managed by Azure
Searchable PDF output | Built in (one method call) | Not included | Not included
Image preprocessing | 12+ built-in filters | Manual preprocessing required | Automatic (server-side)
Pricing model | One-time perpetual license | Open source (Apache 2.0) | Per-call billing

Tesseract, the open-source engine originally developed at HP and sponsored by Google for many years, powers both IronOCR and Tesseract.NET under the hood. IronOCR adds .NET-idiomatic packaging, automatic model management, and the production output features (searchable PDF, hOCR export) that raw Tesseract bindings lack. Azure Computer Vision offers strong cloud-hosted accuracy, but its network latency and per-call costs make it expensive for high-volume workloads and unusable for offline ones.

For scenarios where data privacy regulations prohibit sending documents to external services -- healthcare records, legal documents, financial statements -- an on-premise library like IronOCR is the appropriate choice.

What Are Your Next Steps?

You now have the building blocks to add OCR to any .NET 10 application: installation via NuGet, basic image-to-text extraction, multi-page TIFF processing, language configuration, error handling, region-based reading, barcode detection, and searchable PDF generation.

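To go deeper, explore the IronOCR guides referenced throughout this tutorial, including the image filters guide, the region OCR how-to guide, and the barcode reading guide.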

For licensing questions or to deploy IronOCR in a production environment, visit the IronOCR licensing page. A free trial license removes output watermarks during your evaluation period, and Iron Software's support team is available for technical questions at any tier.

Get started with IronOCR now.

Frequently Asked Questions

What is OCR and how does it benefit C# developers?

OCR, or Optical Character Recognition, converts documents such as scanned papers, PDFs, or images into editable and searchable data. For C# developers, OCR simplifies document processing by enabling applications to extract text from images and scanned documents, enhancing data accessibility and usability.

How do you implement OCR in a C# project?

You implement OCR in a C# project by installing the IronOCR NuGet package, creating an IronTesseract instance, loading an image into OcrInput, and calling the Read method. The returned OcrResult contains the extracted text and per-word positioning data.

What image formats are supported by IronOCR?

IronOCR supports PNG, JPEG, BMP, GIF, TIFF, and WebP image formats. This lets you work with most common image types without converting files before processing.

Can IronOCR handle multi-page TIFF files?

Yes, IronOCR can handle multi-page TIFF files. Use LoadImageFrames with an array of page indices to process specific frames, and iterate result.Pages to access per-page text.

Is it possible to extract text from a specific area of an image using IronOCR?

Yes, pass a CropRectangle to LoadImage to restrict OCR to a defined region. This reduces processing time significantly and is useful for extracting specific fields from forms, invoices, and ID documents.

Does IronOCR support different languages for text extraction?

IronOCR supports over 125 languages, each available as a separate NuGet package. Set the Language property on IronTesseract and call AddSecondaryLanguage for bilingual documents.

What are the advantages of IronOCR compared with raw Tesseract.NET?

IronOCR adds .NET-idiomatic packaging, automatic language model management, built-in image preprocessing filters, searchable PDF output, and hOCR export on top of the Tesseract engine, all accessible through a single NuGet package without manual native binary management.

How does IronOCR improve the accuracy of text recognition?

IronOCR provides preprocessing filters -- Deskew, DeNoise, Sharpen, Binarize, Scale, and Rotate -- that correct common scan defects before the Tesseract engine processes the image, improving recognition accuracy on low-quality source material.

Can IronOCR read barcodes and QR codes?

Yes, set ocr.Configuration.ReadBarCodes = true to detect barcodes and QR codes alongside text in the same image. Results are available in OcrResult.Barcodes with format type and decoded value.

What are common use cases for IronOCR in C# applications?

IronOCR is used in document management systems, invoice and receipt data extraction, searchable PDF generation from scanned archives, form field reading, shipping label processing, healthcare record digitisation, and accessibility tools.

Kannaopat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering ...