OCR in C# CodeProject Tutorial: Extract Text from Images with IronOCR

Published: December 11, 2025

Optical character recognition (OCR) changes how developers handle document processing in .NET projects. Whether you are working with scanned documents, image files, or TIFF files, a reliable OCR solution lets an application extract text and turn visual data into machine-readable content. In this CodeProject-style tutorial, we show how to perform OCR in C# using IronOCR, a library that simplifies the entire text recognition process.

Start your free trial of IronOCR to follow along with these code samples.

How Do I Set Up an OCR Library in My .NET Project?

Setting up optical character recognition (OCR) in Visual Studio requires just a few steps. The IronOCR library is available via NuGet, making integration straightforward for any Windows application.

Open Visual Studio and create a new Console Application project. In Solution Explorer, right-click the project (or its References/Dependencies node) and select "Manage NuGet Packages." Search for "IronOcr" and install the package. NuGet downloads the package and its dependencies and adds the references to your project automatically.

# Install via the Package Manager Console
Install-Package IronOcr

Once the package is installed, import the namespace by adding using IronOcr; at the top of your source file. The library supports .NET Framework 4.6.2+ and .NET Core, ensuring compatibility across different project types and Windows versions.

How Can I Extract Text from an Image File?

The first step in the OCR process is loading an image and passing it through the OCR engine. IronOCR provides the IronTesseract class as the primary OCR API for character recognition operations. This OCR sample demonstrates the fundamental approach to extracting text from any image file.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        // Initialize the new Tesseract engine
        var ocr = new IronTesseract();
        // Load the image file and perform OCR
        using (var input = new OcrInput())
        {
            input.LoadImage(@"sample-document.png");
            // Process the image and extract text
            OcrResult result = ocr.Read(input);
            // Output the recognized text
            var text = result.Text;
            Console.WriteLine(text);
        }
    }
}

Image 1: Screenshot of the OCR output

The code above creates an IronTesseract object that serves as the OCR engine for all text-recognition operations. The OcrInput class accepts various image formats, including PNG, JPEG, BMP, GIF, and TIFF. When you call the Read method, the library processes the input image and returns an OcrResult object containing the recognized text.

The OcrResult.Text property returns the extracted content as a plain string, ready for further processing in your application. The library handles the character recognition algorithms internally, but accuracy still depends on the quality of the source image, which is why the later samples apply preprocessing filters such as Deskew and DeNoise.
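As one example of that further processing, the sketch below splits the OcrResult.Text string into trimmed, non-empty lines, which is often a convenient first step before parsing fields out of a scanned document. OcrTextCleanup and its ToCleanLines method are hypothetical helpers written for illustration, not part of the IronOCR API; they use only the .NET base class library.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of IronOCR: tidies the plain string
// returned by OcrResult.Text before further processing.
public static class OcrTextCleanup
{
    // Splits OCR text on any line ending, trims each line,
    // and drops lines that are empty after trimming.
    public static string[] ToCleanLines(string ocrText)
    {
        if (string.IsNullOrEmpty(ocrText))
        {
            return Array.Empty<string>();
        }

        return ocrText
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }
}
```

In the first sample, you could call OcrTextCleanup.ToCleanLines(result.Text) and loop over the returned array instead of printing the raw string.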

How Do I Process Scanned Documents and TIFF Files?

Real-world applications often require processing multi-page scanned documents stored as TIFF files. The OCR library handles these scenarios efficiently by allowing developers to load specific page ranges or process entire documents. This sample code shows how to work with multi-frame TIFF images.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            // Load specific pages from a multi-page TIFF file
            int[] pageIndices = new int[] { 0, 1, 2 };
            input.LoadImageFrames(@"scanned-documents.tiff", pageIndices);
            // Apply image enhancement for better results
            input.Deskew();
            OcrResult result = ocr.Read(input);
            // Access page-by-page results
            foreach (var page in result.Pages)
            {
                Console.WriteLine($"Page {page.PageNumber}:");
                Console.WriteLine(page.Text);
            }
        }
    }
}

Image 2: OCR output from the multi-page TIFF file

The LoadImageFrames method accepts a file path and an integer array specifying which frames (pages) to load. Loading only the pages you need avoids wasted OCR work when processing large document archives. The Deskew filter straightens scans that were captured at a slight angle, which improves OCR accuracy.
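When you need a contiguous block of frames, or users supply 1-based page numbers, it can help to build that index array in code rather than typing it by hand. The helper below is a minimal sketch: TiffFrameSelection is a hypothetical class, not an IronOCR type, and it assumes frame indices are zero-based, as the { 0, 1, 2 } array in the sample above suggests. Confirm this against the IronOCR documentation for your version.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of IronOCR. It assumes LoadImageFrames
// expects zero-based frame indices, matching the { 0, 1, 2 } sample.
public static class TiffFrameSelection
{
    // Indices for 'count' consecutive frames starting at 'firstFrame'.
    public static int[] Range(int firstFrame, int count)
    {
        if (firstFrame < 0)
            throw new ArgumentOutOfRangeException(nameof(firstFrame));
        // Enumerable.Range itself rejects a negative count.
        return Enumerable.Range(firstFrame, count).ToArray();
    }

    // Converts 1-based page numbers (as users usually type them)
    // into zero-based frame indices.
    public static int[] FromPageNumbers(params int[] pageNumbers)
    {
        if (pageNumbers.Any(p => p < 1))
            throw new ArgumentOutOfRangeException(nameof(pageNumbers), "Page numbers start at 1.");
        return pageNumbers.Select(p => p - 1).ToArray();
    }
}
```

With this helper, TiffFrameSelection.Range(0, 3) yields the same array as the sample, so the call could read input.LoadImageFrames(@"scanned-documents.tiff", TiffFrameSelection.Range(0, 3)).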

Each page in the result maintains layout information, including paragraphs, lines, and individual words. The OCR API provides access to confidence scores and positioning data, enabling sophisticated document analysis beyond simple text extraction.
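Confidence scores are useful for deciding which results to trust automatically and which to route to a person. The sketch below shows that triage step on its own. RecognizedWord and ConfidenceFilter are hypothetical types invented for this example; in a real project you would copy values from IronOCR's per-word results into them (check the API reference for the exact member names). The 0 to 100 confidence scale assumed here is also an assumption, so pick the threshold to match whatever scale your results actually use.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for one recognized word (records need C# 9 / .NET 5 or later).
// Map IronOCR's per-word results onto it; the 0-100 scale is an assumption.
public sealed record RecognizedWord(string Text, double Confidence);

// Hypothetical helper, not part of IronOCR.
public static class ConfidenceFilter
{
    // Accepts words at or above the threshold and collects the rest
    // so they can be routed to manual review.
    public static (List<RecognizedWord> Accepted, List<RecognizedWord> NeedsReview) Split(
        IEnumerable<RecognizedWord> words, double threshold)
    {
        var accepted = new List<RecognizedWord>();
        var needsReview = new List<RecognizedWord>();
        foreach (RecognizedWord word in words)
        {
            if (word.Confidence >= threshold)
                accepted.Add(word);
            else
                needsReview.Add(word);
        }
        return (accepted, needsReview);
    }
}
```

Keeping the triage logic separate from the OCR call makes it easy to unit-test and to tune the threshold without re-running recognition.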

How Can I Handle OCR Errors and Improve Recognition Results?

Production applications require proper error handling to manage various exception scenarios. Image quality issues, unsupported file formats, or corrupted files can cause the OCR process to fail. Implementing exception handling ensures your application responds gracefully to these situations.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();
        // Configure the OCR engine for your language
        ocr.Language = OcrLanguage.English;
        try
        {
            using (var input = new OcrInput())
            {
                input.LoadImage(@"document.png");
                // Enhance low-quality images before recognition
                input.DeNoise();
                input.Deskew();
                OcrResult result = ocr.Read(input);
                if (!string.IsNullOrWhiteSpace(result.Text))
                {
                    Console.WriteLine("Recognized text:");
                    Console.WriteLine(result.Text);
                }
                else
                {
                    // Report an empty result instead of failing silently
                    Console.WriteLine("No text was recognized in document.png.");
                }
            }
        }
        catch (Exception ex)
        {
            // Errors while loading or reading (e.g. a missing or corrupted file) land here
            Console.WriteLine($"OCR Error: {ex.Message}");
        }
    }
}

The Language property sets which language pack the OCR engine uses for text recognition. IronOCR supports over 125 languages; English is included with the main package, and other languages are installed as separate NuGet packages. The DeNoise filter removes digital noise and speckles from scanned documents, while Deskew corrects skewed alignment. Both help considerably when the source images are imperfect.
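Because unsupported file formats are one of the failure causes listed at the start of this section, you may prefer to reject such files before the OCR engine runs, so users get a clear message instead of a generic exception. The check below is a minimal sketch that uses only the .NET base class library; OcrInputCheck is a hypothetical helper, and its extension list mirrors the formats named earlier in this article (PNG, JPEG, BMP, GIF, TIFF) rather than being an authoritative list of what IronOCR accepts. It cannot detect corrupted files, so the try/catch is still needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical pre-flight check, not part of IronOCR. The extension list
// mirrors the formats named in this article and is an assumption.
public static class OcrInputCheck
{
    private static readonly HashSet<string> SupportedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"
        };

    // Returns true when the path has a listed extension and (optionally) exists;
    // otherwise returns false and explains why in 'reason'.
    public static bool IsUsable(string path, out string reason, bool requireExists = true)
    {
        string extension = Path.GetExtension(path ?? string.Empty);
        if (!SupportedExtensions.Contains(extension))
        {
            reason = $"Unsupported file type '{extension}'.";
            return false;
        }
        if (requireExists && !File.Exists(path))
        {
            reason = $"File not found: {path}";
            return false;
        }
        reason = string.Empty;
        return true;
    }
}
```

In the error-handling sample, you could call OcrInputCheck.IsUsable(@"document.png", out string reason) before LoadImage and print reason when it returns false.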

How Do I Create a Searchable PDF from Recognized Text?

Converting scanned documents into searchable PDF files represents one of the most valuable OCR applications. Users can then search, select, and copy text from previously image-only documents. This transformation enables document management systems to index content and improve accessibility.

using System;
using System.Linq;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            // Set document metadata
            input.Title = "Converted Document";
            // Load the source images, one per page
            input.LoadImage(@"page1.png");
            input.LoadImage(@"page2.png");
            OcrResult result = ocr.Read(input);
            // Save as searchable PDF with embedded text layer
            result.SaveAsSearchablePdf(@"searchable-output.pdf");
            Console.WriteLine("Searchable PDF created successfully.");
            // LINQ Count() works whether Pages is exposed as an array or a list
            Console.WriteLine($"Total pages processed: {result.Pages.Count()}");
        }
    }
}

Image 3: Searchable PDF created from the input images

The SaveAsSearchablePdf method generates a PDF file that keeps the original page images and adds an invisible text layer on top of them. Because the visible content is the source image itself, the PDF looks exactly like the original scan, yet its text can be searched, selected, and copied. PDF viewers such as Adobe Reader, as well as document-management and search tools, can then find and index the recognized text.

For applications that need HTML output, IronOCR also provides the SaveAsHocrFile method, which exports results in the hOCR format. hOCR is an open, HTML-based standard that stores per-word positioning data, enabling web-based document viewers and advanced text-analysis workflows.
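If you consume the hOCR file yourself, each word's position is stored in the title attribute of its element (for example, the ocrx_word spans) as a bbox property holding the left, top, right, and bottom pixel coordinates. The sketch below parses that property with a regular expression. HocrBbox is a hypothetical helper; it works on a title string you have already extracted with your HTML parser of choice and does not parse whole hOCR documents.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronOCR. Parses the hOCR "bbox x0 y0 x1 y1"
// property, where (x0, y0) is the top-left and (x1, y1) the bottom-right corner.
public static class HocrBbox
{
    // [0-9] instead of \d so only ASCII digits are accepted.
    private static readonly Regex BboxPattern =
        new Regex(@"\bbbox\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)");

    // Returns null when the title attribute carries no bbox property.
    public static (int X0, int Y0, int X1, int Y1)? Parse(string title)
    {
        Match match = BboxPattern.Match(title ?? string.Empty);
        if (!match.Success)
        {
            return null;
        }

        int Value(int group) =>
            int.Parse(match.Groups[group].Value, CultureInfo.InvariantCulture);

        return (Value(1), Value(2), Value(3), Value(4));
    }
}
```

Returning null rather than throwing keeps the caller's loop simple when some elements in the file have no bbox.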

Conclusion

Implementing optical character recognition in C# projects becomes straightforward with IronOCR. The library handles complex image processing, supports multiple image formats and languages, and provides flexible output options including searchable PDF generation. From simple text extraction to processing multi-page TIFF documents, the samples in this tutorial demonstrate the core workflows developers need.

The IronOCR documentation provides additional code examples for advanced image filters, barcode reading, and region-specific OCR processing. The API reference details all available classes and methods for building comprehensive document processing solutions.

Get started with IronOCR now.

Ready to implement OCR in your next project? Purchase a license to deploy IronOCR in production environments with full support and updates.

Frequently Asked Questions

What is OCR and how does it benefit C# developers?

OCR, or Optical Character Recognition, is a technology that converts different types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data. For C# developers, OCR simplifies document processing by enabling applications to extract text from images and scanned documents, enhancing data accessibility and usability.

How can I implement OCR in a C# project?

You can implement OCR in a C# project by using the IronOCR library. This library provides an easy-to-use interface to extract text from images and scanned documents within .NET applications, making it simple to integrate OCR functionality into your project.

What image formats are supported by IronOCR?

IronOCR supports a wide range of image formats, including JPEG, PNG, BMP, GIF, and TIFF. This flexibility allows you to work with various types of image files to extract text efficiently.

Can IronOCR handle multi-page TIFF files?

Yes, IronOCR can handle multi-page TIFF files. It can extract text from every page of a multi-page TIFF, or only from selected frames using LoadImageFrames, as shown in the TIFF sample above.

Is it possible to extract text from a specific area of an image using IronOCR?

Yes, IronOCR allows you to specify a particular area of an image from which to extract text. This feature is useful when you need to focus on a specific section of a document, such as a form or table.

Does IronOCR support different languages for text extraction?

IronOCR supports text extraction in multiple languages, allowing you to work with documents in various languages seamlessly. This feature enhances the versatility of your applications, catering to a global audience.

What are the advantages of using IronOCR over other OCR libraries?

IronOCR offers several advantages, including ease of use, reliable text recognition, support for multiple languages, and compatibility with various image formats. These features make it a practical choice for developers implementing OCR in their C# projects.

How does IronOCR improve the accuracy of text recognition?

IronOCR builds on the Tesseract OCR engine and adds image preprocessing filters, such as Deskew and DeNoise, that clean up skewed or noisy scans before recognition. These steps help with challenging documents that mix fonts, sizes, and layouts, although accuracy still depends on the quality of the input images.

Is it possible to integrate IronOCR into existing .NET applications?

Yes, IronOCR can be easily integrated into existing .NET applications. Its straightforward API allows developers to add OCR capabilities to their applications with minimal effort, enhancing their functionality without extensive modifications.

What are some common use cases for IronOCR in C# applications?

IronOCR can be used in various C# applications, including document management systems, data entry automation, archiving, text extraction from invoices and receipts, and accessibility tools for the visually impaired. Its versatility makes it suitable for a wide range of industries and applications.

Kannapat Udonpant

Software Engineer

Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering ...