
OCR in C# CodeProject Tutorial: Extract Text from Images with IronOCR

Optical character recognition (OCR) lets .NET applications turn scanned documents, image files, and TIFF files into machine-readable text. With a reliable OCR library, an application can extract text from visual data and pass it on for search, indexing, or further processing. In this article, we'll show you how to add OCR to a C# project using IronOCR, an OCR library that wraps the text recognition process in a small API.

Start your free trial of IronOCR to follow along with these code samples.

How Do I Set Up an OCR Library in My .NET Project?

Setting up optical character recognition (OCR) in Visual Studio requires just a few steps. The IronOCR library is available via NuGet, making integration straightforward for any Windows application.

Open Visual Studio and create a new Console Application project. In the Solution Explorer, right-click on References and select "Manage NuGet Packages." Search for "IronOcr" and install the package. The NuGet package manager downloads all required DLL files and adds references to your project automatically.

// Install via Package Manager Console
Install-Package IronOCR

Once installed, add a using IronOcr; directive to import the IronOCR namespace into your program. The library supports .NET Framework 4.6.2+ and .NET Core, so it works across different project types and Windows versions.

How Can I Extract Text from an Image File?

The first step in the OCR process is loading an image and passing it through the OCR engine. IronOCR provides the IronTesseract class as the primary OCR API for character recognition operations. This OCR sample demonstrates the fundamental approach to extracting text from any image file.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        // Initialize the new Tesseract engine
        var ocr = new IronTesseract();
        // Load the image file and perform OCR
        using (var input = new OcrInput())
        {
            input.LoadImage(@"sample-document.png");
            // Process the image and extract text
            OcrResult result = ocr.Read(input);
            // Output the recognized text
            var text = result.Text;
            Console.WriteLine(text);
        }
    }
}

Optical Character Recognition Output

Image 1: Screenshot of OCR output

The code above creates an IronTesseract object that serves as the OCR engine for all text-recognition operations. The OcrInput class accepts various image formats, including PNG, JPEG, BMP, GIF, and TIFF. When you call the Read method, the library processes the input image and returns an OcrResult object containing the recognized text.

The OcrResult.Text property provides the extracted content as a plain text string, ready for further processing in your application. This OCR code handles the complex character recognition algorithms internally, delivering recognition results with high accuracy across different document types.
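As an illustration of the "further processing" step, the following is a minimal sketch of tidying the raw string before it is stored or searched. It uses only the .NET base class library; the class name OcrTextCleanup and the method ToCleanLines are hypothetical helpers, not part of IronOCR, and the input is assumed to be the plain string returned by OcrResult.Text.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an IronOCR type) for post-processing recognized text.
public static class OcrTextCleanup
{
    // Splits raw OCR text into lines, collapses runs of whitespace inside
    // each line to a single space, and drops lines that end up empty.
    public static string[] ToCleanLines(string rawText)
    {
        if (string.IsNullOrWhiteSpace(rawText))
        {
            return Array.Empty<string>();
        }

        return rawText
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
            .Select(line => string.Join(
                " ",
                line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)))
            .Where(line => line.Length > 0)
            .ToArray();
    }
}
```

A caller would pass result.Text to OcrTextCleanup.ToCleanLines and work with the returned array, for example to match lines against expected field labels.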

How Do I Process Scanned Documents and TIFF Files?

Real-world applications often require processing multi-page scanned documents stored as TIFF files. The OCR library handles these scenarios efficiently by allowing developers to load specific page ranges or process entire documents. This sample code shows how to work with multi-frame TIFF images.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            // Load specific pages from a multi-page TIFF file
            int[] pageIndices = new int[] { 0, 1, 2 };
            input.LoadImageFrames(@"scanned-documents.tiff", pageIndices);
            // Apply image enhancement for better results
            input.Deskew();
            OcrResult result = ocr.Read(input);
            // Access page-by-page results
            foreach (var page in result.Pages)
            {
                Console.WriteLine($"Page {page.PageNumber}:");
                Console.WriteLine(page.Text);
            }
        }
    }
}

OCR Output from Multi-Paged TIFF File

Image 2: Multi-paged TIFF OCR output

The LoadImageFrames method accepts a file path and an integer array specifying which pages to process. This approach optimizes performance when you only need specific pages from large document archives. The Deskew filter corrects any rotation or alignment issues in scanned images, improving image quality and OCR accuracy.
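When the pages of interest form a contiguous range, the index array does not have to be typed out by hand. The sketch below builds it with LINQ; PageSelection.Range is a hypothetical helper, and it assumes frame indices are zero-based, as the { 0, 1, 2 } array in the sample suggests.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an IronOCR type) for building page index arrays.
public static class PageSelection
{
    // Returns `count` consecutive indices starting at `firstIndex`.
    // Indices are assumed to be zero-based, matching the sample above.
    public static int[] Range(int firstIndex, int count)
    {
        if (firstIndex < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(firstIndex));
        }
        if (count < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        return Enumerable.Range(firstIndex, count).ToArray();
    }
}
```

For example, PageSelection.Range(0, 3) produces the same three indices used in the sample, and the resulting array can be passed as the second argument to LoadImageFrames.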

Each page in the result maintains layout information, including paragraphs, lines, and individual words. The OCR API provides access to confidence scores and positioning data, enabling sophisticated document analysis beyond simple text extraction.
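One common use of confidence scores is to flag uncertain words for manual review. The following sketch assumes the word text and score have already been copied out of the OCR result into a simple container; RecognizedWord and ConfidenceFilter are hypothetical types introduced for illustration, not IronOCR classes, and the threshold is expressed on whatever scale the scores use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for one recognized word; not an IronOCR type.
public sealed class RecognizedWord
{
    public RecognizedWord(string text, double confidence)
    {
        Text = text;
        Confidence = confidence;
    }

    public string Text { get; }
    public double Confidence { get; }
}

// Hypothetical helper that selects words needing a second look.
public static class ConfidenceFilter
{
    // Returns the words whose confidence is strictly below the threshold.
    public static List<RecognizedWord> NeedsReview(
        IEnumerable<RecognizedWord> words, double threshold)
    {
        if (words == null)
        {
            throw new ArgumentNullException(nameof(words));
        }

        return words.Where(w => w.Confidence < threshold).ToList();
    }
}
```

Keeping the filter separate from the OCR call makes it easy to unit test and to tune the threshold per document type.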

How Can I Handle OCR Code Errors and Improve Recognition Results?

Production applications require proper error handling to manage various exception scenarios. Image quality issues, unsupported file formats, or corrupted files can cause the OCR process to fail. Implementing exception handling ensures your application responds gracefully to these situations.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();
        // Configure the OCR engine for your language
        ocr.Language = OcrLanguage.English;
        try
        {
            using (var input = new OcrInput())
            {
                input.LoadImage(@"document.png");
                // Enhance low-quality images
                input.DeNoise();
                input.Deskew();
                OcrResult result = ocr.Read(input);
                if (result.Text.Length > 0)
                {
                    Console.WriteLine("Recognized text:");
                    Console.WriteLine(result.Text);
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"OCR Error: {ex.Message}");
        }
    }
}

The Language property configures which language pack the OCR engine uses for text recognition. IronOCR supports over 125 languages, each available as a separate NuGet package. The DeNoise filter removes digital artifacts from scanned documents, while Deskew corrects alignment. Both matter when the source images are imperfect.
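Some failures can be caught before the OCR engine is invoked at all. The sketch below checks that a path has one of the image extensions this article lists (PNG, JPEG, BMP, GIF, TIFF) and that the file exists. OcrInputCheck is a hypothetical helper built on System.IO, not an IronOCR API, and an extension check is only a heuristic: a corrupted file with a valid extension will still need the try/catch shown above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical pre-flight check; not part of IronOCR.
public static class OcrInputCheck
{
    // Extensions for the image formats named in this article.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"
        };

    // True when the path ends in one of the listed extensions (case-insensitive).
    public static bool HasSupportedExtension(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
        {
            return false;
        }

        return ImageExtensions.Contains(Path.GetExtension(path));
    }

    // True when the extension is supported and the file exists on disk.
    public static bool IsReadableImage(string path)
    {
        return HasSupportedExtension(path) && File.Exists(path);
    }
}
```

Calling OcrInputCheck.IsReadableImage before LoadImage lets the application report a missing or mistyped file with a specific message instead of a generic OCR error.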

How Do I Create a Searchable PDF from Recognized Text?

Converting scanned documents into searchable PDF files represents one of the most valuable OCR applications. Users can then search, select, and copy text from previously image-only documents. This transformation enables document management systems to index content and improve accessibility.

using System;
using IronOcr;
class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            // Set document metadata
            input.Title = "Converted Document";
            // Load the source images, one per output page
            input.LoadImage(@"page1.png");
            input.LoadImage(@"page2.png");
            OcrResult result = ocr.Read(input);
            // Save as searchable PDF with embedded text layer
            result.SaveAsSearchablePdf(@"searchable-output.pdf");
            Console.WriteLine("Searchable PDF created successfully.");
            Console.WriteLine($"Total pages processed: {result.Pages.Count}");
        }
    }
}

Output Searchable PDF Document

Image 3: Searchable PDF created from input images

The SaveAsSearchablePdf method generates a PDF file that keeps the original image appearance while embedding an invisible text layer. The visual output matches the source, and the embedded text makes full-text search possible. Microsoft Office applications, Adobe Reader, and other PDF viewers can then search and index the recognized text.

For applications requiring HTML output, IronOCR also provides the SaveAsHocrFile method, which exports results in the hOCR format. This HTML-based open standard includes per-word positioning data, enabling web-based document viewers and advanced text-analysis workflows.
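In hOCR, positioning data is conventionally carried in an element's title attribute as a bounding box, for example title="bbox 36 92 96 116; x_wconf 95". The sketch below extracts the four coordinates from such a value with a regular expression. HocrBox is a hypothetical helper, and it assumes the exported file follows that convention; it does not parse the surrounding HTML.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for reading hOCR bounding boxes; not an IronOCR type.
public static class HocrBox
{
    // Matches "bbox x0 y0 x1 y1" anywhere in a title attribute value.
    private static readonly Regex BboxPattern =
        new Regex(@"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", RegexOptions.Compiled);

    // Returns true and fills the coordinates when a bbox entry is found.
    public static bool TryParse(
        string title, out int x0, out int y0, out int x1, out int y1)
    {
        x0 = y0 = x1 = y1 = 0;
        if (string.IsNullOrEmpty(title))
        {
            return false;
        }

        Match match = BboxPattern.Match(title);
        if (!match.Success)
        {
            return false;
        }

        x0 = int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
        y0 = int.Parse(match.Groups[2].Value, CultureInfo.InvariantCulture);
        x1 = int.Parse(match.Groups[3].Value, CultureInfo.InvariantCulture);
        y1 = int.Parse(match.Groups[4].Value, CultureInfo.InvariantCulture);
        return true;
    }
}
```

A web viewer could use the parsed rectangle to highlight a word over the page image; the width is x1 minus x0 and the height is y1 minus y0.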

Conclusion

Implementing optical character recognition in C# projects becomes straightforward with IronOCR. The library handles complex image processing, supports multiple image formats and languages, and provides flexible output options including searchable PDF generation. From simple text extraction to processing multi-page TIFF documents, the samples in this tutorial demonstrate the core workflows developers need.

The IronOCR documentation provides additional code examples for advanced image filters, barcode reading, and region-specific OCR processing. The API reference details all available classes and methods for building comprehensive document processing solutions.

Get started with IronOCR now.

Ready to implement OCR in your next project? Purchase a license to deploy IronOCR in production environments with full support and updates.

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD from Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering ...