Get Started with OCR in C# and VB.NET

Chaknith Bin

Updated: June 9, 2025

IronOCR is a C# software library that lets .NET developers recognize and read text from images and PDF documents. It is a pure .NET OCR library built on the Tesseract engine.

Installation

Install with NuGet Package Manager

Install IronOcr in Visual Studio or at the command line with the NuGet Package Manager. In Visual Studio, navigate to the console with:

Tools -> NuGet Package Manager -> Package Manager Console

Install-Package IronOcr

See IronOcr on NuGet for more about version updates and installation.

There are other IronOCR NuGet Packages available for different platforms:

Windows: https://www.nuget.org/packages/IronOcr
Linux: https://www.nuget.org/packages/IronOcr.Linux
macOS: https://www.nuget.org/packages/IronOcr.MacOs
macOS (ARM): https://www.nuget.org/packages/IronOcr.MacOs.ARM

Download the IronOCR .ZIP

You may also choose to download IronOCR via .ZIP file instead. Click to directly download the DLL. Once you have the .zip downloaded:

Instructions for .NET Framework 4.0+ Installation:

Include the IronOcr.dll from the net40 folder in your project.
Then add assembly references to:
- System.Configuration
- System.Drawing
- System.Web

Instructions for .NET Standard & .NET Core 2.0+, & .NET 5

Include the IronOcr.dll from the netstandard2.0 folder in your project.
Then add a NuGet package reference to:
- System.Drawing.Common 4.7 or higher

Download the IronOCR Installer (Windows only)

Another option is to download our IronOCR installer which will install all the required resources for IronOCR to work out-of-the-box. Please keep in mind this option is only for Windows systems. To download the installer please click here. Once you have the .zip downloaded:

Instructions for .NET Framework 4.0+ Installation:

Include the IronOcr.dll from the net40 folder in your project.
Then add assembly references to:
- System.Configuration
- System.Drawing
- System.Web

Instructions for .NET Standard & .NET Core 2.0+, & .NET 5

Include the IronOcr.dll from the netstandard2.0 folder in your project.
Then add a NuGet package reference to:
- System.Drawing.Common 4.7 or higher

Why Choose IronOCR?

IronOCR is an easy-to-install, complete and well-documented .NET software library.

Choose IronOCR to achieve 99.8%+ OCR accuracy without using any external web services, ongoing fees or sending confidential documents over the internet.

Why C# developers choose IronOCR over Vanilla Tesseract:

Install as a single DLL or NuGet package
Includes Tesseract 5, 4, and 3 engines out of the box
99.8% accuracy, significantly outperforming regular Tesseract
Blazing speed and multithreading support
Compatible with MVC, web, desktop, console, and server applications
No EXEs or C++ code to work with
Full PDF OCR support
Perform OCR on almost any image file or PDF
Full .NET Core, .NET Standard, and .NET Framework support
Deploy on Windows, macOS, Linux, Azure, Docker, and AWS (including Lambda)
Read barcodes and QR codes
Export OCR results as XHTML
Export OCR results to searchable PDF documents
125 international languages, all managed via NuGet or OcrData files
Extract images, coordinates, statistics, and fonts, not just text
Can be used to redistribute Tesseract OCR inside commercial and proprietary applications

IronOCR shines when working with real-world images and imperfect documents, such as photographs or low-resolution scans that contain digital noise or other imperfections.

Other free OCR libraries for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.

OCR with Tesseract 5 - Start Coding in C#

The code sample below shows how easy it is to read text from an image using C# or VB.NET.

OneLiner

using IronOcr;

string text = new IronTesseract().Read(@"img\Screenshot.png").Text;

Imports IronOcr

Dim text As String = New IronTesseract().Read("img\Screenshot.png").Text

Configurable Hello World

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();

// Load an image; call LoadImage again to add more images
input.LoadImage("images/sample.jpeg");

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	' Load an image; call LoadImage again to add more images
	input.LoadImage("images/sample.jpeg")

	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

C# PDF OCR

The same approach can similarly be used to extract text from any PDF document.

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();

// Load a password-protected PDF (specific page numbers can also be selected)
input.LoadPdf("example.pdf", Password: "password");

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

// 1 page for every page of the PDF
Console.WriteLine($"{result.Pages.Length} Pages");

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	' Load a password-protected PDF (specific page numbers can also be selected)
	input.LoadPdf("example.pdf", Password:="password")

	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)

	' 1 page for every page of the PDF
	Console.WriteLine($"{result.Pages.Length} Pages")
End Using

OCR for MultiPage TIFFs

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("multi-frame.tiff", pageindices);
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	Dim pageindices = New Integer() { 1, 2 }
	input.LoadImageFrames("multi-frame.tiff", pageindices)
	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

Barcodes and QR

A unique feature of IronOCR is that it can read barcodes and QR codes from documents while it scans for text. Instances of the OcrResult.OcrBarcode class give the developer detailed information about each scanned barcode.

using IronOcr;

IronTesseract ocr = new IronTesseract();
ocr.Configuration.ReadBarCodes = true;

using OcrInput input = new OcrInput();
input.LoadImage("img/Barcode.png");

OcrResult Result = ocr.Read(input);
foreach (var Barcode in Result.Barcodes)
{
    // type and location properties also exposed
    Console.WriteLine(Barcode.Value);
}

Imports IronOcr

Dim ocr As New IronTesseract()
ocr.Configuration.ReadBarCodes = True

Using input As New OcrInput()
	input.LoadImage("img/Barcode.png")
	
	Dim Result As OcrResult = ocr.Read(input)
	For Each Barcode In Result.Barcodes
		' type and location properties also exposed
		Console.WriteLine(Barcode.Value)
	Next Barcode
End Using

OCR on Specific Areas of Images

All of IronOCR's scanning and reading methods provide the ability to specify exactly which part of a page or pages we wish to read text from. This is very useful when we are looking at standardized forms and can save a lot of time and improve efficiency.

To use crop regions, we will need to add a system reference to System.Drawing so that we can use the System.Drawing.Rectangle object.

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();

// Dimensions are in pixels
var contentArea = new System.Drawing.Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };

input.LoadImage("document.png", contentArea);

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	' Dimensions are in pixels
	Dim contentArea = New System.Drawing.Rectangle() With {
		.X = 215,
		.Y = 1250,
		.Height = 280,
		.Width = 1335
	}

	input.LoadImage("document.png", contentArea)

	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using
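
For standardized forms, the crop-region approach above can be repeated once per field. The following is only a sketch under stated assumptions: the field names, the pixel coordinates, and the file name "document.png" are hypothetical placeholders, and the code uses only the LoadImage(path, rectangle) and Read calls already shown in this section.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;

// Hypothetical field layout for a standardized form. The names and pixel
// coordinates are illustrative assumptions, not taken from a real document.
// Rectangle arguments are (x, y, width, height).
var regions = new Dictionary<string, System.Drawing.Rectangle>
{
    ["InvoiceNumber"] = new System.Drawing.Rectangle(215, 120, 400, 60),
    ["Total"] = new System.Drawing.Rectangle(215, 1250, 1335, 280),
};

var ocr = new IronTesseract();

foreach (KeyValuePair<string, System.Drawing.Rectangle> region in regions)
{
    // A fresh OcrInput per region keeps each field's text separate.
    using var input = new OcrInput();
    input.LoadImage("document.png", region.Value);

    OcrResult result = ocr.Read(input);
    Console.WriteLine($"{region.Key}: {result.Text.Trim()}");
}
```

Reading one region per OcrInput trades a little extra work for results that map directly to named fields, which may be easier to validate than one block of page text.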

OCR for Low Quality Scans

The IronOCR OcrInput class can fix scans that normal Tesseract cannot read.

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames(@"img\Potter.tiff", pageindices);

// fixes digital noise and poor scanning
input.DeNoise();

// fixes rotation and perspective
input.Deskew();

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	Dim pageindices = New Integer() { 1, 2 }
	input.LoadImageFrames("img\Potter.tiff", pageindices)

	' fixes digital noise and poor scanning
	input.DeNoise()

	' fixes rotation and perspective
	input.Deskew()

	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

Export OCR results as a Searchable PDF

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.Title = "Quarterly Report";
input.LoadImage("image1.jpeg");
input.LoadImage("image2.png");
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("image3.gif", pageindices);

OcrResult result = ocr.Read(input);
result.SaveAsSearchablePdf("searchable.pdf");

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	input.Title = "Quarterly Report"
	input.LoadImage("image1.jpeg")
	input.LoadImage("image2.png")
	Dim pageindices = New Integer() { 1, 2 }
	input.LoadImageFrames("image3.gif", pageindices)

	Dim result As OcrResult = ocr.Read(input)
	result.SaveAsSearchablePdf("searchable.pdf")
End Using

TIFF to searchable PDF Conversion

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("example.tiff", pageindices);
ocr.Read(input).SaveAsSearchablePdf("searchable.pdf");

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	Dim pageindices = New Integer() { 1, 2 }
	input.LoadImageFrames("example.tiff", pageindices)
	ocr.Read(input).SaveAsSearchablePdf("searchable.pdf")
End Using

Export OCR results as HTML

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.Title = "Html Title";
input.LoadImage("image1.jpeg");

OcrResult Result = ocr.Read(input);
Result.SaveAsHocrFile("results.html");

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	input.Title = "Html Title"
	input.LoadImage("image1.jpeg")

	Dim result As OcrResult = ocr.Read(input)
	result.SaveAsHocrFile("results.html")
End Using

OCR Image Enhancement Filters

IronOCR provides unique filters for OcrInput objects to improve OCR performance.

Image Enhancement Code Example

using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.LoadImage("LowQuality.jpeg");

// fixes digital noise and poor scanning
input.DeNoise();

// fixes rotation and perspective
input.Deskew();

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);

Imports IronOcr

Dim ocr As New IronTesseract()
Using input As New OcrInput()
	input.LoadImage("LowQuality.jpeg")

	' fixes digital noise and poor scanning
	input.DeNoise()

	' fixes rotation and perspective
	input.Deskew()

	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

List of OCR Image Filters

Input filters to enhance OCR performance which are built into IronOCR include:

OcrInput.Rotate(double degrees) - Rotates images by a number of degrees clockwise. For anti-clockwise rotation, use negative numbers.
OcrInput.Binarize() - This filter converts every pixel to either black or white with no middle ground, potentially improving OCR performance in very low contrast images.
OcrInput.ToGrayScale() - Converts every pixel into a shade of grayscale. It may not improve accuracy but could improve speed.
OcrInput.Contrast() - Automatically increases contrast, often improving speed and accuracy in low contrast scans.
OcrInput.DeNoise() - Removes digital noise, recommended only when noise is expected.
OcrInput.Invert() - Inverts every color (white becomes black and vice versa).
OcrInput.Dilate() - Advanced morphology. Adds pixels to object boundaries; the opposite of Erode.
OcrInput.Erode() - Advanced morphology. Removes pixels from object boundaries; the opposite of Dilate.
OcrInput.Deskew() - Rotates an image to orient it correctly. Useful because Tesseract's skew tolerance is limited.
OcrInput.EnhanceResolution - Enhances the resolution of low-quality images. This setting is generally used to manage low-DPI input automatically. It detects low-resolution images (below 275 dpi), upscales them, and sharpens text for better OCR results. Though time-consuming in itself, it often reduces overall OCR operation time.
Language - Supports selection from 125 international language packs.
Strategy - Allows selection between fast and less accurate or advanced (using AI for accuracy) strategies based on the statistical relationship of words.
ColorSpace - Choose to OCR in grayscale or color; grayscale is generally optimal though color can be better in certain contrast scenarios.
DetectWhiteTextOnDarkBackgrounds - Adjusts for negative images, automatically detecting and reading white text on dark backgrounds.
InputImageType - Guides the OCR library, specifying whether it is working on a full document or a snippet.
RotateAndStraighten - Allows IronOCR to properly handle documents that are rotated or affected by perspective distortions.
ReadBarcodes - Automatically reads barcodes and QR codes concurrently with text scanning without significant added time.
ColorDepth - Determines bits per pixel for color depth in the OCR process. A higher depth can increase quality but also the time of processing.
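
As a rough illustration of combining several of the filters listed above, the sketch below applies a few of them to one input before reading. The file name "faded-scan.png" is a hypothetical placeholder, and the choice and order of filters is an assumption made for illustration, not a recommended recipe. Which filters help depends on the image.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();

// Hypothetical file name: a faded, slightly rotated scan.
input.LoadImage("faded-scan.png");

// Straighten the page first, since Tesseract's skew tolerance is limited.
input.Deskew();

// Only worthwhile when digital noise is actually expected.
input.DeNoise();

// Raise contrast, then force every pixel to black or white.
// Binarize is aimed at very low contrast images.
input.Contrast();
input.Binarize();

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```

Comparing result.Text with and without each filter on a representative sample is a simple way to decide which ones to keep.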

125 Language Packs

IronOCR supports 125 international languages via language packs which are distributed as DLLs, available for download from this website, or from the NuGet Package Manager.

Languages include German, French, English, Chinese, and Japanese, among others. Specialist language packs exist for MRZ, MICR checks, financial data, license plates, and more. Additionally, custom Tesseract ".traineddata" files can be used.

Language Example

using IronOcr;

// PM> Install-Package IronOcr.Languages.Arabic
IronTesseract ocr = new IronTesseract();
ocr.Language = OcrLanguage.Arabic;

using OcrInput input = new OcrInput();

var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("img/arabic.gif", pageindices);
// Add image filters if needed
// In this case, even though the input is very low quality,
// IronTesseract can read what conventional Tesseract cannot.
OcrResult result = ocr.Read(input);
// Console can't print Arabic on Windows easily.
// Let's save to disk instead.
result.SaveAsTextFile("arabic.txt");

Imports IronOcr

' PM> Install-Package IronOcr.Languages.Arabic
Dim ocr As New IronTesseract()
ocr.Language = OcrLanguage.Arabic

Using input As New OcrInput()
	
	Dim pageindices = New Integer() { 1, 2 }
	input.LoadImageFrames("img/arabic.gif", pageindices)
	' Add image filters if needed
	' In this case, even though the input is very low quality,
	' IronTesseract can read what conventional Tesseract cannot.
	Dim result As OcrResult = ocr.Read(input)
	' Console can't print Arabic on Windows easily.
	' Let's save to disk instead.
	result.SaveAsTextFile("arabic.txt")
End Using

Multiple Language Example

It is also possible to OCR using multiple languages at the same time. This can enhance OCR of English metadata and URLs in Unicode documents.

using IronOcr;

// PM> Install-Package IronOcr.Languages.ChineseSimplified
IronTesseract ocr = new IronTesseract();
ocr.Language = OcrLanguage.ChineseSimplified;

// We can add any number of languages
ocr.AddSecondaryLanguage(OcrLanguage.English);

using OcrInput input = new OcrInput();
input.LoadPdf("multi-language.pdf");
OcrResult result = ocr.Read(input);
result.SaveAsTextFile("results.txt");

Imports IronOcr

' PM> Install-Package IronOcr.Languages.ChineseSimplified
Dim ocr As New IronTesseract()
ocr.Language = OcrLanguage.ChineseSimplified

' We can add any number of languages
ocr.AddSecondaryLanguage(OcrLanguage.English)

Using input As New OcrInput()
	input.LoadPdf("multi-language.pdf")
	Dim result As OcrResult = ocr.Read(input)
	result.SaveAsTextFile("results.txt")
End Using

Detailed OCR Results Objects

IronOCR returns an OCR result object for each operation. Generally, developers access the Text property to get scanned text. However, the results object contains much more detailed information.

using IronOcr;

IronTesseract ocr = new IronTesseract();

// Must be set to true to read barcode
ocr.Configuration.ReadBarCodes = true;
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames(@"img\sample.tiff", pageindices);

OcrResult result = ocr.Read(input);
var pages = result.Pages;
var words = pages[0].Words;
var barcodes = result.Barcodes;
// Explore here to find a massive, detailed API:
// - Pages, Blocks, Paragraphs, Lines, Words, Chars
// - Image Export, Fonts, Coordinates, Statistical Data, Tables

Imports IronOcr

Dim ocr As New IronTesseract()

' Must be set to true to read barcode
ocr.Configuration.ReadBarCodes = True
Using input As New OcrInput()
	Dim pageindices = New Integer() { 1, 2 }
	input.LoadImageFrames("img\sample.tiff", pageindices)
	
	Dim result As OcrResult = ocr.Read(input)
	Dim pages = result.Pages
	Dim words = pages(0).Words
	Dim barcodes = result.Barcodes
	' Explore here to find a massive, detailed API:
	' - Pages, Blocks, Paragraphs, Lines, Words, Chars
	' - Image Export, Fonts, Coordinates, Statistical Data, Tables
End Using
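
To give a feel for walking the result object beyond the Text property, here is a minimal sketch that loops over every page and word. The Pages and Words collections appear in the example above. The Text property on each word is an assumption made for this illustration, so check the API reference for the exact members available.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("images/sample.jpeg");

OcrResult result = ocr.Read(input);

foreach (var page in result.Pages)
{
    int wordCount = 0;

    foreach (var word in page.Words)
    {
        // Assumption: each word exposes its recognized text as Text.
        Console.WriteLine(word.Text);
        wordCount++;
    }

    Console.WriteLine($"Words on this page: {wordCount}");
}
```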

Performance

IronOCR works out of the box with no need for performance tuning or image modification.

Speed is blazing: IronOcr 2020 and later is up to 10 times faster and makes substantially fewer errors than previous builds.

Learn More

To learn more about OCR in C#, VB, F#, or any other .NET language, please read our community tutorials, which give real-world examples of using IronOCR and show the nuances of optimizing the library.

A full API reference for .NET developers is also available.

Chaknith Bin

Software Engineer

Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.

Jeffrey T. Fritz

Principal Program Manager - .NET Community Team

Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops, presentations, and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.
