Get Started with OCR in C# and VB.NET
IronOCR is a C# software library that allows .NET developers to recognize and read text from images and PDF documents. It is a pure .NET OCR library built on the most advanced Tesseract engine available.
Installation
Install with NuGet Package Manager
Install IronOcr in Visual Studio or at the command line with the NuGet Package Manager. In Visual Studio, open the console via Tools > NuGet Package Manager > Package Manager Console, then run:
Install-Package IronOcr
See IronOcr on NuGet for version updates and further installation details.
IronOCR NuGet packages are available for different platforms:
- Windows: https://www.nuget.org/packages/IronOcr
- Linux: https://www.nuget.org/packages/IronOcr.Linux
- MacOS: https://www.nuget.org/packages/IronOcr.MacOs
- MacOS (ARM): https://www.nuget.org/packages/IronOcr.MacOs.ARM
Download the IronOCR .ZIP
You may also download IronOCR as a .ZIP file containing the DLL instead. Once you have downloaded the .zip:
Instructions for .NET Framework 4.0+ Installation:
- Add IronOcr.dll from the net40 folder to your project
Then add assembly references to:
- System.Configuration
- System.Drawing
- System.Web
Instructions for .NET Standard, .NET Core 2.0+, and .NET 5 Installation:
- Add IronOcr.dll from the netstandard2.0 folder to your project
Then add a NuGet package reference to:
- System.Drawing.Common 4.7 or higher
Download the IronOCR Installer (Windows only)
Another option is to download our IronOCR installer, which installs all the resources IronOCR needs to work out of the box. This option is only available for Windows systems. Once you have the .zip downloaded, add IronOcr.dll and its references to your project exactly as described for the .ZIP download above.
Why Choose IronOCR?
IronOCR is an easy-to-install, complete and well-documented .NET software library.
Choose IronOCR to achieve 99.8%+ OCR accuracy without using external web services, paying ongoing fees, or sending confidential documents over the internet.
Why C# developers choose IronOCR over vanilla Tesseract:
- Install as a single DLL or NuGet package
- Includes Tesseract 5, 4, and 3 engines out of the box
- 99.8% accuracy, significantly outperforming regular Tesseract
- Blazing speed and multithreading
- MVC, WebApp, Desktop, Console & Server Application compatible
- No EXEs or C++ code to work with
- Full PDF OCR support
- Perform OCR on almost any image file or PDF
- Full .NET Core, Standard, and Framework support
- Deploy on Windows, Mac, Linux, Azure, Docker, AWS, and AWS Lambda
- Read barcodes and QR codes
- Export OCR results as XHTML
- Export OCR to searchable PDF documents
- 125 international languages, all managed via NuGet or OcrData files
- Extract images, coordinates, statistics, and fonts, not just text
- Can be used to redistribute Tesseract OCR inside commercial & proprietary applications
IronOCR shines when working with real-world images and imperfect documents, such as photographs or low-resolution scans that contain digital noise or other imperfections.
Other free OCR options for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.
OCR with Tesseract 5 - Start Coding in C#
The code sample below shows how easy it is to read text from an image using C# or VB.NET.
OneLiner
using IronOcr;
string text = new IronTesseract().Read(@"img\Screenshot.png").Text;
Imports IronOcr
Dim text As String = (New IronTesseract()).Read("img\Screenshot.png").Text
Configurable Hello World
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
// You can add multiple images
input.LoadImage("images/sample.jpeg");
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            ' You can add multiple images
            input.LoadImage("images/sample.jpeg")
            Dim result As OcrResult = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
C# PDF OCR
The same approach can similarly be used to extract text from any PDF document.
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
// Loads every page; specific PDF page numbers can also be selected for OCR
input.LoadPdf("example.pdf", Password: "password");
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
// One result page for every page of the PDF
Console.WriteLine($"{result.Pages.Length} Pages");
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            ' Loads every page; specific PDF page numbers can also be selected for OCR
            input.LoadPdf("example.pdf", Password:="password")
            Dim result As OcrResult = ocr.Read(input)
            Console.WriteLine(result.Text)
            ' One result page for every page of the PDF
            Console.WriteLine($"{result.Pages.Length} Pages")
        End Using
    End Sub
End Module
OCR for Multipage TIFFs
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("multi-frame.tiff", pageindices);
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            Dim pageindices = New Integer() {1, 2}
            input.LoadImageFrames("multi-frame.tiff", pageindices)
            Dim result As OcrResult = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
Barcodes and QR Codes
A unique feature of IronOCR is that it can read barcodes and QR codes from documents while it is scanning them for text. Instances of the OcrResult.OcrBarcode class give the developer detailed information about each scanned barcode.
using IronOcr;
IronTesseract ocr = new IronTesseract();
ocr.Configuration.ReadBarCodes = true;
using OcrInput input = new OcrInput();
input.LoadImage("img/Barcode.png");
OcrResult result = ocr.Read(input);
foreach (var barcode in result.Barcodes)
{
    // type and location properties also exposed
    Console.WriteLine(barcode.Value);
}
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        ocr.Configuration.ReadBarCodes = True
        Using input As New OcrInput()
            input.LoadImage("img/Barcode.png")
            Dim result As OcrResult = ocr.Read(input)
            For Each barcode In result.Barcodes
                ' type and location properties also exposed
                Console.WriteLine(barcode.Value)
            Next
        End Using
    End Sub
End Module
OCR on Specific Areas of Images
All of IronOCR's scanning and reading methods let us specify exactly which part of a page or pages we wish to read text from. This is very useful for standardized forms, where reading only the relevant area saves time and improves efficiency.
To use crop regions, we need to add a reference to System.Drawing so that we can use the System.Drawing.Rectangle object.
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
// Dimensions are in pixels
var contentArea = new System.Drawing.Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };
input.LoadImage("document.png", contentArea);
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            ' Dimensions are in pixels
            Dim contentArea As New System.Drawing.Rectangle() With {
                .X = 215,
                .Y = 1250,
                .Height = 280,
                .Width = 1335
            }
            input.LoadImage("document.png", contentArea)
            Dim result As OcrResult = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
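Standardized forms are not always scanned at the same resolution, so a crop rectangle measured in pixels on one scan will not line up on a scan made at a different DPI. The sketch below shows one way to rescale such a region and clip it to the image bounds with System.Drawing.Rectangle before passing it to LoadImage. The ScaleAndClip helper, the assumption that the example's content area was measured on a 300 DPI scan, and the 1700 x 2200 pixel image size are all hypothetical and not part of IronOCR.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of IronOCR): rescales a crop region measured on a
// reference scan to a scan made at a different DPI, then clips it so it never
// extends past the edges of the image. All values are in pixels.
Rectangle ScaleAndClip(Rectangle templateArea, double templateDpi, double scanDpi,
                       int imageWidth, int imageHeight)
{
    double factor = scanDpi / templateDpi;
    var scaled = new Rectangle(
        (int)Math.Round(templateArea.X * factor),
        (int)Math.Round(templateArea.Y * factor),
        (int)Math.Round(templateArea.Width * factor),
        (int)Math.Round(templateArea.Height * factor));

    // Rectangle.Intersect returns Rectangle.Empty when the two rectangles do not overlap.
    return Rectangle.Intersect(scaled, new Rectangle(0, 0, imageWidth, imageHeight));
}

// The content area from the example above, assumed here to be measured on a 300 DPI scan,
// mapped onto a hypothetical 200 DPI scan that is 1700 x 2200 pixels.
var contentArea = new Rectangle(215, 1250, 1335, 280);
Rectangle areaAt200Dpi = ScaleAndClip(contentArea, 300, 200, 1700, 2200);
Console.WriteLine(areaAt200Dpi);
```

The resulting rectangle can then be passed to input.LoadImage("document.png", areaAt200Dpi) in the same way as contentArea above. Clipping first is a defensive choice: the sketch makes no assumption about how IronOCR itself treats regions that extend past the image.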
OCR for Low Quality Scans
The IronOCR OcrInput class can fix scans that normal Tesseract cannot read.
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames(@"img\Potter.tiff", pageindices);
// fixes digital noise and poor scanning
input.DeNoise();
// fixes rotation and perspective
input.Deskew();
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            Dim pageindices = New Integer() {1, 2}
            input.LoadImageFrames("img\Potter.tiff", pageindices)
            ' fixes digital noise and poor scanning
            input.DeNoise()
            ' fixes rotation and perspective
            input.Deskew()
            Dim result As OcrResult = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
Export OCR results as a Searchable PDF
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.Title = "Quarterly Report";
input.LoadImage("image1.jpeg");
input.LoadImage("image2.png");
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("image3.gif", pageindices);
OcrResult result = ocr.Read(input);
result.SaveAsSearchablePdf("searchable.pdf");
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            input.Title = "Quarterly Report"
            input.LoadImage("image1.jpeg")
            input.LoadImage("image2.png")
            Dim pageindices = New Integer() {1, 2}
            input.LoadImageFrames("image3.gif", pageindices)
            Dim result As OcrResult = ocr.Read(input)
            result.SaveAsSearchablePdf("searchable.pdf")
        End Using
    End Sub
End Module
TIFF to searchable PDF Conversion
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("example.tiff", pageindices);
ocr.Read(input).SaveAsSearchablePdf("searchable.pdf");
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            Dim pageindices = New Integer() {1, 2}
            input.LoadImageFrames("example.tiff", pageindices)
            ocr.Read(input).SaveAsSearchablePdf("searchable.pdf")
        End Using
    End Sub
End Module
Export OCR results as HTML
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.Title = "Html Title";
input.LoadImage("image1.jpeg");
OcrResult result = ocr.Read(input);
result.SaveAsHocrFile("results.html");
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            input.Title = "Html Title"
            input.LoadImage("image1.jpeg")
            Dim result As OcrResult = ocr.Read(input)
            result.SaveAsHocrFile("results.html")
        End Using
    End Sub
End Module
OCR Image Enhancement Filters
IronOCR provides unique filters for OcrInput objects to improve OCR performance.
Image Enhancement Code Example
using IronOcr;
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.LoadImage("LowQuality.jpeg");
// fixes digital noise and poor scanning
input.DeNoise();
// fixes rotation and perspective
input.Deskew();
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            input.LoadImage("LowQuality.jpeg")
            ' fixes digital noise and poor scanning
            input.DeNoise()
            ' fixes rotation and perspective
            input.Deskew()
            Dim result As OcrResult = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
List of OCR Image Filters
Input filters to enhance OCR performance which are built into IronOCR include:
- OcrInput.Rotate(double degrees) - Rotates images by a number of degrees clockwise. For anti-clockwise rotation, use negative numbers.
- OcrInput.Binarize() - Converts every pixel to either black or white with no middle ground, which can improve OCR performance on very low-contrast images.
- OcrInput.ToGrayScale() - Converts every pixel to a shade of gray. It may not improve accuracy but could improve speed.
- OcrInput.Contrast() - Automatically increases contrast, often improving speed and accuracy on low-contrast scans.
- OcrInput.DeNoise() - Removes digital noise. Recommended only when noise is expected.
- OcrInput.Invert() - Inverts every color (white becomes black and vice versa).
- OcrInput.Dilate() - Advanced morphology: adds pixels to object boundaries. The opposite of Erode.
- OcrInput.Erode() - Advanced morphology: removes pixels from object boundaries. The opposite of Dilate.
- OcrInput.Deskew() - Rotates an image so that it is oriented correctly. Useful because Tesseract's tolerance for skewed scans is limited.
- OcrInput.EnhanceResolution - Enhances the resolution of low-quality images and is generally used to handle low-DPI input automatically. EnhanceResolution detects low-resolution images (below 275 DPI), upscales them, and sharpens the text for better OCR results. Although this takes time, it often reduces the overall OCR operation time.
Further OCR settings include:
- Language - Supports selection from the available international language packs.
- Strategy - Chooses between a fast but less accurate strategy and an advanced strategy that uses AI to improve accuracy based on the statistical relationships between words.
- ColorSpace - Chooses whether to OCR in grayscale or color. Grayscale is generally optimal, though color can be better in certain contrast scenarios.
- DetectWhiteTextOnDarkBackgrounds - Adjusts for negative images, automatically detecting and reading white text on dark backgrounds.
- InputImageType - Tells the OCR library whether it is working on a full document or a snippet.
- RotateAndStraighten - Allows IronOCR to properly handle documents that are rotated or affected by perspective distortion.
- ReadBarcodes - Automatically reads barcodes and QR codes while scanning for text, without significant added time.
- ColorDepth - Sets the bits per pixel used for color depth during OCR. A higher depth can improve quality but also increases processing time.
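To make the filter descriptions above more concrete, the following sketch shows what several of them do at the pixel level, using a plain byte[,] grayscale image in which 0 is black and 255 is white. It is not IronOCR's implementation. The helper names, the fixed binarization threshold of 128, the BT.601 grayscale weights, the 3x3 neighbourhood, and treating dark text as the "object" for Dilate and Erode are all assumptions chosen for illustration. In a real project you would call the corresponding OcrInput methods instead.

```csharp
using System;

// Illustrative sketches only: NOT IronOCR's implementations. They operate on a simple
// grayscale image stored as byte[row, column] (0 = black, 255 = white).

// ToGrayScale: collapse R, G and B into one luminance value (ITU-R BT.601 weights).
byte ToGray(byte r, byte g, byte b) => (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

// Binarize: every pixel becomes pure black or pure white (a fixed threshold is assumed).
byte[,] Binarize(byte[,] img, byte threshold = 128)
{
    var result = new byte[img.GetLength(0), img.GetLength(1)];
    for (int y = 0; y < img.GetLength(0); y++)
        for (int x = 0; x < img.GetLength(1); x++)
            result[y, x] = img[y, x] < threshold ? (byte)0 : (byte)255;
    return result;
}

// Invert: white becomes black and vice versa.
byte[,] Invert(byte[,] img)
{
    var result = new byte[img.GetLength(0), img.GetLength(1)];
    for (int y = 0; y < img.GetLength(0); y++)
        for (int x = 0; x < img.GetLength(1); x++)
            result[y, x] = (byte)(255 - img[y, x]);
    return result;
}

// Dilate / Erode on a binarized image, treating black (0) pixels as the object (the text).
// Dilate: a pixel becomes black if ANY pixel in its 3x3 neighbourhood is black (strokes grow).
// Erode: a pixel stays black only if ALL pixels in its 3x3 neighbourhood are black (strokes shrink).
byte[,] Morph(byte[,] img, bool dilate)
{
    int h = img.GetLength(0), w = img.GetLength(1);
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            bool anyBlack = false, allBlack = true;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                {
                    int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue; // skip neighbours outside the image
                    bool black = img[ny, nx] == 0;
                    anyBlack |= black;
                    allBlack &= black;
                }
            result[y, x] = (dilate ? anyBlack : allBlack) ? (byte)0 : (byte)255;
        }
    return result;
}

// Rotate: positive degrees turn clockwise, negative degrees anti-clockwise.
// Normalizing to [0, 360) shows that Rotate(-90) and Rotate(270) describe the same turn.
double NormalizeDegrees(double degrees) => ((degrees % 360) + 360) % 360;

// Small demo: a 3x3 white image with a single black pixel in the centre.
var dot = new byte[,] { { 255, 255, 255 }, { 255, 0, 255 }, { 255, 255, 255 } };
var grown = Morph(dot, dilate: true);
var shrunk = Morph(dot, dilate: false);
Console.WriteLine($"Center after dilate: {grown[1, 1]}, corner after dilate: {grown[0, 0]}");
```

In the demo, one dilation spreads the single black pixel across the whole 3x3 image, while one erosion removes it entirely. This is the intuition behind using Dilate to thicken thin or broken strokes and Erode to thin overly bold ones.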
125 Language Packs
IronOCR supports 125 international languages via language packs, which are distributed as DLLs and are available for download from this website or from the NuGet Package Manager.
Languages include German, French, English, Chinese, and Japanese, among many others. Specialist language packs exist for MRZ, MICR checks, financial data, license plates, and more. Custom Tesseract .traineddata files can also be used.
Language Example
using IronOcr;
// PM> Install-Package IronOcr.Languages.Arabic
IronTesseract ocr = new IronTesseract();
ocr.Language = OcrLanguage.Arabic;
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames("img/arabic.gif", pageindices);
// Add image filters if needed
// In this case, even though the input is very low quality,
// IronTesseract can read what conventional Tesseract cannot.
OcrResult result = ocr.Read(input);
// Console can't print Arabic on Windows easily.
// Let's save to disk instead.
result.SaveAsTextFile("arabic.txt");
Imports IronOcr

' PM> Install-Package IronOcr.Languages.Arabic
Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        ocr.Language = OcrLanguage.Arabic
        Using input As New OcrInput()
            Dim pageindices = New Integer() {1, 2}
            input.LoadImageFrames("img/arabic.gif", pageindices)
            ' Add image filters if needed
            ' In this case, even though the input is very low quality,
            ' IronTesseract can read what conventional Tesseract cannot.
            Dim result As OcrResult = ocr.Read(input)
            ' Console can't print Arabic on Windows easily.
            ' Let's save to disk instead.
            result.SaveAsTextFile("arabic.txt")
        End Using
    End Sub
End Module
Multiple Language Example
It is also possible to OCR using multiple languages at the same time. This can improve OCR of English metadata and URLs within documents written mainly in another language, such as Chinese.
using IronOcr;
// PM> Install-Package IronOcr.Languages.ChineseSimplified
IronTesseract ocr = new IronTesseract();
ocr.Language = OcrLanguage.ChineseSimplified;
// We can add any number of languages
ocr.AddSecondaryLanguage(OcrLanguage.English);
using OcrInput input = new OcrInput();
input.LoadPdf("multi-language.pdf");
OcrResult result = ocr.Read(input);
result.SaveAsTextFile("results.txt");
Imports IronOcr

' PM> Install-Package IronOcr.Languages.ChineseSimplified
Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        ocr.Language = OcrLanguage.ChineseSimplified
        ' We can add any number of languages
        ocr.AddSecondaryLanguage(OcrLanguage.English)
        Using input As New OcrInput()
            input.LoadPdf("multi-language.pdf")
            Dim result As OcrResult = ocr.Read(input)
            result.SaveAsTextFile("results.txt")
        End Using
    End Sub
End Module
Detailed OCR Results Objects
IronOCR returns an OcrResult object for each operation. Developers generally access its Text property to get the scanned text, but the result object contains much more detailed information.
using IronOcr;
IronTesseract ocr = new IronTesseract();
// Must be set to true to read barcodes
ocr.Configuration.ReadBarCodes = true;
using OcrInput input = new OcrInput();
var pageindices = new int[] { 1, 2 };
input.LoadImageFrames(@"img\sample.tiff", pageindices);
OcrResult result = ocr.Read(input);
var pages = result.Pages;
var words = pages[0].Words;
var barcodes = result.Barcodes;
// Explore here to find a massive, detailed API:
// - Pages, Blocks, Paragraphs, Lines, Words, Chars
// - Image Export, Fonts, Coordinates, Statistical Data, Tables
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        ' Must be set to true to read barcodes
        ocr.Configuration.ReadBarCodes = True
        Using input As New OcrInput()
            Dim pageindices = New Integer() {1, 2}
            input.LoadImageFrames("img\sample.tiff", pageindices)
            Dim result As OcrResult = ocr.Read(input)
            Dim pages = result.Pages
            Dim words = pages(0).Words
            Dim barcodes = result.Barcodes
            ' Explore here to find a massive, detailed API:
            ' - Pages, Blocks, Paragraphs, Lines, Words, Chars
            ' - Image Export, Fonts, Coordinates, Statistical Data, Tables
        End Using
    End Sub
End Module
Performance
IronOCR works out of the box with no need for performance tuning or image modification.
Speed is blazing: IronOCR 2020+ is up to 10 times faster and makes over 250% fewer errors than previous builds.
Learn More
To learn more about OCR in C#, VB, F#, or any other .NET language, please read our community tutorials, which give real-world examples of using IronOCR and show the nuances of optimizing the library.
A full API reference for .NET developers is also available.