Tesseract is an optical character recognition (OCR) engine that runs on a variety of operating systems. It is free software, released under the Apache License. In this guide, I will take you through the steps that I followed to install Tesseract on my Windows 10 machine. Version 5 is the current stable major version; it began with release 5.0.0 on November 30, 2021.
On macOS with MacPorts, language data is installed with: sudo port install tesseract-<langcode>
A list of langcodes is found on the MacPorts Tesseract page. The first step to install Tesseract OCR for Windows is to download the .exe installer that corresponds to your machine’s operating system.
Next, we’ll need to configure the Tesseract installation. If you’re feeling confident and only want to run Tesseract OCR for Windows with the default language set to English, running through the installation screens with all of the default options selected should work.
The first screen asks for the installer language. This is just the language for the dialog boxes and help information; it does not limit recognition, and we can still run Tesseract OCR for Windows in multiple languages.
Installer language for Tesseract OCR for Windows
The setup screen recommends that all other applications are closed before continuing with the installation.
The Tesseract OCR for Windows installation screen.
Next, we’ll choose the installation location. Before proceeding to the next step, make sure to copy the install location to a .txt file. We will need to add the installation location to our machine’s environment variables once the installation is complete.
Choose the installation location.
By default, ScrollView, Training Tools, Shortcuts creation, and Language data are all selected. Unless you have a specific reason not to install them, keep all of these selected.
Default Tesseract OCR for Windows installation components.
If we scroll down and expand ‘Additional script data’, we will see the option to download and install additional script data. This can help improve the accuracy of text extraction for certain scripts. It’s up to you whether to install these.
Optional script installation components.
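The language and script data chosen on these screens are stored as .traineddata files in the tessdata folder of the installation. As a rough sketch in Python, the installed data can be listed once setup has finished. The function name and the example folder below are my own assumptions, not something the installer provides, and script data is assumed to live in a subfolder of tessdata.

```python
from pathlib import Path


def installed_traineddata(tessdata_dir):
    """Return the sorted names of all .traineddata files under tessdata_dir.

    Files in subfolders (for example a script subfolder) are reported with
    their relative folder, e.g. "script/Latin".
    """
    root = Path(tessdata_dir)
    if not root.is_dir():
        # Nothing to list if the folder does not exist.
        return []
    names = []
    for path in root.rglob("*.traineddata"):
        relative = path.relative_to(root).with_suffix("")
        names.append(relative.as_posix())
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical location: adjust to the install path chosen earlier.
    example_dir = r"C:\Program Files\Tesseract-OCR\tessdata"
    for name in installed_traineddata(example_dir):
        print(name)
```

If a language you selected does not show up in the listing, re-running the installer with that component ticked is the simplest fix.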
In the last step of the installation, we’ll be asked to choose the start menu folder for Tesseract OCR for Windows shortcuts. I’ve left mine set to the default name: ‘Tesseract-OCR’.
Choose the start menu folder for the Tesseract OCR for Windows shortcuts.
After we click install, Tesseract OCR for Windows will begin installing. Our next step is to add the installation path to our machine’s environment variables.
To add the installation location to our environment variables, go to the Start menu and search for ‘environment variables’. You should see a result to edit the system environment variables. If you don’t, you can always use the following steps: Start menu > Control Panel > Edit the system environment variables.
Searching for ‘environment variables’
When presented with the ‘System Properties’ dialog box, we’ll want to make sure the Advanced tab is selected, then click the Environment Variables button towards the bottom right of the screen.
Under System variables, select the Path variable, then click the Edit button.
When presented with the "Edit environment variable" screen, click the New button, and paste in the Tesseract OCR installation path that we copied earlier when choosing the installation location. Once you’ve done this, click the ‘OK’ button.
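To double-check that the entry was saved, the Path value can be inspected from a newly opened terminal. The following is a minimal Python sketch: `path_contains` is a hypothetical helper of mine, and `C:\Program Files\Tesseract-OCR` is only an example location, so substitute the path you copied during installation.

```python
import ntpath
import os


def path_contains(path_value, directory, sep=";"):
    """Return True if directory is one of the entries in a Windows PATH string.

    Comparison uses Windows rules: case-insensitive, and a trailing
    backslash or forward slashes make no difference.
    """
    def normalize(entry):
        return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))

    wanted = normalize(directory)
    # Skip empty entries, which appear when PATH ends with a separator.
    entries = [e for e in path_value.split(sep) if e.strip()]
    return any(normalize(e) == wanted for e in entries)


if __name__ == "__main__":
    # Hypothetical install location: replace with the path you copied earlier.
    install_dir = r"C:\Program Files\Tesseract-OCR"
    print(path_contains(os.environ.get("PATH", ""), install_dir))
```

Environment variable changes only reach programs started after the change, so run the check from a new Command Prompt window rather than one that was already open.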
That’s it! Now that we’ve run the .exe installer and added the Tesseract OCR for Windows install location to our environment variables, we can test that our installation is working by running Tesseract on a test image.
To test that Tesseract OCR for Windows was installed successfully, open a new Command Prompt window on your machine, then run the tesseract command. You should see output with a quick explanation of Tesseract’s usage options.
Checking successful installation of Tesseract OCR for Windows
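The same check can be scripted. Here is a small Python sketch, with a function name of my own choosing, that looks for the tesseract executable on PATH and, if found, returns the first line printed by `tesseract --version`.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `<executable> --version`, or None.

    None means the executable could not be found on PATH, which usually
    points to a missing or misspelled environment variable entry.
    """
    resolved = shutil.which(executable)
    if resolved is None:
        return None
    completed = subprocess.run(
        [resolved, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Older releases print version details to stderr, so look at both streams.
    output = (completed.stdout + completed.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print("found:", version)
```

A result of None is the scripted equivalent of Command Prompt reporting that the command is not recognized, and it usually means the environment variable step needs another look.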
Congratulations! You’ve successfully installed Tesseract OCR for Windows on your machine.
IronOCR provides Tesseract OCR on Mac, Windows, Linux, Azure and Docker for:
IronOCR reads text, barcodes, and QR codes from all major image and PDF formats using the latest Tesseract 5 engine. This library adds OCR functionality to Desktop, Console and Web applications in minutes. It supports 127+ international languages. Licenses start from $749.
Download the IronOcr DLL directly to your machine.
Alternatively, you can install it through NuGet.
Install-Package IronOcr
Add this code to the startup of your application before IronOCR is used.
IronOcr.Installation.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01";
IronOcr.Installation.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01"
Test if your key has been installed correctly.
bool result = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01");
Dim result As Boolean = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01")
Get started with the project
// PM > Install-Package IronOcr
using System;
using IronOcr;

var Ocr = new IronTesseract();
// Hundreds of languages available
Ocr.Language = OcrLanguage.English;
using (var Input = new OcrInput())
{
    // Add the image to this OcrInput instance
    Input.Add(@"img\example.tiff");
    // Input.DeNoise(); optional
    // Input.Deskew(); optional
    IronOcr.OcrResult Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
    // Explore the OcrResult using IntelliSense
}
' PM > Install-Package IronOcr
' Imports IronOcr
Dim Ocr = New IronTesseract()
' Hundreds of languages available
Ocr.Language = OcrLanguage.English
Using Input = New OcrInput()
    ' Add the image to this OcrInput instance
    Input.Add("img\example.tiff")
    ' Input.DeNoise() optional
    ' Input.Deskew() optional
    Dim Result As IronOcr.OcrResult = Ocr.Read(Input)
    Console.WriteLine(Result.Text)
    ' Explore the OcrResult using IntelliSense
End Using
Use NuGet Package Manager to install the IronOCR NuGet Package into your Visual Studio solution.
With IronOCR, all Tesseract installation happens entirely using the NuGet Package Manager.
Install-Package IronOcr
To date, IronTesseract is the only known implementation of Tesseract 5 for .NET Framework or Core.
// using IronOcr;
var Ocr = new IronTesseract(); // nothing to configure
using (var Input = new OcrInput(@"images\image.png"))
{
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
' Imports IronOcr
Dim Ocr = New IronTesseract() ' nothing to configure
Using Input = New OcrInput("images\image.png")
    Dim Result = Ocr.Read(Input)
    Console.WriteLine(Result.Text)
End Using
// using IronOcr;
var Ocr = new IronTesseract();
// Select the Tesseract 4 engine instead of the default
Ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract4;
using (var Input = new OcrInput(@"images\image.png"))
{
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
' Imports IronOcr
Dim Ocr = New IronTesseract()
' Select the Tesseract 4 engine instead of the default
Ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract4
Using Input = New OcrInput("images\image.png")
    Dim Result = Ocr.Read(Input)
    Console.WriteLine(Result.Text)
End Using
If Tesseract encounters an image that is rotated, skewed, low-DPI, scanned, or affected by background noise, it becomes almost impossible for it to extract data from that image. Tesseract will also take a very long time to process the document before returning nonsensical output.
IronOCR takes this headache away. Users often achieve 99.8-100% accuracy with minimal configuration.
Tesseract: Only accepts the Leptonica PIX image format, which is an IntPtr C++ object in C#. PIX objects are not managed memory, and failure to handle them with care in C# results in memory leaks.
IronOCR: Images are memory managed. PDF & Tiff supported. System.Drawing, Stream, and Byte Array are included for every file format.
Broad image support:
Tesseract: Google Tesseract can produce fast and accurate results if properly tuned and if input images have been preprocessed using Photoshop or ImageMagick.
IronOCR: The IronOcr .NET Tesseract DLL works accurately and at speed for most images out of the box. We have implemented multithreading to make use of the multi-core processors that most machines now use. Even low-resolution images generally work with a high degree of accuracy in your program. No Photoshop required.
Tesseract: We have two free choices:
IronOCR: A managed and tested .NET Library for Tesseract called IronTesseract.
Fully documented with IntelliSense support.
Tesseract: Supports only 100 languages.
IronOCR: Supports 127+ languages.
Tesseract is an excellent resource for C++ developers, but it is not a complete OCR library for .NET. Scanned or photographed images need to be processed so as to be orthogonal, standardized, high-resolution, and free of digital noise before Tesseract can accurately work with them.
In contrast, IronOCR can do this and more with just a single line of code. IronOCR does use Tesseract as its internal OCR engine, but it is a very finely tuned Tesseract, built for C#, with many performance improvements and features added as standard.