
Install Tesseract (Step-By-Step Tutorial With Images)

Published January 27, 2023

What is Tesseract OCR?

Tesseract is an open-source software library released under the Apache License 2.0. It was originally developed by Hewlett-Packard in the 1980s. It is a text recognition tool used primarily to identify and extract text from images. Tesseract OCR provides a command-line interface for this functionality.

How to Install Tesseract OCR on Windows

  1. Download Tesseract Installer for Windows
  2. Install Tesseract OCR
  3. Add installation path to Environment Variables
  4. Run Tesseract OCR

1. Download Tesseract Installer for Windows

To use the tesseract command on Windows, we first need to download the Tesseract OCR Windows installer (.exe).

Several sites distribute the latest version of Tesseract OCR for Windows. One of them is UB Mannheim (Mannheim University Library), whose builds are forked from tesseract-ocr/tesseract (the main repository).

Install Tesseract, Figure 1: Tesseract Wiki

Tesseract Wiki

Download the 64-bit Windows installer, tesseract-ocr-w64-setup-5.3.0.20221222.exe.

On macOS, Tesseract can instead be installed from the Terminal with Homebrew or MacPorts, using one of the commands below:

brew install tesseract
sudo port install tesseract

2. Install Tesseract OCR

Next, we'll install Tesseract using the .exe file that we downloaded in the previous step. Launch the installer to start the Tesseract installation.

Installer Language

Once the setup has finished unpacking, the Installer Language dialog appears, asking which language the setup wizard itself should use. This only sets the installer's display language. OCR language packs for recognizing other languages are selected later, on the Choose Components page; here we'll keep just the English language data.

Install Tesseract, Figure 2: Tesseract Installer

Tesseract Installer

Click OK to set the installer language.

Tesseract OCR Setup

Next, the Setup Wizard appears. It guides you through the Tesseract installation on Windows.

Install Tesseract, Figure 3: Tesseract OCR

Tesseract OCR Setup Wizard

Click Next to continue the installation.

Accept License Agreement

Tesseract OCR is licensed under the Apache License, Version 2.0. Because it is open source and free to use, you can modify and redistribute Tesseract without paying royalties.

Install Tesseract, Figure 4: Tesseract License

Tesseract OCR is licensed under Apache License v2.0. Please accept this license to continue with the installation.

Click I Agree to proceed with the installation.

Choose Users

You can choose to install Tesseract for multiple users or for a single user.

Install Tesseract, Figure 5: Tesseract Choose Users

Choose to install Tesseract OCR for the Current User (you) or for all user accounts

Click Next to choose components to install with Tesseract.

Choose Components

In the components list, ScrollView, Training Tools, Shortcuts creation, and Language data are all selected by default. We will keep the default selection. You can add or skip components depending on your needs, but the defaults cover typical use.

Install Tesseract, Figure 6: Tesseract Components

Here, you can choose to include or exclude Tesseract OCR components. For the best results, continue the installation with the default components selected.

Click Next to choose installation location.
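The Language data component provides the trained data files Tesseract uses to recognize text. After step 3 puts tesseract on the Path, running tesseract --list-langs prints the languages Tesseract can find. The C# sketch below parses that output into a list of language codes. It assumes the first non-empty line is a header (its exact wording varies between Tesseract versions) and that each remaining line is one code; ParseListLangs is a hypothetical helper name, and the sample string is illustrative rather than captured output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: parses the text printed by `tesseract --list-langs`.
// Assumption: the first non-empty line is a header such as
//   List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):
// and every following non-empty line is one language code (eng, osd, ...).
static List<string> ParseListLangs(string output) =>
    output.Split('\n')
          .Select(line => line.Trim())   // also strips the '\r' of Windows line endings
          .Where(line => line.Length > 0)
          .Skip(1)                       // drop the header line
          .ToList();

// Illustrative sample output, not captured from a real installation
string sample = "List of available languages in \"C:\\Program Files\\Tesseract-OCR/tessdata/\" (2):\r\neng\r\nosd\r\n";
List<string> langs = ParseListLangs(sample);
Console.WriteLine(string.Join(", ", langs));   // eng, osd
Console.WriteLine(langs.Contains("eng") ? "English data installed" : "English data missing");
```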

Choose Installation Location

Next, we'll choose where to install Tesseract. Make sure you copy the destination folder path; we will need it later to add the installation location to the machine's Path environment variable.

Install Tesseract, Figure 7: Tesseract Install Location

Select an install location for the Tesseract OCR library, and remember this location for later.

Click Next to continue setting up the installation.

Choose the Start Menu Folder

This is the last step: choose the Start menu folder in which the installer creates shortcuts. You can name the folder anything, but I've kept the default name.

Install Tesseract, Figure 8: Tesseract Start Menu

Choose the name of Tesseract OCR's Start Menu Folder

Now, click Install and wait for the installation to complete. When it's done, the following screen appears. Click Finish, and Tesseract OCR is successfully installed on Windows.

Install Tesseract, Figure 9: Tesseract Installer

Tesseract OCR Installation is now complete.

3. Add Installation Path to System Environment Variables

Now, we will add the Tesseract installation path to Windows' Environment Variables.

In the Start menu, search for "environment variables" or "advanced system settings" and open the matching result.

Install Tesseract, Figure 10: System Path Variables

The Windows System Properties Dialog Box

System Properties

Once the System Properties dialog box opens, select the Advanced tab, then click the Environment Variables button near the bottom right of the dialog.

The Environment Variables dialog box will be presented to you.

Environment Variables

Under System variables, click on the Path variable.

Install Tesseract, Figure 11: Environment Variables

Accessing the Windows' System Environment Variables

Now, click Edit.

Add Tesseract OCR for Windows Installation Directory to Environment Variables

In the Edit environment variable dialog box, click New, paste the installation path you copied in step 2, and click OK. Then click OK in the Environment Variables and System Properties dialogs to save the change.

Install Tesseract, Figure 12: Edit Environment Variable

Edit Windows' Path system environment variable by adding an entry that contains the absolute path to the Tesseract OCR installation

That's it! We have successfully downloaded and installed Tesseract OCR and added it to the Path environment variable on a Windows machine.
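A bare command name such as tesseract only works if one of the directories listed in Path contains tesseract.exe. The C# sketch below checks the current process's PATH in the same basic way, which can help diagnose a missing or mistyped entry. FindOnPath is a hypothetical helper, not part of any library, and the lookup is simplified: unlike cmd.exe, it does not search the current directory first or try other PATHEXT extensions. Run it from a newly opened console, because a process only sees the PATH that existed when it started.

```csharp
using System;
using System.IO;

// Hypothetical helper: returns the full path of `fileName` in the first PATH
// directory that contains it, or null if no PATH directory does.
static string? FindOnPath(string fileName)
{
    string pathVariable = Environment.GetEnvironmentVariable("PATH") ?? "";
    // Path.PathSeparator is ';' on Windows and ':' on Linux/macOS.
    foreach (string dir in pathVariable.Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries))
    {
        // Windows allows quoted PATH entries, so strip surrounding quotes.
        string candidate = Path.Combine(dir.Trim().Trim('"'), fileName);
        if (File.Exists(candidate))
            return candidate;
    }
    return null;
}

string? location = FindOnPath("tesseract.exe");
Console.WriteLine(location is null
    ? "tesseract.exe was not found on PATH. Re-check the entry added above."
    : $"Found Tesseract at {location}");
```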

4. Run Tesseract OCR

To check that Tesseract OCR for Windows was successfully installed and added to the Path, open a new Command Prompt (cmd) window on your Windows machine (windows opened before you edited Path don't see the change), then run the tesseract command. If everything worked, Tesseract prints a short usage guide listing its OCR options and single options such as --version.

Install Tesseract, Figure 13: Edit Environment Variable

Run the tesseract command in the Windows Command Prompt (or Windows PowerShell) to confirm that the installation steps above were completed correctly. The console output shown is the expected result of a successful Windows installation.
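The same check can be scripted. The C# sketch below starts tesseract --version with System.Diagnostics.Process and prints what it writes. TryRun is a hypothetical helper name; it captures both standard output and standard error, since some Tesseract versions print the version banner to stderr. A null result means the executable could not be started at all, which usually points back to the Path entry.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper: runs an executable with the given arguments and returns
// its exit code and combined stdout + stderr, or null if it could not be started.
static (int ExitCode, string Output)? TryRun(string fileName, params string[] args)
{
    var startInfo = new ProcessStartInfo(fileName)
    {
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false,
    };
    foreach (string arg in args)
        startInfo.ArgumentList.Add(arg);

    try
    {
        using Process process = Process.Start(startInfo)!;
        // Read stderr asynchronously so neither pipe can fill up and block the child.
        var stderrTask = process.StandardError.ReadToEndAsync();
        string stdout = process.StandardOutput.ReadToEnd();
        string stderr = stderrTask.Result;
        process.WaitForExit();
        return (process.ExitCode, stdout + stderr);
    }
    catch (Win32Exception)
    {
        // Thrown when the executable cannot be found or started.
        return null;
    }
}

var result = TryRun("tesseract", "--version");
if (result is null)
    Console.WriteLine("Could not start tesseract. Open a new console or re-check the Path entry.");
else
    Console.WriteLine(result.Value.Output);
```

Other arguments work the same way: for example, TryRun("tesseract", "image.png", "stdout", "-l", "eng") asks Tesseract to print the text recognized in image.png (a placeholder file name) to standard output instead of writing a text file.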

Congratulations! We have successfully installed Tesseract OCR for Windows.

IronOCR Library

IronOCR is a Tesseract-based C# library that allows .NET software developers to identify and extract text from images and PDF documents. It is built purely in .NET and uses a tuned build of the Tesseract engine.

Install with NuGet Package Manager

IronOCR is easy to install with the NuGet Package Manager, either in Visual Studio or from the command line. In Visual Studio, open the Package Manager Console from the menu:

Tools > NuGet Package Manager > Package Manager Console

Then, in the Package Manager Console, type the following command:

Install-Package IronOcr

This installs IronOCR, and you can start using it in your project right away.

You can also download other IronOCR NuGet Packages for different platforms:

IronOCR with Tesseract 5

The sample code below shows how to use IronOCR's IronTesseract class to read text from an image in C#, followed by the VB.NET version.

using IronOcr;

// Read a single image and print the recognized text
string text = new IronTesseract().Read(@"test-files/redacted-employmentapp.png").Text;
Console.WriteLine(text);

VB.NET:

Imports IronOcr

Module Program
    Sub Main()
        ' Read a single image and print the recognized text
        Dim text As String = (New IronTesseract()).Read("test-files/redacted-employmentapp.png").Text
        Console.WriteLine(text)
    End Sub
End Module

For more control, for example to read several images in one pass, use an OcrInput object:

using IronOcr;

var ocr = new IronTesseract();

// OcrInput collects one or more images to be read together
using (var input = new OcrInput())
{
    input.AddImage("test-files/redacted-employmentapp.png");
    // you can add any number of images before calling Read
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}

VB.NET:

Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()
        ' OcrInput collects one or more images to be read together
        Using input As New OcrInput()
            input.AddImage("test-files/redacted-employmentapp.png")
            ' you can add any number of images before calling Read
            Dim result = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

Input Image

Install Tesseract, Figure 14: Input Image

Sample input image for IronOCR processing

Output

The output is printed on the Console as:

Install Tesseract, Figure 15: Output Image

The console output from running IronOCR on the sample image.

Why Choose IronOCR?

IronOCR is very easy to install. It provides a complete and well-documented .NET software library.

IronOCR achieves a 99.8% text-detection accuracy rate without the need for other third-party libraries or web services.

It also provides multithreading support. Most importantly, IronOCR can work with well over 125 international languages.

Conclusion

In this tutorial, we learned how to download and install Tesseract OCR on a Windows machine. Tesseract OCR is excellent software for C++ developers, but it has some limitations. It is not built for .NET. Scanned or photographed images usually need to be preprocessed first, standardized to a high resolution and cleaned of digital noise. Only then can Tesseract read them accurately.

In contrast, IronOCR can read any image, scanned or photographed, with just a single line of code. IronOCR also uses Tesseract as its internal OCR engine, but tunes it specifically for C# to get the best out of Tesseract, with higher performance and additional features.

You can download the IronOCR software product from this link.
