Install Tesseract (Step-By-Step Tutorial With Images)

What is Tesseract OCR?

Tesseract is an open-source software library released under the Apache License 2.0. It was originally developed by Hewlett-Packard in the 1980s. It is a text recognition tool used primarily for identifying and extracting text from images, and it provides a command-line interface for this purpose.

How to Download Tesseract OCR in Windows

  1. Download Tesseract Installer for Windows
  2. Install Tesseract OCR
  3. Add installation path to Environment Variables
  4. Run Tesseract OCR

1. Download Tesseract Installer for Windows

To use the tesseract command on Windows, we first need to download the Tesseract OCR Windows installer (.exe).

The latest version of Tesseract OCR can be downloaded from several places. One of them is UB Mannheim, whose builds come from a fork of the main repository, tesseract-ocr/tesseract.

Figure 1: Tesseract Wiki

Download the tesseract-ocr-w64-setup-5.3.0.20221222.exe (64 bit) Windows Installer.

For macOS users, Tesseract can be installed from the terminal using either Homebrew or MacPorts:

brew install tesseract

sudo port install tesseract

2. Install Tesseract OCR

Next, we'll install Tesseract using the .exe file we downloaded in the previous step. Launch the installer to start the Tesseract installation.

Installer Language

Once the setup has finished unpacking, the Installer Language dialog will appear. Tesseract can be installed with support for multiple languages by selecting additional language packs, but here we'll just install the language data for English.

Figure 2: Tesseract Installer

Click OK, and the Installer language for Tesseract OCR is set.

Tesseract OCR Setup

Next, the Setup Wizard will appear and guide you through the Tesseract installation for Windows.

Figure 3: Tesseract OCR Setup Wizard

Click Next to continue the installation.

Accept License Agreement

Tesseract OCR is licensed under the Apache License, Version 2.0. Because it is open source and free to use, you can modify and redistribute Tesseract without paying royalties.

Figure 4: Tesseract OCR is licensed under Apache License v2.0. Please accept this license to continue with the installation.

Click I Agree to proceed to installation.

Choose Users

You can choose to install Tesseract for all users of the machine or only for the current user.

Figure 5: Choose to install Tesseract OCR for the Current User (you) or for all user accounts

Click Next to choose the components to install with Tesseract.

Choose Components

In the list of components, ScrollView, Training Tools, Shortcuts creation, and Language data are all selected by default. We will keep the default selection. You can add or skip components based on your needs, but in most cases the defaults are what you need.

Figure 6: Here, you can choose to include or exclude Tesseract OCR components. For the best results, continue the installation with the default components selected.

Click Next to choose the installation location.

Choose Installation Location

Next, we'll choose the location in which to install Tesseract. Make sure you copy the destination folder path; we will need it later to add the installation location to the machine's Path environment variable.

Figure 7: Select an install location for the Tesseract OCR library, and remember this location for later.

Click Next to continue setting up the installation of Tesseract.

Choose the Start Menu Folder

This is the last step, in which we create shortcuts in the Start menu. You can name the folder anything, but here we keep the default name.

Figure 8: Choose the name of Tesseract OCR's Start Menu Folder

Now, click Install and wait for the installation to complete. Once it is done, the following screen will appear. Click Finish, and Tesseract OCR is installed on Windows.

Figure 9: Tesseract OCR Installation is now complete.

3. Add Installation Path to System Environment Variables

Now, we will add the Tesseract installation path to Windows' Environment Variables.

In the Start menu, type "environment variables" or "advanced system settings", and select the matching result to open the System Properties dialog box.

Figure 10: The Windows System Properties dialog box

System Properties

Once the System Properties dialog box opens, click on the Advanced tab, and then click the Environment Variables button, located towards the bottom right of the dialog.

The Environment Variables dialog box will open.

Environment Variables

Under System variables, click on the Path variable.

Figure 11: Access the Windows system environment variables

Now, click Edit.

Add Tesseract OCR for Windows Installation Directory to Environment Variables

In the Edit environment variable dialog box, click New. Paste the installation path that was copied during the second step, and click OK.

Figure 12: Edit the Windows Path system environment variable by adding an entry that contains the absolute path to the Tesseract OCR installation

That's it! We have successfully downloaded and installed Tesseract OCR and set its environment variable on a Windows machine.
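As a rough way to double-check the Path entry from a script, the following Python sketch tests whether a folder appears in a PATH-style string. The function name is_on_path and the example install folder are assumptions made for illustration; they are not part of Tesseract or Windows, so substitute the folder you copied during installation.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string.

    When `path_value` is omitted, the PATH of the current process is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    # Normalize so that trailing separators and letter case (on Windows)
    # do not cause a false mismatch.
    target = os.path.normcase(os.path.normpath(directory))

    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # entries are sometimes quoted
        if not entry:
            continue
        if os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical install location: replace it with the folder you copied
    # on the "Choose Installation Location" screen.
    install_dir = r"C:\Program Files\Tesseract-OCR"
    print("On PATH:", is_on_path(install_dir))
```

A terminal that was already open before the Path was edited keeps its old environment, so run the check from a newly opened window.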

4. Run Tesseract OCR

To check that Tesseract OCR for Windows was successfully installed and added to the Environment Variables, open a new Command Prompt (cmd) window on your Windows machine, then run the "tesseract" command. If everything worked, a short usage guide is displayed, listing the OCR options and single options such as the one that prints the Tesseract version.

Figure 13: Run the tesseract command in the Windows command line (or Windows PowerShell) to make sure that the above installation steps were done correctly. The console output is the expected result of a successful Windows installation.
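The same check can be scripted. The sketch below is one possible approach in Python, assuming only that the executable is named tesseract and accepts the --version option; the helper names find_tesseract, first_line, and tesseract_version are made up for this example.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def first_line(text):
    """Return the first line of `text`, or an empty string if there is none."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def tesseract_version(executable_path):
    """Run `<executable> --version` and return the first line it prints."""
    result = subprocess.run(
        [executable_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some builds print the version to stderr instead of stdout.
    return first_line(result.stdout or result.stderr)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; recheck step 3.")
    else:
        print("Found:", path)
        print(tesseract_version(path))
```

If the script reports that tesseract was not found, the most likely causes are a mistyped Path entry or a terminal that was opened before the Path was changed.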

Congratulations! We have successfully installed Tesseract OCR for Windows.
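With the installation verified, a first OCR run uses the form tesseract imagename outputbase, optionally followed by -l and a language code. The following Python sketch builds and runs that command; the function names and the file names sample.png and sample_out are hypothetical placeholders, and the eng language data is assumed to be installed as described above.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract and return the text it wrote to <output_base>.txt.

    Requires tesseract to be on PATH; raises CalledProcessError on failure.
    """
    subprocess.run(build_ocr_command(image_path, output_base, lang), check=True)
    # By default Tesseract appends ".txt" to the output base name.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file names: only the command is printed here, so the
    # script can be tried even before an image is available.
    print(build_ocr_command("sample.png", "sample_out"))
```

Calling run_ocr("sample.png", "sample_out") with a real image would then return the recognized text as a string.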

IronOCR Library

IronOCR is a Tesseract-based C# library that allows .NET software developers to identify and extract text from images and PDF documents. It is built for .NET and uses Tesseract 5 as its OCR engine.

Install with NuGet Package Manager

Installing IronOCR from Visual Studio with the NuGet Package Manager is straightforward. In Visual Studio, navigate to:

Tools > NuGet Package Manager > Package Manager Console

Then, in the Package Manager Console, type the following command:

Install-Package IronOcr

This installs IronOCR into the current project, where it is ready to use.

Other IronOCR NuGet packages are also available for different platforms.

IronOCR with Tesseract 5

The sample code below shows how to use IronOCR to read text from an image and perform OCR in C#.

// Import System (for Console) and the IronOCR library
using System;
using IronOcr;

// Create an instance of IronTesseract
var ocr = new IronTesseract();

// Read the image file and get the recognized text
string text = ocr.Read(@"test-files/redacted-employmentapp.png").Text;

// Output the extracted text to the console
Console.WriteLine(text);

' VB.NET equivalent of the C# example above
Imports System
Imports IronOcr

Module Program
    Sub Main()
        ' Create an instance of IronTesseract
        Dim ocr = New IronTesseract()

        ' Read the image file and get the recognized text
        Dim text As String = ocr.Read("test-files/redacted-employmentapp.png").Text

        ' Output the extracted text to the console
        Console.WriteLine(text)
    End Sub
End Module

For more control, for example when reading several images at once, the OcrInput class can be used to achieve the same task:

// Import System (for Console) and the IronOCR library
using System;
using IronOcr;

// Create an instance of IronTesseract
var ocr = new IronTesseract();

// Use the OcrInput class to handle multiple images
using (var input = new OcrInput())
{
    // Add an image to the input collection
    input.AddImage("test-files/redacted-employmentapp.png");
    // You can add any number of images

    // Read the OCR text from the input
    var result = ocr.Read(input);

    // Output the extracted text to the console
    Console.WriteLine(result.Text);
}

' VB.NET equivalent of the C# example above
Imports System
Imports IronOcr

Module Program
    Sub Main()
        ' Create an instance of IronTesseract
        Dim ocr = New IronTesseract()

        ' Use the OcrInput class to handle multiple images
        Using input = New OcrInput()
            ' Add an image to the input collection
            input.AddImage("test-files/redacted-employmentapp.png")
            ' You can add any number of images

            ' Read the OCR text from the input
            Dim result = ocr.Read(input)

            ' Output the extracted text to the console
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

Input Image

Figure 14: Sample input image for IronOCR processing

Output Image

The output is printed on the Console as:

Figure 15: The console output from running IronOCR on the sample image.

Why Choose IronOCR?

IronOCR is very easy to install. It provides a complete and well-documented .NET software library.

IronOCR achieves a 99.8% text-detection accuracy rate without the need for other third-party libraries or web services.

It also provides multithreading support. Most importantly, IronOCR can work with well over 125 international languages.

Conclusion

In this tutorial, we learned how to download and install Tesseract OCR on a Windows machine. Tesseract OCR is excellent software for C++ developers, but it does have some limits. It is not fully developed for .NET, and scanned or photographed images need to be preprocessed to a high resolution and kept free of digital noise before Tesseract can read them accurately.

In contrast, IronOCR can work with any image provided, whether scanned or photographed, with just a single line of code. IronOCR also uses Tesseract as its internal OCR engine, but it is finely tuned to get the best out of Tesseract and built specifically for C#, with high performance and improved features.

You can download the IronOCR software product from this link.

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.