Tesseract is an open-source software library released under the Apache License 2.0. It was originally developed by Hewlett-Packard in the 1980s. It is a text recognition tool primarily used for identifying and extracting text from images. Tesseract OCR provides a command-line interface for this functionality.
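As a rough illustration of that command-line interface, the sketch below builds and runs the basic `tesseract IMAGE OUTPUTBASE -l LANG` invocation from Python once Tesseract is installed. The helper names `build_tesseract_command` and `run_ocr` and the file name `sample.png` are hypothetical and chosen only for this example. Passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a file.

```python
import subprocess


def build_tesseract_command(image_path, output_base="stdout", lang="eng"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumes the tesseract executable is installed and on PATH.
    """
    command = build_tesseract_command(image_path, "stdout", lang)
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# Example call ("sample.png" is a placeholder file name):
# print(run_ocr("sample.png"))
```

Keeping the command construction separate from the call that runs it makes the arguments easy to inspect before anything is executed.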
To use the Tesseract command on Windows, we first need to download the Tesseract OCR Windows installer (.exe).
There are several places to download the latest version of Tesseract OCR. One such place is UB Mannheim, whose build is forked from tesseract-ocr/tesseract (the main repository).
Download the 64-bit Windows installer, tesseract-ocr-w64-setup-5.3.0.20221222.exe.
On macOS, Tesseract can be installed from the Terminal using either Homebrew or MacPorts:
brew install tesseract
sudo port install tesseract
Next, we'll install Tesseract using the .exe file that we downloaded in the previous step. Launch the .exe installer to start the Tesseract installation.
Once the setup has been unpacked, the installer language dialog will appear. Tesseract can recognize multiple languages if you select additional language packs, but here we'll install only the English language data.
Click OK to set the installer language for Tesseract OCR.
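Once the installation is finished, the language data that actually got installed can be listed with `tesseract --list-langs`. The sketch below is one way to read that list from Python. The function names are hypothetical, and the parsing assumes the output is a header line beginning with "List of available languages" followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. 'List of available languages in "..." (2):'
    return [
        line for line in lines
        if not line.lower().startswith("list of available languages")
    ]


def installed_languages(executable="tesseract"):
    """Return the language codes reported by the installed Tesseract."""
    completed = subprocess.run(
        [executable, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some versions print the list to stderr, so both streams are checked.
    return parse_list_langs(completed.stdout + completed.stderr)
```

With only English selected during setup, the returned list would be expected to include `eng`.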
Next, the setup wizard will appear. This Setup Wizard will guide the Tesseract installation for Windows.
Click Next to continue the installation.
Tesseract OCR is licensed under the Apache License, Version 2.0. As it is open source and free to use, you can redistribute and modify versions of Tesseract without any royalty concerns.
Click I Agree to proceed to installation.
You can choose to install Tesseract for multiple users or for a single user.
Click Next to choose components to install with Tesseract.
In the list of components, ScrollView, Training Tools, Shortcuts creation, and Language data are all selected by default. We will keep the default selection. You can deselect any component you do not need, although all of them are usually worth installing.
Click Next to choose the installation location.
Next, we'll choose the location to install Tesseract. Make sure you copy the destination folder path. We will need it later to add the installation location to the machine's Path environment variable.
Click Next to continue setting up the installation of Tesseract.
This is the last step, in which we create shortcuts in the Start menu. You can give the folder any name, but I've kept the default.
Now, click Install and wait for the installation to complete. Once the installation is done, the following screen will appear. Click Finish, and Tesseract OCR is installed on Windows.
Now, we will add the Tesseract installation path to Windows' Environment Variables.
In the Start menu, type "environment variables" or "advanced system settings".
Once the System Properties dialog box opens, click the Advanced tab, and then click the Environment Variables button near the bottom right of the dialog.
The Environment Variables dialog box will be presented to you.
Under System variables, click on the Path variable.
Now, click Edit.
In the Edit environment variable dialog box, click New. Paste the installation path that you copied when choosing the install location, and click OK.
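To confirm that the new entry was picked up, a small script can check whether a given folder appears in the Path variable. This is a minimal sketch. The helper name `is_on_path` is hypothetical, and the folder in the example call is only the assumed default install location, so substitute the path you copied. Open a new terminal before running it, since terminals that were already open keep the old Path.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in the PATH value."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        # Ignore surrounding quotes, trailing separators, and (on Windows) case.
        return os.path.normcase(os.path.normpath(entry.strip().strip('"')))

    target = normalize(directory)
    return any(
        normalize(entry) == target
        for entry in path_value.split(os.pathsep)
        if entry.strip()
    )


# Example call (assumed default install folder; use the path you copied):
# print(is_on_path(r"C:\Program Files\Tesseract-OCR"))
```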
That's it! We have successfully downloaded and installed Tesseract OCR and set its environment variable on a Windows machine.
To check that Tesseract OCR for Windows was installed and added to the environment variables, open Command Prompt (cmd) on your Windows machine and run the "tesseract" command. If everything worked, a short usage guide should be displayed, listing the OCR options and single options such as the one that shows the Tesseract version.
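The same check can be scripted. The sketch below looks for the executable on the Path and, if it is found, returns the first line printed by `tesseract --version`. The function name `tesseract_version` is hypothetical.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `tesseract --version`, or None if not found."""
    # shutil.which searches the directories listed in PATH.
    if shutil.which(executable) is None:
        return None
    completed = subprocess.run(
        [executable, "--version"], capture_output=True, text=True
    )
    # Older releases print version information to stderr instead of stdout.
    output = (completed.stdout or completed.stderr).strip()
    return output.splitlines()[0] if output else None


# Example call:
# print(tesseract_version() or "tesseract was not found on PATH")
```

A result of `None` usually means the Path entry is missing or the terminal was opened before the variable was changed.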
Congratulations! We have successfully installed Tesseract OCR for Windows.
IronOCR is a Tesseract-based C# library that allows .NET software developers to identify and extract text from images and PDF documents. It is purely built in .NET, using the most advanced Tesseract engine known anywhere.
Installing IronOCR is straightforward, either in Visual Studio or from the command line with the NuGet Package Manager. In Visual Studio, navigate to:
Tools > NuGet Package Manager > Package Manager Console
Then, in the Package Manager Console, type the following command:
Install-Package IronOcr
This installs IronOCR, and you can then start using it in your project.
You can also download other IronOCR NuGet Packages for different platforms:
The below sample code shows how easy it is to use IronOCR Tesseract to read text from an image and perform OCR using C#.
string Text = new IronTesseract().Read(@"test-files/redacted-employmentapp.png").Text;
Console.WriteLine(Text); // Printed text
Dim Text As String = (New IronTesseract()).Read("test-files/redacted-employmentapp.png").Text
Console.WriteLine(Text) ' Printed text
If you want more robust code, then the following should help you in achieving the same task:
using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.AddImage("test-files/redacted-employmentapp.png");
    // You can add any number of images
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    Input.AddImage("test-files/redacted-employmentapp.png")
    ' You can add any number of images
    Dim Result = Ocr.Read(Input)
    Console.WriteLine(Result.Text)
End Using
The output is printed on the Console as:
IronOCR is very easy to install. It provides a complete and well-documented .NET software library.
IronOCR achieves a 99.8% text-detection accuracy rate without the need for other third-party libraries or webservices.
It also provides multithreading support. Most importantly, IronOCR can work with well over 125 international languages.
In this tutorial, we learned how to download and install Tesseract OCR on a Windows machine. Tesseract OCR is excellent software for C++ developers, but it has some limits. It is not fully developed for .NET. Scanned or photographed images need to be processed and standardized to a high resolution and kept free of digital noise. Only then can Tesseract work on them accurately.
In contrast, IronOCR can work with any image provided, whether scanned or photographed, with just a single line of code. IronOCR also uses Tesseract as its internal OCR engine, but it is finely tuned to get the best out of Tesseract and is built specifically for C#, with high performance and improved features.
You can download the IronOCR software product from this link.