A Comparison Between IronOCR and the Tesseract.NET SDK

Optical character recognition (OCR) identifies the readable text in an image and converts it into machine-encoded text. OCR has many uses. For example, it can digitize old paper documents into searchable electronic files, and law enforcement can use it to extract text from photos and videos examined as evidence. To recognize the characters in a document, the software must cope with the font and the writing system in which they appear. Modern OCR engines typically achieve this with image-recognition models trained and tuned on large data sets of text images.

OCR is most often used to read scanned paper documents and convert them into digital files that can be edited and searched. It can also be applied to other sources of text, including printed text on signs and labels, and handwritten or typed text on checks, forms, medical records, and other business records.

In this article, we will compare two .NET OCR libraries.

  • IronOCR
  • The Tesseract.NET SDK

Introduction

IronOCR Features

IronOCR is an OCR (Optical Character Recognition) library for .NET (C# and VB.NET) from Iron Software. It reads text from images and PDFs using the Tesseract 5 engine, and it can also read barcodes and QR codes from images. IronOCR adds OCR functionality to all common .NET project types, such as desktop, console, and web applications, with just a few lines of code and without any additional dependencies. It is one of the most accurate OCR engines available for .NET projects.

Here are some of IronOCR's key features:

  • IronOCR is built specifically for .NET applications.
  • IronOCR supports up to 127 languages.
  • IronOCR can straighten tilted (skewed) images and remove noise from them for more accurate output.
  • IronOCR performs well on low-resolution (low-DPI) images.
  • IronOCR can read multiple types of QR codes and barcodes.
  • IronOCR also supports GIF and TIFF images, including multi-frame files.
  • IronOCR supports multi-threading, which speeds up processing of multi-page documents and batches of images.
  • IronOCR can perform OCR on PDF files and export searchable PDF documents.
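Noise removal is easier to picture with a concrete filter. The sketch below is a plain 3x3 median filter on an 8-bit grayscale array, a common general-purpose way to erase isolated specks. It only illustrates the idea under that assumption; it is not IronOCR's DeNoise implementation, and the MedianFilter3x3 name is hypothetical.

```csharp
using System;

// Generic 3x3 median filter over an 8-bit grayscale image (0 = black,
// 255 = white). Illustration only; not IronOCR's DeNoise implementation.
static byte[,] MedianFilter3x3(byte[,] src)
{
    int h = src.GetLength(0), w = src.GetLength(1);
    var dst = new byte[h, w];
    var window = new byte[9];
    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            int n = 0;
            // Gather the 3x3 neighbourhood, clamping coordinates at the edges.
            for (int dy = -1; dy <= 1; dy++)
            {
                for (int dx = -1; dx <= 1; dx++)
                {
                    int yy = Math.Clamp(y + dy, 0, h - 1);
                    int xx = Math.Clamp(x + dx, 0, w - 1);
                    window[n++] = src[yy, xx];
                }
            }
            Array.Sort(window);
            dst[y, x] = window[4]; // the middle of the 9 sorted values
        }
    }
    return dst;
}

// Usage: a white 5x5 image with one black noise pixel in the centre.
var noisy = new byte[5, 5];
for (int y = 0; y < 5; y++)
    for (int x = 0; x < 5; x++)
        noisy[y, x] = 255;
noisy[2, 2] = 0;

byte[,] cleaned = MedianFilter3x3(noisy);
Console.WriteLine(cleaned[2, 2]); // 255: the isolated speck is gone
```

A 3x3 median filter also erases strokes that are only one pixel wide, so it is best applied when an image is actually noisy rather than to every input.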

IronOCR supports all major languages, including Arabic, Chinese, English, Finnish, French, German, Japanese, and many more. It can return its output in different forms, such as barcode data, plain text, or an OCR result object that contains paragraphs, lines, words, and characters. IronOCR is built on Tesseract technology.

IronOCR runs on Windows, macOS, and Linux, and it supports Azure and Docker for cloud deployments. Its latest update adds .NET Core 3.1 and .NET 6 to the list of supported frameworks, and it also supports Xamarin for macOS.

You can download IronOCR from the Iron Software website.

Tesseract.NET SDK Features

The Tesseract.NET SDK, a product of Patagames, is an OCR class library for .NET projects based on the open-source Tesseract OCR project. It includes a Tesseract engine for performing OCR and provides an API for adding text recognition to .NET applications. It can read various image formats and convert them to text, and it supports up to 60 languages. It can also read and scan PDF documents and convert them into searchable PDF files. The XML documentation for its API is provided in Patagames.Ocr.xml.

The Tesseract.NET SDK supports .NET Framework 2.0 through 4.5 and runs on Windows XP, Vista, 7, 8, 10, and 11. It works on both 32-bit and 64-bit operating systems, so it can be used on either architecture.

Unfortunately, the Tesseract.NET SDK is not available for macOS or Linux.

Using IronOCR and the Tesseract.NET SDK

Let's take a look at how we can use IronOCR and the Tesseract.NET SDK in our project.

Creating a C# Project in Visual Studio

We are using Visual Studio 2022 to create this project; using the latest version of Visual Studio is recommended. Open Visual Studio and click "Create a new project". Then select the "Console App" template and configure your project.

Next, enter a name for the project. I will name it "IronOCR vs Tesseract.NET SDK". Then choose the location where the project should be created and click Next.

Now select the .NET version. We use .NET 6, the latest version at the time of writing, which IronOCR supports. You can choose whichever version best suits your project's requirements.

Click the Create button, and Visual Studio creates the project from the template. The project is now ready for the libraries to be installed.

Install IronOCR and the Tesseract.NET SDK

It's now time to install the libraries and test their functionality. First, we will install the IronOCR library.

Install IronOCR

IronOCR can be installed in any of the following ways:

  • Using the Visual Studio NuGet Package Manager
  • Using the NuGet Package Manager Console
  • Downloading it directly from the NuGet website
  • Downloading it directly from the IronOCR website

Using the Visual Studio NuGet Package Manager

We can install the IronOCR library using the NuGet Package Manager GUI in Visual Studio. Open it by clicking Tools > NuGet Package Manager > Manage NuGet Packages for Solution.

Go to the Browse tab and search for IronOCR. Select IronOcr from the search results and install it in the project.

The IronOCR library is now installed and ready to use in our .NET project.

Using the NuGet Package Manager Console

We can also install the IronOCR library from the NuGet Package Manager Console. Open it from Tools > NuGet Package Manager > Package Manager Console (it usually appears below the code editor), type the following command, and press Enter:

Install-Package IronOcr

This installs the IronOCR library, which is then ready to use in the project.

Install the Tesseract.NET SDK

The Tesseract.NET SDK can also be installed with the NuGet Package Manager. Go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution, open the Browse tab, and search for Tesseract.NET SDK. Select it from the search results and install it. After installation, the Tesseract.NET SDK is ready to use in our program.

After installation, three new folders appear in Solution Explorer.

These folders contain essential data required by Tesseract to perform OCR. Now we are ready to embed OCR capability in our project.

OCR Image

It is now time to test the capabilities of IronOCR and the Tesseract.NET SDK. Both libraries can perform OCR on images. We will test them using a tilted and noisy image with text.

Test image

This is the image that we will use for testing.

Using the Tesseract.Net SDK

First, let's look at the output generated by the Tesseract.NET SDK for the test image. Here is the code:

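using System;
using Patagames.Ocr;

// Create the Tesseract engine wrapper; it is disposed at the end of the block.
using (var api = OcrApi.Create())
{
    // Load the English language data.
    api.Init(Patagames.Ocr.Enums.Languages.English);
    // Run OCR on the test image and return the recognized text.
    string plainText = api.GetTextFromImage(@"C:\Users\Administrator\Desktop\Input.jpg");
    Console.WriteLine(plainText);
}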

' VB.NET equivalent
Imports Patagames.Ocr

Module Program
    Sub Main()
        Using api = OcrApi.Create()
            api.Init(Patagames.Ocr.Enums.Languages.English)
            Dim plainText As String = api.GetTextFromImage("C:\Users\Administrator\Desktop\Input.jpg")
            Console.WriteLine(plainText)
        End Using
    End Sub
End Module

First, we import the Patagames.Ocr namespace to use the Tesseract.NET SDK. Next, we create an OcrApi instance with the Create method and set English as the recognition language with the Init method. We then extract plain text from the image with the GetTextFromImage method, passing the path of the image file as its parameter. Finally, we write the extracted text to the console.

Next, take a look at the output generated by the Tesseract.NET SDK:

This is the output that we get from the Tesseract.NET SDK. It first prints errors related to image resolution, which suggests that it works well only with high-resolution images. The extracted text follows the errors, and comparing it with the image shows that it is entirely different: most of it is meaningless. Overall, the Tesseract.NET SDK fails this test.
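The resolution errors deserve a closer look. Tesseract's own documentation recommends images of at least 300 DPI. The sketch below shows the arithmetic for checking a scan against that target and picking a whole-number upscale factor. The page width, pixel count, and the EffectiveDpi and UpscaleFactor helpers are hypothetical examples, not part of either SDK.

```csharp
using System;

// Effective resolution of a scan: pixels across the page divided by the
// page's physical width in inches.
static double EffectiveDpi(int pixelWidth, double pageWidthInches) =>
    pixelWidth / pageWidthInches;

// Smallest whole-number scale factor that brings the scan up to the target
// resolution (300 DPI by default, Tesseract's documented recommendation).
static int UpscaleFactor(double currentDpi, double targetDpi = 300)
{
    if (currentDpi <= 0)
        throw new ArgumentOutOfRangeException(nameof(currentDpi));
    return currentDpi >= targetDpi ? 1 : (int)Math.Ceiling(targetDpi / currentDpi);
}

// Hypothetical example: a US Letter page (8.5 in wide) captured 1275 px wide.
double dpi = EffectiveDpi(1275, 8.5);
int factor = UpscaleFactor(dpi);
Console.WriteLine($"{dpi} DPI, upscale x{factor}"); // 150 DPI, upscale x2
```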

Using IronOCR

Next, we will see the results from IronOCR. Before jumping to the results, we will first look at the code for IronOCR:

using System;
using IronOcr;

var Ocr = new IronTesseract();           // the OCR engine
Ocr.Language = OcrLanguage.EnglishBest;  // higher-accuracy English language data
using (var Input = new OcrInput())
{
    Input.AddImage(@"C:\Users\Administrator\Desktop\Input.jpg");
    Input.Deskew();   // straighten the tilted image
    Input.DeNoise();  // remove noise from the image
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

' VB.NET equivalent
Imports IronOcr

Module Program
    Sub Main()
        Dim Ocr = New IronTesseract()
        Ocr.Language = OcrLanguage.EnglishBest
        Using Input = New OcrInput()
            Input.AddImage("C:\Users\Administrator\Desktop\Input.jpg")
            Input.Deskew()
            Input.DeNoise()
            Dim Result = Ocr.Read(Input)
            Console.WriteLine(Result.Text)
        End Using
    End Sub
End Module

In the code above, we import the IronOCR library and create an IronTesseract object, which is the OCR engine, and then set its language to English. Next, we create an OcrInput object and load the test image into it with the AddImage method. The Deskew method rotates the tilted image back to its correct orientation, and the DeNoise method removes noise from it; both steps improve the result. Finally, the Read method recognizes and extracts the text from the test image, and we print the result to the console. The result can also be saved as a searchable PDF file.

Here is the output generated by IronOCR:

The output matches the text in the image exactly: IronOCR extracted it without any errors, even though the image was tilted and noisy. IronOCR can extract text from distorted and rotated images, and it also works with low-resolution images.
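Before an engine can straighten a tilted image, it has to estimate how far the image is tilted. One classic approach is a projection profile: try a range of angles and keep the one at which the dark pixels pile up into the fewest, sharpest rows. The sketch below implements that idea on a list of dark-pixel coordinates. It is a generic technique chosen for illustration, not necessarily how IronOCR's Deskew works, and the function name, the plus or minus 10 degree search range, and the 0.5 degree step are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Projection-profile skew estimate. For each candidate angle, rotate the dark
// pixels by -angle, count how many land in each pixel row, and score the
// histogram by its sum of squared counts: aligned text lines pile into few
// rows and score highest. Image y grows downward, so a positive result means
// the text slopes down to the right.
static double EstimateSkewDegrees(
    IReadOnlyList<(double X, double Y)> darkPixels,
    double maxDegrees = 10, double stepDegrees = 0.5)
{
    double bestAngle = 0, bestScore = double.MinValue;
    int steps = (int)Math.Round(2 * maxDegrees / stepDegrees);
    for (int k = 0; k <= steps; k++)
    {
        // Integer stepping avoids accumulating floating-point error.
        double angle = -maxDegrees + k * stepDegrees;
        double rad = angle * Math.PI / 180;
        double sin = Math.Sin(rad), cos = Math.Cos(rad);
        var rowCounts = new Dictionary<int, int>();
        foreach (var (x, y) in darkPixels)
        {
            int row = (int)Math.Round(-x * sin + y * cos);
            rowCounts[row] = rowCounts.TryGetValue(row, out int n) ? n + 1 : 1;
        }
        double score = 0;
        foreach (int count in rowCounts.Values)
            score += (double)count * count;
        if (score > bestScore)
        {
            bestScore = score;
            bestAngle = angle;
        }
    }
    return bestAngle;
}

// Usage: two synthetic "text lines" tilted by 5 degrees.
double slope = Math.Tan(5 * Math.PI / 180);
var pts = new List<(double X, double Y)>();
for (int x = 0; x < 200; x++)
{
    pts.Add((x, x * slope + 10));
    pts.Add((x, x * slope + 40));
}
Console.WriteLine(EstimateSkewDegrees(pts)); // 5
```

Once the angle is known, the image is rotated by its negative to level the text; real implementations usually work on a binarized image and refine the angle with a finer step.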

IronOCR also supports multi-frame images through the AddMultiFrameTiff method. IronOCR reads every frame in the image and treats each frame as a separate page. This method supports only TIFF images.

using System;
using IronOcr;

var Ocr = new IronTesseract();

using (var Input = new OcrInput())
{
    // Every frame of the TIFF is added as a separate page.
    Input.AddMultiFrameTiff("images/multiframe.tiff");

    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

' VB.NET equivalent
Imports IronOcr

Module Program
    Sub Main()
        Dim Ocr = New IronTesseract()

        Using Input = New OcrInput()
            Input.AddMultiFrameTiff("images/multiframe.tiff")

            Dim Result = Ocr.Read(Input)
            Console.WriteLine(Result.Text)
        End Using
    End Sub
End Module
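Because each frame is treated as a separate page, pages are natural units of parallel work, which is where the multi-threading support listed earlier pays off. The sketch below shows the general pattern in plain .NET with Parallel.For: pages are processed concurrently, but the results stay in page order. The recognize delegate and the RecognizePagesInParallel name are hypothetical placeholders, not IronOCR APIs, and the sketch says nothing about how IronOCR schedules its own threads.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Recognizes every page concurrently but returns the texts in page order.
// `recognize` is a placeholder for a real OCR call (hypothetical here).
static string[] RecognizePagesInParallel<TPage>(
    IReadOnlyList<TPage> pages, Func<TPage, string> recognize)
{
    var results = new string[pages.Count];
    // Each iteration writes only to its own slot, so no locking is needed.
    Parallel.For(0, pages.Count, i => results[i] = recognize(pages[i]));
    return results;
}

// Usage with a stand-in "recognizer" that just upper-cases a frame name.
var frames = new[] { "frame-1", "frame-2", "frame-3" };
string[] texts = RecognizePagesInParallel(frames, f => f.ToUpperInvariant());
Console.WriteLine(string.Join(Environment.NewLine, texts));
```

Whether a single OCR engine object can safely be shared across threads is library-specific, so a real recognize delegate may need one engine instance per thread.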

Let's take a look at the code for making a searchable PDF:

using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Pages can come from different image formats.
    Input.AddImage(@"images\page1.png");
    Input.AddImage(@"images\page2.bmp");
    Input.AddMultiFrameTiff(@"images\page3.tiff");

    Input.Deskew();

    var Result = Ocr.Read(Input);
    // Save a PDF whose text can be searched and selected.
    Result.SaveAsSearchablePdf("searchable.pdf");
}

' VB.NET equivalent
Imports IronOcr

Module Program
    Sub Main()
        Dim Ocr = New IronTesseract()
        Using Input = New OcrInput()
            Input.AddImage("images\page1.png")
            Input.AddImage("images\page2.bmp")
            Input.AddMultiFrameTiff("images\page3.tiff")
            Input.Deskew()

            Dim Result = Ocr.Read(Input)
            Result.SaveAsSearchablePdf("searchable.pdf")
        End Using
    End Sub
End Module

The SaveAsSearchablePdf method saves the OCR result as a PDF file whose text can be searched.

Other Features

  • Binarize: This image filter turns every pixel black or white with no middle ground.
  • Contrast: Increases contrast automatically, which often helps with low-contrast scans.
  • DeepCleanBackgroundNoise: Heavy background-noise removal; use this filter only when a document is known to have extreme background noise.
  • Invert: Inverts every color. For example, white becomes black and black becomes white.
  • ReplaceColor: Replaces one color with another to reduce noise.
  • ToGrayScale: This image filter turns every pixel into a shade of gray.
  • Many other filters and features.
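The grayscale, black-or-white, and invert filters in the list are simple per-pixel operations. The sketch below shows one plain-array version of each, assuming 8-bit grayscale pixels (0 = black, 255 = white), Rec. 601 luma weights for the grayscale conversion, and a fixed threshold of 128. IronOCR may choose weights and thresholds differently, and the function names are illustrative only.

```csharp
using System;

// Grayscale conversion with Rec. 601 luma weights.
static byte ToGray(byte r, byte g, byte b) =>
    (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

// Thresholding: every pixel becomes black (0) or white (255), no middle ground.
static byte[] Binarize(byte[] gray, byte threshold = 128)
{
    var result = new byte[gray.Length];
    for (int i = 0; i < gray.Length; i++)
        result[i] = gray[i] >= threshold ? (byte)255 : (byte)0;
    return result;
}

// Inversion: white becomes black and black becomes white.
static byte[] Invert(byte[] gray)
{
    var result = new byte[gray.Length];
    for (int i = 0; i < gray.Length; i++)
        result[i] = (byte)(255 - gray[i]);
    return result;
}

// Usage: a pure red pixel, a mid-gray pixel, and a white pixel.
byte[] pixels = { ToGray(255, 0, 0), ToGray(128, 128, 128), ToGray(255, 255, 255) };
Console.WriteLine(string.Join(", ", pixels));           // 76, 128, 255
Console.WriteLine(string.Join(", ", Binarize(pixels))); // 0, 255, 255
Console.WriteLine(string.Join(", ", Invert(pixels)));   // 179, 127, 0
```

A fixed threshold is the simplest choice; adaptive methods pick the threshold from the image itself, which copes better with uneven lighting.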

Additional IronOCR Features

IronOCR supports 127 languages. It can also read QR codes and more than 20 types of barcodes. IronOCR can convert images to grayscale for better results, enhance image resolution manually or automatically, and apply automatic contrast adjustment for the best results. It can export results in several formats, such as searchable PDFs, HTML, and images of any page. IronOCR supports many input formats, including the following:

  • Images (JPG, PNG, GIF, TIFF, BMP)
  • Multi-page GIF and TIFF files
  • System.Drawing objects
  • Streams
  • PDFs
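Auto-contrast and resolution enhancement can likewise be sketched in a few lines. The example below assumes 8-bit grayscale pixels and uses two deliberately simple techniques, a min-max contrast stretch and nearest-neighbour upscaling by a whole-number factor. They are stand-ins to show the idea, not IronOCR's algorithms, and the function names are hypothetical.

```csharp
using System;

// Min-max contrast stretch: the darkest pixel becomes 0, the lightest 255,
// and values in between are spread linearly.
static byte[] StretchContrast(byte[] gray)
{
    byte min = 255, max = 0;
    foreach (byte v in gray)
    {
        if (v < min) min = v;
        if (v > max) max = v;
    }
    var result = (byte[])gray.Clone();
    if (max == min) return result; // flat image: nothing to stretch
    for (int i = 0; i < gray.Length; i++)
        result[i] = (byte)Math.Round((gray[i] - min) * 255.0 / (max - min));
    return result;
}

// Nearest-neighbour upscaling: every source pixel becomes a
// factor x factor block of identical pixels.
static byte[,] Upscale(byte[,] src, int factor)
{
    int h = src.GetLength(0), w = src.GetLength(1);
    var dst = new byte[h * factor, w * factor];
    for (int y = 0; y < h * factor; y++)
        for (int x = 0; x < w * factor; x++)
            dst[y, x] = src[y / factor, x / factor];
    return dst;
}

// Usage: a washed-out scan whose pixel values only span 100..150.
byte[] faded = { 100, 125, 150 };
Console.WriteLine(string.Join(", ", StretchContrast(faded))); // 0, 128, 255
```

Production resolution enhancement usually relies on smoother interpolation (bilinear, bicubic, or learned models), because nearest-neighbour scaling keeps jagged edges.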

Licensing

IronOCR

IronOCR is free for development, and a 30-day free trial is available for testing in production. For production use, IronOCR offers individual, developer team, and organization-level plans, so you can choose the one that best matches your needs. Prices start at $499 for the Lite plan, which covers one developer and one project. The Professional plan costs $999, and the Unlimited plan costs $2999 and covers unlimited developers, projects, and locations. All plans are one-time payments and include one year of free updates. SaaS and OEM coverage is also available.

You can learn more about the pricing plans on the IronOCR licensing page. Iron Software also currently offers a suite of five software packages, IronPDF, IronXL, IronOCR, IronBarcode, and IronWebScraper, for the price of just two.

The Tesseract.NET SDK

The Tesseract.NET SDK is also a paid product, with plans starting at $220 for one developer and one project. Note, however, that its licenses are renewable subscriptions: you have to pay monthly or annually to keep using the Tesseract.NET SDK in your project. You can learn more about its pricing on the Patagames website.
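To see how a one-time license compares with a renewing one over time, the sketch below adds up costs year by year. The $499 and $220 figures are the starting prices quoted above. The assumption that the Tesseract.NET SDK renews annually at its $220 starting price is hypothetical, since actual renewal terms may differ, and IronOCR's optional update renewals after the first year are ignored.

```csharp
using System;

// ASSUMED pricing model (hypothetical, for illustration):
//  - IronOCR Lite: one payment of $499; paid update renewals are ignored.
//  - Tesseract.NET SDK: $220 paid every year (starting price renewed annually).
static decimal OneTimeCost(decimal price, int years) => years > 0 ? price : 0m;
static decimal AnnualCost(decimal pricePerYear, int years) => pricePerYear * years;

// First year in which the cumulative subscription cost exceeds the one-time price.
static int CrossoverYear(decimal oneTimePrice, decimal pricePerYear)
{
    if (pricePerYear <= 0)
        throw new ArgumentOutOfRangeException(nameof(pricePerYear));
    int year = 1;
    while (AnnualCost(pricePerYear, year) <= OneTimeCost(oneTimePrice, year))
        year++;
    return year;
}

for (int year = 1; year <= 4; year++)
{
    Console.WriteLine(
        $"Year {year}: IronOCR ${OneTimeCost(499m, year)}, " +
        $"Tesseract.NET SDK ${AnnualCost(220m, year)}");
}
Console.WriteLine($"Subscription costs more from year {CrossoverYear(499m, 220m)}"); // year 3
```

Under these assumptions, the subscription becomes the more expensive option in the third year ($660 versus $499); monthly billing or a different renewal price would move that point.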

Conclusion

IronOCR is well suited to the tasks covered in this article. It supports 127 languages, which makes it usable for projects around the world, and it accepts multiple image formats and PDFs as input. It pre-processes images to ensure the best results and can recognize text in a specific area of an image. IronOCR focuses on accuracy, and its output in this test was accurate. Developers don't need any additional files or libraries to perform OCR. Overall, it is an excellent library.

The Tesseract.NET SDK is also a sound library for .NET projects. Based on the Tesseract OCR project, it offers OCR in 60 languages, can convert scanned images into searchable PDFs, and accepts a wide range of image formats as input. It provides a high-level API for adding OCR capabilities to .NET projects.

IronOCR and the Tesseract.NET SDK are both paid products. However, IronOCR offers a wider variety of pricing plans, and it can be cheaper in the long run: IronOCR licenses are one-time payments, whereas Tesseract.NET SDK licenses must be renewed monthly or annually. So even though the Tesseract.NET SDK has a lower starting price, you are likely to pay more for it over time.

After testing both libraries, we can conclude that IronOCR performs better than the Tesseract.NET SDK on documents that are tilted, blurry, or noisy. Both libraries provide OCR capabilities, but IronOCR is the more advanced library, with features such as image pre-processing, noise removal, and rotating tilted images back to their original orientation. The Tesseract.NET SDK supports up to 60 languages, while IronOCR supports up to 127. The Tesseract.NET SDK also requires extra files for each language, which adds bulk to the program, and it has not been updated for a long time.

IronOCR offers a 30-day free trial for testing in production. Iron Software also currently offers the full suite of five Iron Software packages for the price of just two; more information about the offer is available on the Iron Software website.