This article will compare two software libraries that use optical character recognition (OCR) to automate the detection and extraction of printed or handwritten text from images and from scanned documents. First, we will discuss the features of both libraries. Next, we will examine and compare their text recognition and extraction capabilities using example source code produced using both libraries. Finally, we will compare the libraries' licensing and pricing.
The libraries that we will compare in this article are:
Syncfusion's Essential PDF library incorporates OCR functionality to enable image-text processing on scanned images within PDF documents.
Syncfusion's OCR processor can work with Tesseract versions 3 (3.02 and 3.05) and 4. The library can be included in .NET Core and ASP.NET applications.
Features of Syncfusion Essential PDF's OCR functionality include:
The OCRProcessor class can be used to perform OCR on PDF files. It is based on the Tesseract engine, which is widely regarded as one of the best OCR engines available.
IronOCR is a C# software library that allows .NET platform developers to recognize and read text from pictures and PDF documents. It is a .NET-only OCR library that uses the powerful Tesseract engine. Tesseract versions 3 through 5 work right out of the box on Windows, macOS, Linux, Azure, AWS, Lambda, Mono, and Xamarin Mac.
IronOCR covers more languages than other OCR engines, supporting 125 languages (only English is installed by default).
.NET developers have full control over their documents, being able to modify them as they see appropriate.
IronOCR offers a unique combination of capabilities and functions for integrating, signing, exporting, reading visuals, and extracting details from photos, independent of user technical background or hardware sophistication.
The IronOCR SDK beats other OCR libraries in terms of accuracy, with a rate of 99.8 percent.
The IronTesseract class gives C# developers extensive control over OCR (images and PDF to text) and lets them fine-tune performance for each specific case.
IronOCR includes configuration options that enable the library to process images that are not of ideal quality. Some of these configurations that are available include: Clean Background Noise, Enhance Contrast, Enhance Resolution, Language, Strategy, Rotate And Straighten, Color Space, Detect White Text On Dark Backgrounds, and Input Image Type.
IronOCR provides support for 125+ international languages.
IronTesseract can read several image formats as well as PDF files, a feature that is not available with standard, free Tesseract engines. If your scans are of poor quality, the OcrInput class allows you to correct many of the problems automatically.
The OcrInput class provides C# programmers with fine-grained control over input. Developers can preprocess the input image for speed and accuracy, which eliminates the need to use Photoshop batch scripts or ImageMagick to prepare images prior to OCR processing.
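The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming an IronOCR release from the same era as this article in which OcrInput exposes the Deskew and DeNoise filters; the file path is hypothetical, and newer releases may name these members differently.

using System;
using IronOcr;

var ocr = new IronTesseract();

// Hypothetical path to a skewed, noisy scan (an assumption for illustration)
using (var input = new OcrInput(@"images\low-quality-scan.png"))
{
    input.Deskew();   // straighten a rotated or skewed scan
    input.DeNoise();  // remove digital noise before recognition

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}

Each filter adds processing time, so it is usually worth applying only the ones a given batch of scans actually needs.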
IronOCR allows its end-users to perform OCR on specific areas of an image.
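A region-restricted read might look like the sketch below. It assumes an IronOCR release in which OcrInput.AddImage accepts a System.Drawing.Rectangle as the content area (later releases may use a different rectangle type or method name); the pixel coordinates are purely illustrative.

using System;
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // Illustrative coordinates in pixels; only this rectangle is read
    var contentArea = new System.Drawing.Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };
    input.AddImage(@"images\11111.png", contentArea);

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}

Restricting the read to a known area can improve speed because the engine does not have to analyze the rest of the page.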
IronOCR returns an advanced result object for each page it scans using Tesseract 3, 4, or 5. This contains location data, images, text, statistical confidence, alternative symbol choices, font names, font sizes, decoration, font weights, and a position for each of the following:
IronOCR allows developers to use multiple languages in a single document. This capability is extremely beneficial to .NET service providers.
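As a rough sketch of multi-language reading, the snippet below sets a primary language and adds a secondary one. It assumes the German language pack (the IronOcr.Languages.German NuGet package) has been installed alongside IronOCR; the image path is hypothetical.

using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;            // primary language (installed by default)
ocr.AddSecondaryLanguage(OcrLanguage.German);  // assumes the German language pack is installed

// Hypothetical path to a document containing both languages
using (var input = new OcrInput(@"images\multi-language.png"))
{
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}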
In this article, we will be using a new Visual Studio Console Application to demonstrate the OCR processing capabilities of both IronOCR and Syncfusion Essential PDF.
Open Visual Studio, go to the File menu, and select New Project. Then, select Console Application.
Enter the project name and select the path in the appropriate text box. Next, click the Create button, and then select the required .NET framework.
The Visual Studio project will now generate the structure for the new console application. The Program.cs file will be opened upon completion.
We will now add both libraries to the project.
The IronOCR library can be downloaded and installed in four ways. These are:
You can integrate IronOCR in a C# project using the Visual Studio NuGet Package Manager.
Access the NuGet Package Manager GUI by clicking on Tools > NuGet Package Manager > Manage NuGet Packages for Solutions...
After this, a new window will appear. Search for IronOCR and install the package in the project.
Additional language packs for IronOCR can also be installed using the same method described above.
IronOCR can be directly downloaded from the NuGet website by following these instructions:
Developers can download the library from the IronOCR web site and add it as a project reference.
Follow the instructions below to add the library as a reference in Visual Studio.
The package will now download/install in the current project and be ready to use.
Syncfusion Essential PDF can be installed in three different ways.
As with IronOCR, developers can also install Syncfusion's OCR library using Visual Studio's NuGet Package Manager.
Access the Package Manager as before by clicking on Tools > NuGet Package Manager > Manage NuGet Packages for Solutions...
Search for Syncfusion OCR and install the appropriate package (should be Syncfusion.PDF.OCR.Net.Core).
Additional language packs for Syncfusion Essential PDF OCR can be downloaded from GitHub.
Syncfusion Essential PDF OCR can be directly downloaded from the NuGet website by following these instructions:
Install-Package Syncfusion.PDF.OCR.Net.Core -Version 20.2.0.38
The package will now download/install in the current project and be ready to use.
Both IronOCR and Syncfusion OCR are capable of performing OCR on PDF documents. Here, we will discuss how each of them can be used in Visual Studio.
With just a few lines of code, developers can perform OCR on an entire PDF or on specific pages/portions of a PDF. Consider the code snippet below.
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.AddPdf("example.pdf", "password");
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    Input.AddPdf("example.pdf", "password")
    Dim Result = Ocr.Read(Input)
    Console.WriteLine(Result.Text)
End Using
You can use the OCRProcessor Class to perform OCR on PDF documents as well as on regions of a document. Examine the code sample below for context.
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;
// Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries\"))
{
    PdfLoadedDocument lDoc = new PdfLoadedDocument("Input.pdf");
    processor.Settings.Language = Languages.English;
    processor.PerformOCR(lDoc, @"TessData\");
    lDoc.Save("Sample.pdf");
    lDoc.Close(true);
}
Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Graphics
Imports Syncfusion.Pdf.Parsing
' Initialize the OCR processor
Using processor As New OCRProcessor("TesseractBinaries\")
    Dim lDoc As New PdfLoadedDocument("Input.pdf")
    processor.Settings.Language = Languages.English
    processor.PerformOCR(lDoc, "TessData\")
    lDoc.Save("Sample.pdf")
    lDoc.Close(True)
End Using
Both libraries can perform OCR on images within a C#.NET and .NET Core application.
IronOCR is unique in its ability to automatically detect and read text from imperfectly scanned images with only two lines of code.
using IronOcr;
var Result = new IronTesseract().Read(@"images\11111.png").Text;
Imports IronOcr
Dim Result = (New IronTesseract()).Read("images\11111.png").Text
OCR output from image
OCR Output
Simple Data Outputs:
» NET Text Strings
» Barcode & QR Data & Images
Structured Data Outputs:
» Pages
» Blocks
» Paragraphs
» Lines
» words
» Characters
Export Documents:
» Searchable PDFs
» hOCR / HTML Export
» Images of any Page, Text
Element or Barcode
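The structured outputs named in the result above (pages, paragraphs, and so on) can be reached through the result object rather than through the flat text. The following is a minimal sketch, assuming an IronOCR release in which the result exposes Confidence and Pages, and each page exposes Paragraphs with their own Text and Confidence; member names may differ in other releases.

using System;
using IronOcr;

var result = new IronTesseract().Read(@"images\11111.png");

// Overall statistical confidence for the whole read
Console.WriteLine($"Overall confidence: {result.Confidence}");

foreach (var page in result.Pages)
{
    foreach (var paragraph in page.Paragraphs)
    {
        // Print each paragraph with the confidence the engine assigned to it
        Console.WriteLine($"[{paragraph.Confidence}] {paragraph.Text}");
    }
}

Low-confidence paragraphs are natural candidates for manual review or for a second pass with extra preprocessing.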
Syncfusion Essential PDF is capable of extracting text from images with great accuracy.
using System.Drawing;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;
// Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries\"))
{
    // Load the input image
    Bitmap image = new Bitmap("11111.jpeg");
    // Set the OCR language to process
    processor.Settings.Language = Languages.English;
    // Run OCR on the bitmap image, passing the Tesseract language data folder
    string ocrText = processor.PerformOCR(image, @"TessData\");
}
Imports System.Drawing
Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Graphics
Imports Syncfusion.Pdf.Parsing
' Initialize the OCR processor
Using processor As New OCRProcessor("TesseractBinaries\")
    ' Load the input image
    Dim image As New Bitmap("11111.jpeg")
    ' Set the OCR language to process
    processor.Settings.Language = Languages.English
    ' Run OCR on the bitmap image, passing the Tesseract language data folder
    Dim ocrText As String = processor.PerformOCR(image, "TessData\")
End Using
OCR output from image
OCR Outpu
Simple Data Output:
+ NET Text Strings
Dee eT Nd
tC eke ass
Biren)
Soy
Seg
ors
eae
eed
TLC
eres
Smt d
See amr'
etd ieot
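To put a number on the difference between two OCR outputs such as those above, one common approach is character accuracy: one minus the edit distance between the expected text and the OCR output, divided by the length of the expected text. The article does not state how its accuracy figures were measured, so the sketch below is only one possible metric; EditDistance and CharacterAccuracy are hypothetical helper names, not part of either library.

using System;

// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
static int EditDistance(string a, string b)
{
    var previous = new int[b.Length + 1];
    var current = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) previous[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int deletion = previous[j] + 1;
            int insertion = current[j - 1] + 1;
            current[j] = Math.Min(substitution, Math.Min(deletion, insertion));
        }
        // The row just computed becomes the previous row for the next character
        (previous, current) = (current, previous);
    }
    return previous[b.Length];
}

// Character accuracy = 1 - (edit distance / expected length), floored at 0.
static double CharacterAccuracy(string expected, string actual)
{
    if (expected.Length == 0) return actual.Length == 0 ? 1.0 : 0.0;
    double errorRate = (double)EditDistance(expected, actual) / expected.Length;
    return Math.Max(0.0, 1.0 - errorRate);
}

// Compare a known-correct line against two OCR readings of it
string expected = "OCR Output";
Console.WriteLine(CharacterAccuracy(expected, "OCR Output"));
Console.WriteLine(CharacterAccuracy(expected, "OCR Outpu"));

Running the same metric over the full text returned by each library, against a hand-typed transcript of the test image, gives a like-for-like comparison.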
Use of both IronOCR and Syncfusion Essential PDF requires a software license.
IronOCR has a free development license for personal, noncommercial projects.
IronOCR offers a distinct price structure for commercial licenses. The Lite package begins at $749 with no additional costs. All licenses are perpetual (one-time purchase), are valid for development, testing, staging, and production, and include a 30-day money-back guarantee and a year of software support and upgrades. Learn more about IronOCR's complete pricing structure and licensing information from this page.
For a one-time fee of $1599, you may obtain royalty-free redistribution coverage for SaaS and OEM products.
Syncfusion Essential PDF provides three types of developer licenses, but doesn't provide SaaS and OEM coverage.
View the entire licensing structure for Syncfusion Essential PDF (and for other Syncfusion components) on the product licensing page.
IronOCR supports about 125 languages in total. Its processing capabilities include performing OCR on portions of a PDF document or image, extracting text from PDFs and photos, and correcting images of poor quality, among many others. IronOCR prioritizes speed and accuracy. Its accuracy rate of 99.8 percent is higher than any other Tesseract-powered OCR library on the market. IronOCR works right out of the box, with no need for performance tuning or image preprocessing.
Syncfusion Essential PDF OCR also uses Google's open-source Tesseract engine. It can perform OCR on entire documents or specific portions of documents. Syncfusion's OCR library supports more than 60 international languages.
IronOCR licenses have lifetime validity with unlimited support and SaaS and OEM coverage. Syncfusion Essential PDF OCR, on the other hand, is licensed on an annual basis. IronOCR pricing starts from $749, and Syncfusion pricing starts from $995 per year.
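The difference between a one-time price and an annual price grows with the lifetime of the project. The arithmetic can be sketched as below, using only the starting prices quoted in this article; it deliberately ignores team size, license tiers, support renewals, and discounts, and CumulativeCost is a hypothetical helper.

using System;

// Starting prices quoted in this article (assumptions for illustration only)
const decimal IronOcrOneTime = 749m;     // perpetual license, one-time purchase
const decimal SyncfusionPerYear = 995m;  // annual license

// Total spend after a number of years: one-time fee plus recurring fees
static decimal CumulativeCost(decimal oneTime, decimal perYear, int years) =>
    oneTime + perYear * years;

for (int years = 1; years <= 3; years++)
{
    decimal iron = CumulativeCost(IronOcrOneTime, 0m, years);
    decimal sync = CumulativeCost(0m, SyncfusionPerYear, years);
    Console.WriteLine($"Year {years}: IronOCR ${iron}, Syncfusion ${sync}");
}

Under these assumptions the IronOCR figure stays at $749 while the Syncfusion figure reaches $2985 after three years.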
Obtain IronOCR along with four other Iron Software products for a discounted price by purchasing the full IronSuite. Products bundled in the IronSuite include:
The Iron Software licensing page contains more detailed information about pricing and licensing for the above five products.