If you are looking for information about OCR, you are in the right place. This blog will discuss OCR and related software, and examine how they perform when applied to OCR-related tasks. Let's begin with the question: what is OCR?
Optical character recognition (OCR) uses an optical scanner and specialized software to identify and digitally encode written or printed text. A computer can read static photographs of text and turn them into editable, searchable data using OCR software.
OCR usually consists of three steps: opening and scanning a document in OCR software, recognizing the document in OCR software, and storing the OCR-produced document in the format of your choice.
Today, we discuss two OCR software packages and compare their pros and cons, as well as how to integrate and use their SDKs in C#. The software packages under discussion are LEADTOOLS OCR and IronOCR.
LEADTOOLS OCR comes from the award-winning line of development toolkits developed and published by LEAD Technologies Inc. LEADTOOLS is a collection of comprehensive toolkits for integrating raster, document, medical, multimedia, and vector imagery into desktop, server, tablet, and mobile applications. File formats (150+), image compression, image processing, color conversion, color processing, image display, special effects, scanning/capture, common dialogs, printing, DICOM, PACS, OCR, barcodes, forms recognition, PDF, document clean-up, annotations and more are all supported by LEADTOOLS. Millions of lines of code are practically at the fingertips of application developers using a LEADTOOLS toolkit. LEADTOOLS is a toolset built to provide you with the most potent image technology available, no matter your programming needs.
LEADTOOLS is a comprehensive toolkit for integrating recognition, document, medical, imaging, and multimedia technology into desktop, server, tablet, and mobile systems, powered by artificial intelligence and machine-learning algorithms. To improve your apps, why not make good use of more than 30 years of image development knowledge and support for 150+ file types?
The LEADTOOLS OCR class library provides programming software for the quick and efficient incorporation of document optical character recognition (OCR) technology into software programs. Programmers can conduct character recognition on document pictures, and output recognized text to over 20 file formats using the LEADTOOLS OCR Class Library.
The LEADTOOLS toolkit provides an award-winning line of multimedia technologies for end-users and developers, and can perform all types of OCR functions to satisfy its broad range of clients.
The LEADTOOLS engine supports multi-threaded and server-based OCR operations.
The LEADTOOLS Document SDK allows users to create multiple OCR documents in their application. Each document contains its own list of pages.
LEADTOOLS supports more than 40 languages, and lets you choose which language to use when recognizing OCR pages.
LEADTOOLS OCR gives its users access to the dictionaries for all supported languages. Moreover, more than one dictionary can be used in a single document.
Recognize a variety of documents, including facsimiles, photocopies, and documents with complex layouts.
With improved accuracy and speed, the LEADTOOLS OCR Application can conduct Optical Character Recognition (OCR) on pictures, extract text from photos, and convert images to multiple document formats. To modify and share text from a picture, use OCR to extract it and then copy it.
LEAD Technologies uses AI to improve recognition on documents of the same type, which is good news for the end-user.
The LEADTOOLS Document SDK, from the award-winning line of OCR toolkits, includes powerful zone-recognition software.
This is a high-capacity, scalable Web API. Its user-friendly interface allows you to easily incorporate powerful OCR, barcodes, MICR, and document conversion into any program.
Note:
NuGet's official site suggests that .NET developers prefer IronOCR over LEADTOOLS: LEADTOOLS has 77.8K downloads, whereas IronOCR has more than 320K.
IronOCR is a C# software library that enables .NET programmers to detect and read text from images and PDF documents. It is a pure .NET OCR package built on the powerful Tesseract engine. IronOCR performs well on real-world images and flawed documents, such as photos or low-resolution scans with digital noise or defects. With little or no setup, Tesseract 5 (as well as 4 and 3) runs out of the box on Windows, macOS, Linux, Azure, AWS, Lambda, Mono, and Xamarin Mac. There are no native binaries to deal with. Both .NET Framework and .NET Core are supported.
IronOCR supports more languages than any other OCR engine, which helps programmers extract meaningful data from images in many scripts. It supports 125 international languages, but only English is installed as standard.
The service provided by the IronOCR toolkit is easy to integrate, easy to process, and more interactive than any other OCR engine. It offers solutions to .NET developers and allows them to control and connect with their documents digitally, as well as manipulate them however they see fit.
IronOCR provides a unique set of features and functions to integrate, sign, export, read graphics and extract details from images, regardless of the technical background of users or the level of sophistication of the hardware being used.
IronOCR reports an accuracy rate of 99.8%, which significantly outperforms other OCR libraries.
IronOCR gives C# developers granular control. It provides OCR (images and PDF to text) capability, with finely tuned performance for each unique case.
When working with real-world input, a good balance between speed and accuracy can be reached by setting options. Clean Background Noise, Enhance Contrast, Enhance Resolution, Language, Strategy, Rotate And Straighten, Color Space, Detect White Text On Dark Backgrounds and Input Image Type are just some of the options available.
Below are examples of before-and-after images of low-quality scans being fixed:
Before
After
IronOCR provides solutions in 125+ international languages to help developers all over the world.
IronTesseract can read a variety of image types and PDF files, which is not possible with the traditional free Tesseract engines. If scans are of poor quality, the OcrInput class lets you repair the relevant properties automatically.
The OcrInput class gives C# programmers granular control over input. Developers can preprocess image input for speed and accuracy, which removes the need to prepare photos for OCR using Photoshop batch scripts or ImageMagick.
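As a rough illustration of this kind of preprocessing, here is a minimal sketch that straightens and de-noises a scan before reading it. It reuses the IronTesseract, OcrInput, Add, Deskew and Read calls that appear in this article's own examples. The DeNoise filter and the file name low-quality-scan.png are assumptions, and filter names may differ between IronOCR versions, so check them against the version you install.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // Placeholder path: substitute your own low-quality scan
    input.Add("low-quality-scan.png");

    // Straighten a rotated or skewed scan
    input.Deskew();

    // Remove digital noise (assumed filter name; verify for your version)
    input.DeNoise();

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Applying only the filters a given scan actually needs is usually the better trade-off, since each filter adds processing time.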
IronOCR also lets users select a specific area or region of an image and perform OCR on that region only, which improves both speed and accuracy. Such regions are known as ContentAreas or CropAreas.
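A sketch of region-based OCR might look like the following. It assumes an IronOCR version whose OcrInput.Add method accepts a System.Drawing.Rectangle as the content area; newer releases may expect a different rectangle type. The file name and coordinates are made-up placeholders.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // Hypothetical region, in pixels, covering the text of interest
    var contentArea = new System.Drawing.Rectangle
    {
        X = 100,
        Y = 200,
        Width = 800,
        Height = 300
    };

    // Only the given region of the image is passed to the OCR engine
    // (assumed overload; verify for your IronOCR version)
    input.Add("invoice.png", contentArea);

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Restricting OCR to a region means fewer pixels to process and less unrelated text in the result.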
IronOCR returns an advanced result object for each page it scans using Tesseract 3, 4 or 5. This contains location data, images, text, statistical confidence, alternative symbol choices, font names, font sizes, decoration, font weights, and a position for each of the following:
IronOCR enables developers to use more than one language for a single document. This feature is of great help for .NET service providers.
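A minimal sketch of reading a two-language document follows. It assumes the OcrLanguage enum and the AddSecondaryLanguage method are available in your IronOCR version, and that the matching language pack (here Chinese Simplified) has been installed from NuGet. The file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Primary language of the document (requires the matching language pack)
ocr.Language = OcrLanguage.ChineseSimplified;

// Additional language that also appears in the same document
ocr.AddSecondaryLanguage(OcrLanguage.English);

using (var input = new OcrInput())
{
    // Placeholder path: substitute your own mixed-language image
    input.Add("multi-language.png");

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```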
Note:
IronOCR is part of an award-winning product line. The award recognizes that Iron Software provides superb recognition, as well as excellent document-related conversion and manipulation.
Open Visual Studio, go to the File menu and select New Project. Then select Console Application.
Enter the project name and select the path in the appropriate text box. Next, click the Create button, and then select the required .NET framework, as in the screenshot below:
Visual Studio will now generate the structure for the selected application. If you selected a console, Windows, or web application, it will open the Program.cs file, where you can enter code and build/run the application.
Next, we can add the library to test the program.
The IronOCR library can be downloaded and installed in four ways. These are:
You can integrate IronOCR in a C# project using the Visual Studio NuGet Package Manager.
After this, a new window will appear; in the search bar, type IronOCR.
Using this method, developers can install the IronOCR library and any language pack of their choice.
IronOCR can be directly downloaded from the NuGet website by following these instructions:
Developers can download the IronOCR library directly from the IronOCR website using this link.
The package will now download/install in the current project and be ready to use.
Developers can download the LeadTools OCR SDK in three different ways as shown below. We will discuss them all.
You can install LeadTools OCR in a C# project using the Visual Studio NuGet Package Manager:
After this, a new window will appear; in the search bar type LeadTools OCR.
By following these steps, developers can install the LeadTools OCR library and any language pack of the developer's choice.
LeadTools OCR can be downloaded directly from the NuGet website by following these instructions:
Developers can download the LEADTOOLS Document SDK directly from the LEADTOOLS website without any hassle. Simply go to the website and download one of the packs containing the OCR library.
Both packages under discussion support multi-threaded OCR. Under this heading we will look at their performance and speed.
LEADTOOLS supports running more than one instance of OCR at a time, depending upon each system's physical cores. This feature saves a lot of time for .NET developers.
// Create an instance of an OCR document from the engine
IOcrDocument ocrDocument = ocrEngineInstance.DocumentManager.CreateDocument();
// Add page, zone them, recognize them and save them
// to the final document:
ocrDocument.Pages.AddPages(imageFileName, null);
ocrDocument.Recognize(null);
ocrDocument.Save(documentFileName, DocumentFormat.Pdf, null);
' Create an instance of an OCR document from the engine
Dim ocrDocument As IOcrDocument = ocrEngineInstance.DocumentManager.CreateDocument()
' Add page, zone them, recognize them and save them
' to the final document:
ocrDocument.Pages.AddPages(imageFileName, Nothing)
ocrDocument.Recognize(Nothing)
ocrDocument.Save(documentFileName, DocumentFormat.Pdf, Nothing)
Using IronOCR's multi-threading is easy and time-saving for developers. IronTesseract will automatically attempt to use all threads available on all cores, while keeping the main/GUI thread responsive.
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.AddPdf("scan.pdf");
    // Image processing is automatically multithreaded
    Input.Deskew();
    // OCR reading is automatically multithreaded too
    var Result = Ocr.Read(Input);
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    Input.AddPdf("scan.pdf")
    ' Image processing is automatically multithreaded
    Input.Deskew()
    ' OCR reading is automatically multithreaded too
    Dim Result = Ocr.Read(Input)
End Using
Creating searchable PDFs with ease is every C# developer's dream. In this section we will discuss this process using both the IronOCR SDK and the LEADTOOLS OCR SDK.
IronOCR lets developers create searchable PDFs by detecting text characters in images and turning them into real PDF text. A code example is below:
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.Add(@"images\page1.png");
    Input.Add(@"images\page2.bmp");
    Input.Add(@"images\page3.tiff");
    Input.Deskew();
    var Result = Ocr.Read(Input);
    Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    Input.Add("images\page1.png")
    Input.Add("images\page2.bmp")
    Input.Add("images\page3.tiff")
    Input.Deskew()
    Dim Result = Ocr.Read(Input)
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using
LEAD Technologies also supports creating searchable PDFs with its award-winning line of software. However, the code is a little more complicated than that used for IronOCR.
private void saveAsSearchablePDFToolStripMenuItem_Click(object sender, EventArgs e)
{
    try
    {
        // Create a document
        using (IOcrDocument ocrDocument = _ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile))
        {
            // Create IOcrPage from loaded image
            _ocrPage = _ocrEngine.CreatePage(_viewer.Image, OcrImageSharingMode.AutoDispose);
            // Recognize Text
            _ocrPage.Recognize(null);
            // Add the page
            ocrDocument.Pages.Add(_ocrPage);
            // Save page as documentation
            SaveFileDialog saveDlg = new SaveFileDialog();
            saveDlg.InitialDirectory = @"C:\LEADTOOLS22\Resources\Images";
            saveDlg.Filter = "Adobe Portable Document Format|*.pdf";
            if (saveDlg.ShowDialog(this) != DialogResult.OK)
                return;
            ocrDocument.Save(saveDlg.FileName, DocumentFormat.Pdf, null);
            MessageBox.Show($"OCR output saved to {saveDlg.FileName}");
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}
Private Sub saveAsSearchablePDFToolStripMenuItem_Click(ByVal sender As Object, ByVal e As EventArgs)
    Try
        ' Create a document
        Using ocrDocument As IOcrDocument = _ocrEngine.DocumentManager.CreateDocument(Nothing, OcrCreateDocumentOptions.AutoDeleteFile)
            ' Create IOcrPage from loaded image
            _ocrPage = _ocrEngine.CreatePage(_viewer.Image, OcrImageSharingMode.AutoDispose)
            ' Recognize Text
            _ocrPage.Recognize(Nothing)
            ' Add the page
            ocrDocument.Pages.Add(_ocrPage)
            ' Save page as documentation
            Dim saveDlg As New SaveFileDialog()
            saveDlg.InitialDirectory = "C:\LEADTOOLS22\Resources\Images"
            saveDlg.Filter = "Adobe Portable Document Format|*.pdf"
            If saveDlg.ShowDialog(Me) <> DialogResult.OK Then
                Return
            End If
            ocrDocument.Save(saveDlg.FileName, DocumentFormat.Pdf, Nothing)
            MessageBox.Show($"OCR output saved to {saveDlg.FileName}")
        End Using
    Catch ex As Exception
        MessageBox.Show(ex.ToString())
    End Try
End Sub
This section covers platform support. Both software packages support many platforms and operating systems.
The IronOCR .NET SDK is compatible with almost all .NET platforms and operating systems that support the C# programming language. IronOCR also supports a range of image formats, such as JPEG, TIFF and many more.
LEAD Technologies also supports integrating its various products and apps across different platforms, and provides excellent SDK support for its users and developers.
Licenses are required to use either of the software packages discussed above. Only once you hold a license can you access their full range of features.
The LEADTOOLS SDK license has two (2) key licensing components: the Development License and the Deployment License.
To develop with LEADTOOLS, you'll need a Development License. The Development License can be purchased directly from LEAD or through a LEAD-authorized reseller or distributor.
The Development License enables a customer to install the SDK on a development machine, and use it to create an end-user application by including specific redistributable libraries and files into the application using the SDK sample code and documentation.
The customer's use of the SDK-developed end-user application ("End User Software") is governed by the Deployment License.
Unlike a standard end-user application license agreement, which prohibits any copying of the application, an SDK license allows the user to copy and redistribute a portion of the SDK. In order to reproduce LEAD's intellectual property and deploy it with end-user software produced using the LEAD SDK, LEAD's clients must obtain the necessary deployment license.
LeadTools does not provide free licenses for its developers. Instead, it provides comprehensive developer-based licensing. To see the Lead Technologies OCR SDK price structure, visit here.
IronOCR is a library that provides a developer's license for free. IronOCR also has a distinct pricing structure; the Lite bundle starts at $749 with no hidden fees. The redistribution of SaaS and OEM products is also possible. All licenses come with a 30-day money-back guarantee, a year of software support and upgrades, dev/staging/production validity, and a perpetual license (one-time purchase). To see IronOCR's entire price structure and licensing details, go here.
Royalty-free redistribution for SaaS and OEM products is available as a one-time purchase of $1599.
IronOCR is a .NET SDK library that uses a powerful Tesseract-based engine called IronTesseract. It supports a total of 125+ international languages. IronOCR offers many imaging features, such as text extraction from images, fixing low-resolution images, performing OCR on a specific region of an image, and many other related features. IronOCR focuses on providing speed with accuracy, and its accuracy rate of 99.8% is higher than that of any other Tesseract-based OCR out there. IronOCR works out of the box with no need to tune performance or heavily modify input images. On top of all of that, you can always get all five of the Iron Software products for the price of just two. Click here to see the webpage.
LEADTOOLS OCR is a toolkit from LEADTOOLS that provides most recognition features quickly and efficiently. Programmers can conduct character recognition on document pictures, and output recognized text to over 20 file formats using the LEADTOOLS OCR class library. Its library can be integrated with most programming languages and nearly all of the platforms available out there. Its features include:
IronOCR and LEADTOOLS OCR are both top-of-the-line tools and provide all the features that a C# or .NET developer could wish for. IronOCR is easier to use and code than its competitor. Neither product incurs ongoing costs, but IronOCR is a lot more price-efficient than the LEADTOOLS OCR library. IronOCR provides more accuracy than any of its competitors out there. IronOCR provides international language support for 125+ languages, whereas LEADTOOLS only supports 40+ languages. Taking into account all the various aspects of performance, the only conclusion we can draw is that IronOCR holds significant advantages over LEADTOOLS OCR.