Optical Character Recognition (OCR) provides the ability to convert an image file into machine-encoded text. This is incredibly useful given that scanned documents are saved as image files, and the data in these image files cannot be searched, edited, or saved in text format using a normal text editor or even a word processing application. OCR processing helps to convert these images into machine-readable text for further processing by users.
In this modern age, documents shared over the internet are usually in digital format, mostly in the form of PDFs or images. There are numerous online resources available that convert images to text. However, most businesses require this functionality in their software applications. Bearing this in mind, there are many libraries that provide OCR processing technology to be embedded in software applications.
In this article, we are going to discuss two of the most popular OCR libraries for C#: IronOCR and the Nanonets OCR API.
IronOCR for .NET is a C# library that enables users to scan, search, and read images and PDFs. It takes an image or PDF file as input and uses a custom-built .NET OCR engine based on the latest Tesseract 5 to output text, structured data, or searchable PDF documents. Tesseract is available in 125+ languages, and IronOCR offers cross-platform support across .NET Core and .NET Standard, from version 2.0 up to .NET 7.
IronOCR is a user-friendly API that allows C# developers to automatically convert images to text using the IronTesseract class. The library prioritizes speed, accuracy, and ease of use.
Another powerful feature of IronOCR is its ability to scan barcodes and QR codes in image files and read their text. It also accepts input from System.Drawing objects, streams, PDF documents, and more.
Now, let's have a look at the Nanonets OCR API.
Nanonets OCR API is a REST API that provides real-time data extraction tailored to your business needs for automated workflows. The OCR API is AI-powered and can securely capture, categorize, and extract data from unstructured documents within seconds. With Nanonets, you can automate manual data entry, reducing the manual effort required.
Nanonets understands documents using machine learning, even those that do not follow a standard template. You can upload any unstructured document and capture only the desired information based on different fields. Unlike traditional OCR, the Nanonets OCR model can be trained for better results. As your business grows, the Nanonets intelligent document processing OCR model also grows and learns with every new document, providing fast and accurate results.
Additionally, Nanonets provides a Python package that enables easy integration and data capture in Python applications without requiring API requests. Other features include GDPR compliance.
In this tutorial, we are going to use the latest version of Visual Studio 2022. If you do not already have it downloaded and installed, you can do so from the Visual Studio website.
Now, we need to create a console project to get started with both libraries. Follow the steps to create a project:
Click on Create a new Project.
Select C# Console Application from the given options.
Click Next.
Under Additional information, select .NET 6.0 as the framework, as it is a stable long-term support release.
Next, we will install the libraries in our project for comparison.
There are multiple ways to install the IronOCR library. Let's have a look at them one by one.
NuGet is the package manager for downloading and installing dependencies in your project. Its packages contain compiled code (DLLs) and a manifest file. Access it using the following method:
Click Manage NuGet Packages for Solution.
Alternatively:
Click Manage NuGet Packages.
Now, the NuGet Package Manager window will open. Browse for IronOCR and click Install.
IronOCR can also be downloaded directly from the official NuGet website.
Simply visit the Iron Software website and navigate to the IronOCR for .NET webpage. Scroll to the bottom and click Download DLL or Download Windows installer.
A zip file will be downloaded. Extract it or run the Windows installer, then add the extracted DLL to your project as a reference.
You can also use the Package Manager Console. Open it and type the following command:
Install-Package IronOcr
This will automatically download and install IronOCR in your project.
Now, we are ready to use IronOCR in our project.
Only one namespace is required. Add it at the top of each source code file that needs to access IronOCR's functions.
using IronOcr;
Now, let's install Nanonets OCR API.
Nanonets can be used in multiple ways to capture data. It provides an online OCR facility that can be used for instant data extraction, reducing turnaround times. As a REST API, it can be called from many programming languages. Here, we will demonstrate how to integrate it into a C# application.
To automate data capture using the Nanonets OCR API in C#, you will need the following:
RestSharp is a simple REST and HTTP client library for .NET. It is used to send API requests and handle the responses. This library is needed to run the Nanonets code because Nanonets is exposed as a REST API.
To install RestSharp, open the NuGet Package Manager for your solution, browse for RestSharp, and install it. Alternatively, you can open the Package Manager Console and type the following command:
PM> Install-Package RestSharp
Now, everything is set up and ready to use.
Reading data from images can be quite a tedious task. Image resolution and quality play an important role when extracting content. Both IronOCR and Nanonets provide optical character recognition functionality to extract text from images.
IronOCR makes it very easy for developers to read the contents of an image file with its powerful IronTesseract class. We will use the following code to read text from a PNG image file:
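using IronOcr;

// Create the OCR engine
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Load the image to read, then run OCR and print the recognized text
    Input.AddImage("test-files/employmentapp.png");
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}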
The text output by IronOCR matches the text in the original image. The code is clean and easy to understand, with no extra setup.
Nanonets also provides the facility to extract text from images. To do this, an API call is made with the authentication key, and then the image is uploaded to the Nanonets server. The fast OCR tool will then return the extracted text as a response to the application. Here is an example of the code:
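using System.Text;
using RestSharp;

// Assumes RestSharp v107 or later, where the HTTP method is passed as an enum value
var client = new RestClient("https://app.nanonets.com/api/v2/OCR/FullText");
var request = new RestRequest("", Method.Post);
// HTTP Basic authentication: the API key is the user name and the password is empty
request.AddHeader("Authorization", "Basic " + Convert.ToBase64String(Encoding.Default.GetBytes("REPLACE_YOUR_API_KEY:")));
// Attach the image to upload
request.AddFile("file", "FILE_PATH");
RestResponse response = await client.ExecuteAsync(request);
Console.WriteLine(response.Content);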
The output is not perfect. The image contained structured data, only some of which is properly fetched. With another simple text image, the output was fine. Note that the model can be trained for more accurate results.
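The Authorization header in the Nanonets request above is an HTTP Basic credential: the API key followed by a colon, Base64-encoded. The following is a minimal sketch of that step using only the .NET base library. The helper name BuildBasicAuthHeader is hypothetical and is not part of RestSharp or Nanonets, and UTF-8 is assumed as the encoding (on .NET 6, Encoding.Default is also UTF-8).

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of RestSharp or Nanonets): builds the value of
// the Authorization header from an API key, using the key as the user name
// and an empty password, as in the request above.
static string BuildBasicAuthHeader(string apiKey)
{
    if (string.IsNullOrWhiteSpace(apiKey))
        throw new ArgumentException("API key must not be empty.", nameof(apiKey));

    // "key:" is the user name, a colon separator, and an empty password.
    byte[] credentials = Encoding.UTF8.GetBytes(apiKey + ":");
    return "Basic " + Convert.ToBase64String(credentials);
}

// Example with a placeholder key; substitute your own.
Console.WriteLine(BuildBasicAuthHeader("REPLACE_YOUR_API_KEY"));
```

Keeping this in one place makes it easier to load the key from configuration or an environment variable instead of hard-coding it in each request.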
IronOCR provides a useful feature for reading images: the ability to detect and read barcodes and QR codes. To enable this feature, set the ReadBarCodes configuration property to true before processing the image. Once the OCR processing is complete, iterate through the OCR results to extract the value of each detected barcode. Below is an example code snippet for reading barcodes with IronOCR:
using IronOcr;

var Ocr = new IronTesseract();
// Enable barcode and QR code detection before reading
Ocr.Configuration.ReadBarCodes = true;
using (var input = new OcrInput())
{
    input.AddImage("test-files/Barcode.png");
    var Result = Ocr.Read(input);
    // Print the value of each barcode found in the image
    foreach (var Barcode in Result.Barcodes)
    {
        Console.WriteLine(Barcode.Value);
    }
}
All three barcodes in the input image are read successfully, and their encoded values are displayed.
Nanonets OCR API provides the facility to detect QR codes. However, this functionality is only available in the Enterprise plan, and you will need to contact sales to use it. Additionally, Nanonets allows you to detect specific parts of documents or receipts. It also provides other features such as accounts payable, invoice processing, and accounting automation.
Reading PDF files is just as simple as reading image files with IronOCR. The only change required is to use the AddPdf method instead of AddImage in the code for reading images. The code is as follows:
using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Load the PDF instead of an image, then run OCR and print the recognized text
    Input.AddPdf("test-files/example.pdf");
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
The extracted text is in the same format as the PDF file.
Reading data from PDF files is also available in the Nanonets OCR API. The code is almost identical to the image text detection code, except for the URL used in the request. Let's take a look at the code:
using System.Text;
using RestSharp;

// Replace {{model_id}} with your OCR model ID (assumes RestSharp v107 or later)
var client = new RestClient("https://app.nanonets.com/api/v2/OCR/Model/{{model_id}}/LabelFile/?async=false");
var request = new RestRequest("", Method.Post);
request.AddHeader("authorization", "Basic " + Convert.ToBase64String(Encoding.Default.GetBytes("REPLACE_YOUR_API_KEY:")));
request.AddHeader("accept", "Multipart/form-data");
// Attach the PDF to upload
request.AddFile("file", "test-files/example.pdf");
RestResponse response = await client.ExecuteAsync(request);
Console.WriteLine(response.Content);
In the above code, replace the model_id with your OCR model ID. Also, replace the API key with your own API key. Then, replace the PDF file path with the path to your own file.
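Substituting the model ID by hand is easy to get wrong, so the replacement can be done in code. The sketch below fills the ID into the LabelFile URL shown above. The helper name BuildLabelFileUrl is hypothetical, and escaping the ID is a defensive assumption rather than a documented Nanonets requirement.

```csharp
using System;

// Hypothetical helper: inserts a model ID into the LabelFile endpoint used above.
static string BuildLabelFileUrl(string modelId)
{
    if (string.IsNullOrWhiteSpace(modelId))
        throw new ArgumentException("Model ID must not be empty.", nameof(modelId));

    // Escape the ID so unexpected characters cannot alter the URL path.
    string escapedId = Uri.EscapeDataString(modelId.Trim());
    return $"https://app.nanonets.com/api/v2/OCR/Model/{escapedId}/LabelFile/?async=false";
}

// Example with a placeholder ID; substitute your own model ID.
Console.WriteLine(BuildLabelFileUrl("YOUR_MODEL_ID"));
```

The returned string can be passed to the RestClient constructor in place of the hard-coded URL.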
The output is similar to that of IronOCR, but the Nanonets OCR output includes extra spaces and new lines.
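If the extra spaces and new lines get in the way of further processing, the extracted text can be cleaned up after it is returned. The following is a minimal sketch using only the .NET base library. The helper name NormalizeWhitespace is hypothetical, and the rules it applies (single spaces, at most one blank line) are an assumption about what a tidy result looks like, not something either product prescribes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: tidies whitespace in text returned by an OCR service.
static string NormalizeWhitespace(string text)
{
    if (string.IsNullOrEmpty(text)) return string.Empty;

    // Use "\n" for every line ending.
    string result = text.Replace("\r\n", "\n").Replace('\r', '\n');
    // Collapse runs of spaces and tabs into a single space.
    result = Regex.Replace(result, @"[ \t]+", " ");
    // Remove the spaces left directly before or after a line break.
    result = Regex.Replace(result, @" ?\n ?", "\n");
    // Allow at most one blank line between blocks of text.
    result = Regex.Replace(result, @"\n{3,}", "\n\n");
    return result.Trim();
}

string raw = "Name:   John \r\n\r\n\r\n\r\nAge:  30  ";
Console.WriteLine(NormalizeWhitespace(raw));
```

Note that this also flattens spacing that may be meaningful, such as column alignment in tables, so it suits plain paragraphs better than structured layouts.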
IronOCR is free for development purposes, but it must be licensed for commercial use. A free trial is also available for testing its full feature set. The Lite package starts at $749 with a 30-day money-back guarantee. IronOCR includes one year of product support and updates for free; after that, support and updates cost $399 per year. All licenses are perpetual, meaning there is a one-time purchase and no hidden charges. You can also add royalty-free redistribution coverage for SaaS and OEM products for a one-time purchase of $1999. For more information on license packages and pricing plans, see the IronOCR licensing page.
Nanonets OCR API offers three different packages. You can sign up for its starter package for free. The first 500 pages are free, after which each page costs $0.3. You only pay for what you use. For more detailed information, see the Nanonets pricing page.
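To make the pay-per-page model concrete, here is a small sketch that estimates the starter-package charge for a given page count. It uses only the figures quoted above (500 free pages, then $0.3 per page) and assumes the free allowance applies once to the total, which may not match current Nanonets terms. The function name NanonetsStarterCost is hypothetical.

```csharp
using System;

// Hypothetical estimate based on the figures quoted in this article:
// the first 500 pages are free and each further page costs $0.30.
static decimal NanonetsStarterCost(int pages)
{
    const int freePages = 500;
    const decimal pricePerPage = 0.30m;

    if (pages < 0)
        throw new ArgumentOutOfRangeException(nameof(pages), "Page count cannot be negative.");

    // Only pages beyond the free allowance are billed.
    int billablePages = Math.Max(0, pages - freePages);
    return billablePages * pricePerPage;
}

Console.WriteLine(NanonetsStarterCost(500));
Console.WriteLine(NanonetsStarterCost(1500));
```

Under these assumptions, 1,500 pages come to 1,000 billable pages, or $300.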
IronOCR provides C# developers with an advanced Tesseract API that is available on most platforms. It can be deployed on Windows, Linux, Mac, Azure, AWS, and Lambda, and supports .NET Framework projects as well as .NET Standard and .NET Core. IronOCR also enables reading barcodes in OCR scans and exporting OCR results as HTML and searchable PDFs. For more information, see the IronOCR documentation on C# Tesseract OCR.
Nanonets OCR API offers a variety of OCR tools. It provides ready-to-use OCR solutions for multiple document types, such as invoices, receipts, bills, forms, and ID cards, to automate data capture. No template setup is required and there are no hidden charges. The Nanonets OCR API is advertised as delivering a 90% time saving and a 10x productivity gain.
IronOCR licenses are developer-based: you purchase a license according to the number of developers who will use the product. Nanonets pricing plans are based on the number of images or PDF pages processed. The Pro and Enterprise plans are billed monthly per model, so, unlike the cost of an IronOCR license, the price rises as the number of models and pages increases. Moreover, IronOCR licenses are a one-time purchase, can be used for a lifetime, and support OEM and SaaS distribution.
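The difference between the two pricing models can be illustrated with a rough break-even calculation. The sketch below uses only the prices quoted in this article and makes several simplifying assumptions: a single IronOCR Lite license, optional support renewal at $399 for each year after the first, and Nanonets starter pricing with a one-time allowance of 500 free pages. Both function names are hypothetical, and current prices should be checked before relying on the numbers.

```csharp
using System;

// Total cost of one IronOCR Lite license over a number of years.
// The first year of support is included; later years are optional renewals.
static decimal IronOcrLiteCost(int years, bool renewSupport = true)
{
    const decimal licensePrice = 749m;
    const decimal yearlySupport = 399m;

    if (years < 1)
        throw new ArgumentOutOfRangeException(nameof(years), "At least one year is required.");

    int paidSupportYears = renewSupport ? years - 1 : 0;
    return licensePrice + paidSupportYears * yearlySupport;
}

// Smallest page count at which Nanonets per-page charges reach the given budget.
static int NanonetsBreakEvenPages(decimal budget)
{
    const int freePages = 500;
    const decimal pricePerPage = 0.30m;
    return freePages + (int)Math.Ceiling(budget / pricePerPage);
}

decimal firstYear = IronOcrLiteCost(1);
Console.WriteLine($"IronOCR Lite, first year: {firstYear}");
Console.WriteLine($"Nanonets pages for the same spend: {NanonetsBreakEvenPages(firstYear)}");
```

With these figures, the per-page charges match the first-year license price at roughly 3,000 pages. Below that volume the pay-per-page model is cheaper, and above it the one-time license is.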
Overall, both APIs provide AI- and ML-based OCR functionality. IronOCR has a slight advantage over Nanonets because it can be used offline and provides more reliable results even for unstructured documents. IronOCR also supports custom-trained data with fast integration for more accurate results. Nanonets OCR lets you train the model on key fields, but those fields can be difficult to detect if the model is not trained properly. Moreover, IronOCR provides multilingual support for 125+ international languages.
Now you can get five Iron products for the price of two as part of the complete Iron Suite.
IronOCR also provides a free trial with a money-back guarantee. You can download IronOCR from the Iron Software website.