Optical Character Recognition (OCR) technology has become an invaluable tool for automating the extraction of text from images, enabling efficient data retrieval and analysis while reducing human error. It can read driving licenses, passports, official institutional documents, ID cards, residence permit cards, and travel documents in multiple languages and from different countries, capturing fields such as expiration date, nationality, and date of birth. The extracted data can then be fed into machine learning and artificial intelligence software.
In this article, we will explore how to use IronOCR, a powerful C# OCR library from Iron Software, to read and extract information from identity documents. IronOCR exposes a straightforward and flexible API for OCR tasks, making it an excellent choice for developers looking to integrate OCR capabilities into their applications.
IronOCR enables computers to recognize and extract text from images, scanned documents, or any other visual representation of text. Extracting that data involves a series of complex processes that mimic the way humans perceive and interpret text visually: image pre-processing, text detection, character segmentation, feature extraction, character recognition, and post-processing to correct errors.
IronOCR, crafted and maintained by Iron Software, serves as a powerful library for C# Software Engineers, facilitating OCR, Barcode Scanning, and Text Extraction within .NET projects.
Capable of reading relevant data from various formats, including images (JPEG, PNG, GIF, TIFF, BMP), streams, and PDFs.
Corrects low-quality scans and photos through an array of filters such as Deskew, Denoise, Binarize, Enhance Resolution, Dilate, and more.
Supports reading barcodes from a wide range of formats, encompassing over 20 barcode types, with added QR code recognition.
Utilizes the latest build of Tesseract OCR, finely tuned for optimal performance in extracting text from images.
Allows the export of searchable PDFs, HTML, and text content from image files, offering flexibility in managing extracted information.
Now, let's delve into the development of a demo application that utilizes IronOCR to read ID documents.
Begin by creating a new C# console application in Visual Studio, or use an existing project. Select "Add New Project" from the menu, then choose the console application template.
Provide a project name and location in the next window.
Select the required .NET Version.
Click the Create button to create the new project.
IronOCR can be found in the NuGet package manager and can be installed using the package manager console with the following command:
Install-Package IronOcr
IronOCR can also be installed from within Visual Studio. Open the NuGet Package Manager, search for IronOCR, and click Install.
Once installed, the application is ready to make use of IronOCR to read any identity document for data extraction and identity verification, reducing manual data entry work.
Using OCR for processing ID documents involves many steps, which are detailed below.
The OCR ID document processing begins with acquiring an image containing text. This image could be scanned ID documents, a photograph of ID cards, or any other visual representation of text. Identity card pre-processing steps may include resizing, noise reduction, and enhancement to improve the quality and clarity of the image.
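To make the pre-processing idea concrete, here is a minimal sketch of one such step, binarization with a single global threshold. It is a conceptual illustration only: the `PreprocessingSketch` class, the `Binarize` method, and the default threshold of 128 are assumptions made for this example and are not how IronOCR implements its own filters.

```csharp
using System;

// Hypothetical helper for illustration; not part of IronOCR.
public static class PreprocessingSketch
{
    // Converts a grayscale image (0 = black, 255 = white) to pure black and white
    // using a single global threshold. The default of 128 is an arbitrary assumption.
    public static byte[,] Binarize(byte[,] gray, byte threshold = 128)
    {
        int rows = gray.GetLength(0);
        int cols = gray.GetLength(1);
        var result = new byte[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                // Pixels darker than the threshold become ink (0);
                // everything else becomes background (255).
                result[r, c] = gray[r, c] < threshold ? (byte)0 : (byte)255;
            }
        }

        return result;
    }
}
```

Calling `PreprocessingSketch.Binarize(pixels)` on a grayscale pixel grid returns a new grid containing only 0 and 255. Real scans with uneven lighting usually need an adaptive threshold rather than a single global value.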
OCR algorithms need to locate the specific data areas within the image where text is present. This step involves identifying text regions or bounding boxes.
Once text regions or data fields are identified, the image is further analyzed to segment individual characters. This step is crucial for languages that use distinct characters, like English or Chinese.
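One classic way to segment characters in a single line of printed text is a vertical projection: scan the columns of a binarized image and split wherever a column contains no ink. The sketch below illustrates that idea under simplifying assumptions (one text line, characters that do not touch). The `SegmentationSketch` class and `SplitColumns` method are hypothetical names for this example, and this is not a description of IronOCR's internals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of IronOCR.
public static class SegmentationSketch
{
    // Returns the (start, end) column range, inclusive, of each run of columns
    // that contains ink. ink[r, c] == true means the pixel belongs to a character.
    public static List<(int Start, int End)> SplitColumns(bool[,] ink)
    {
        int rows = ink.GetLength(0);
        int cols = ink.GetLength(1);
        var segments = new List<(int Start, int End)>();
        int start = -1; // -1 means we are currently in a gap between characters

        for (int c = 0; c < cols; c++)
        {
            bool columnHasInk = false;
            for (int r = 0; r < rows; r++)
            {
                if (ink[r, c])
                {
                    columnHasInk = true;
                    break;
                }
            }

            if (columnHasInk && start < 0)
            {
                start = c; // a new character begins here
            }
            else if (!columnHasInk && start >= 0)
            {
                segments.Add((start, c - 1)); // the character ended at the previous column
                start = -1;
            }
        }

        if (start >= 0)
        {
            segments.Add((start, cols - 1)); // a character touches the right edge
        }

        return segments;
    }
}
```

Each returned range can then be cropped out and passed to the recognition stage. Touching or overlapping characters would need a more robust approach, such as connected-component analysis.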
OCR algorithms analyze the segmented characters to extract features that help in differentiating between different characters. These features might include stroke patterns, shape, and spatial relationships between elements.
Based on the extracted features, OCR algorithms classify each segmented character and assign it a corresponding textual representation. Machine learning models, such as neural networks, are often employed in this step.
The recognized characters may undergo post-processing to correct errors or enhance accuracy. This step may involve dictionary-based corrections, context analysis, or language modeling.
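As a small illustration of context-based correction, the sketch below repairs letters that are commonly misread in place of digits and then parses the result as a date. The `PostProcessingSketch` class, its method names, the substitution table, and the `dd/MM/yyyy` layout are all assumptions for this example; real documents use many date formats, and the substitutions are only safe on fields known to be numeric.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of IronOCR.
public static class PostProcessingSketch
{
    // Replaces letters that are easily confused with digits.
    // Only call this on fields that are expected to be purely numeric.
    public static string NormalizeDigits(string field)
    {
        var builder = new StringBuilder(field.Length);
        foreach (char ch in field)
        {
            switch (ch)
            {
                case 'O':
                case 'o':
                    builder.Append('0');
                    break;
                case 'I':
                case 'l':
                    builder.Append('1');
                    break;
                case 'S':
                    builder.Append('5');
                    break;
                case 'B':
                    builder.Append('8');
                    break;
                default:
                    builder.Append(ch);
                    break;
            }
        }
        return builder.ToString();
    }

    // Cleans a raw OCR date string and parses it, assuming a dd/MM/yyyy layout.
    public static bool TryParseDate(string raw, out DateTime date)
    {
        string cleaned = NormalizeDigits(raw.Trim());
        return DateTime.TryParseExact(cleaned, "dd/MM/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }
}
```

For example, `TryParseDate("3l/O1/2O3O", out var expiry)` first cleans the string to `31/01/2030` and then parses it. A value that still fails to parse after cleaning is a useful signal that the field should be reviewed manually.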
The IronOCR library takes care of all the above steps and lets us perform OCR in just a few lines of code, sparing us these time-consuming, tedious tasks.
using System;
using IronOcr;

class Program
{
    public static void Main()
    {
        // Configure IronTesseract with language and other settings
        var ocrTesseract = new IronTesseract()
        {
            Language = OcrLanguage.EnglishBest,
            Configuration = new TesseractConfiguration()
            {
                ReadBarCodes = false, // Disable reading of barcodes
                BlackListCharacters = "`ë|^", // Characters the engine should never output
                PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd, // Automatic layout analysis with orientation and script detection
            }
        };

        // Define the OCR input image
        using var ocrInput = new OcrInput("id1.png");

        // Perform OCR on the input image
        var ocrResult = ocrTesseract.Read(ocrInput);

        // Display the extracted text
        Console.WriteLine(ocrResult.Text);
    }
}
Imports IronOcr

Friend Class Program
    Public Shared Sub Main()
        ' Configure IronTesseract with language and other settings
        Dim ocrTesseract = New IronTesseract() With {
            .Language = OcrLanguage.EnglishBest,
            .Configuration = New TesseractConfiguration() With {
                .ReadBarCodes = False,
                .BlackListCharacters = "`ë|^",
                .PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
            }
        }

        ' Define the OCR input image
        Dim ocrInput As New OcrInput("id1.png")

        ' Perform OCR on the input image
        Dim ocrResult = ocrTesseract.Read(ocrInput)

        ' Display the extracted text
        Console.WriteLine(ocrResult.Text)
    End Sub
End Class
Below is a sample image used as input for the program.
The above code uses the IronOCR library to read all the text fields from the ID document. We use the IronTesseract class from the IronOCR library and configure it to use the English language and some blacklisted characters. Then we declare the OCR input using the OcrInput class and read the text from the image. The extracted text fields can be seen in the console output.
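The recognized text is a single string, so individual fields such as date of birth, expiration date, and nationality still have to be picked out of it. The sketch below shows one possible way to do that with regular expressions. The `IdFieldParser` class, the label wording (for example "Date of Birth" and "Expiry Date"), and the date layout are assumptions for illustration; labels and formats vary between documents and countries, so the patterns would need adapting to your own samples.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of IronOCR.
public static class IdFieldParser
{
    // Label patterns are assumptions; adjust them to the documents you process.
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        ["DateOfBirth"] = new Regex(
            @"(?:Date of Birth|DOB)\s*[:\-]?\s*(\d{2}[/.\-]\d{2}[/.\-]\d{4})",
            RegexOptions.IgnoreCase),
        ["ExpiryDate"] = new Regex(
            @"(?:Expiry Date|Date of Expiry|Expires)\s*[:\-]?\s*(\d{2}[/.\-]\d{2}[/.\-]\d{4})",
            RegexOptions.IgnoreCase),
        ["Nationality"] = new Regex(
            @"Nationality\s*[:\-]?\s*([A-Za-z]+)",
            RegexOptions.IgnoreCase),
    };

    // Returns only the fields that were found in the OCR text.
    public static Dictionary<string, string> Extract(string ocrText)
    {
        var fields = new Dictionary<string, string>();
        foreach (var pair in Patterns)
        {
            Match match = pair.Value.Match(ocrText);
            if (match.Success)
            {
                fields[pair.Key] = match.Groups[1].Value;
            }
        }
        return fields;
    }
}
```

In the program above, this could be called as `IdFieldParser.Extract(ocrResult.Text)`. Fields that are missing from the returned dictionary were not found and may need a different pattern or manual review.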
We can also read from PDF documents. For this, we can use the IronPDF library from Iron Software. First, install the library using the following command:
Install-Package IronPdf
using System;
using IronOcr;
using IronPdf;

class Program
{
    public static void Main()
    {
        // Load the PDF document
        var pdfReader = new PdfDocument("id1.pdf");

        // Initialize IronTesseract for OCR
        var ocrTesseract = new IronTesseract();

        // Create OCR input from the PDF stream
        using var ocrInput = new OcrInput();
        ocrInput.AddPdf(pdfReader.Stream);

        // Perform OCR on the PDF input
        var ocrResult = ocrTesseract.Read(ocrInput);

        // Display the extracted text
        Console.WriteLine(ocrResult.Text);
    }
}
Imports IronOcr
Imports IronPdf

Friend Class Program
    Public Shared Sub Main()
        ' Load the PDF document
        Dim pdfReader = New PdfDocument("id1.pdf")

        ' Initialize IronTesseract for OCR
        Dim ocrTesseract = New IronTesseract()

        ' Create OCR input from the PDF stream
        Dim ocrInput As New OcrInput()
        ocrInput.AddPdf(pdfReader.Stream)

        ' Perform OCR on the PDF input
        Dim ocrResult = ocrTesseract.Read(ocrInput)

        ' Display the extracted text
        Console.WriteLine(ocrResult.Text)
    End Sub
End Class
The above code uses IronPDF to load the id1.pdf document, and the PDF is passed as a stream to OcrInput and ocrTesseract.
To use IronOCR, you'll need a license key. This key needs to be placed in appsettings.json.
{
    "IRONOCR-LICENSE-KEY": "your license key"
}
Provide your email address to get a trial license.
Common use cases for OCR-based ID document processing include:
1. Identity Verification in Financial Services
2. Border Control and Immigration
3. Access Control and Security
4. E-Government Services
5. Healthcare Identity Verification
6. Automated Hotel Check-In
7. Smart Cities and Public Services
8. Education Administration
Integrating OCR technology into your C# application using IronOCR allows you to efficiently extract information from ID documents. This guide covered the steps needed to set up your project and use IronOCR to read and process identity document images. Experiment with the code examples to tailor the extraction process to your specific requirements and build an automated solution for handling identity document data.
IronOCR is a powerful OCR library in C# developed by IronSoftware. It allows developers to integrate OCR capabilities into their applications for tasks such as reading and extracting text from images and documents.
IronOCR can process a variety of identity documents, including driving licenses, passports, ID cards, residence permit cards, and travel documents from multiple languages and countries.
You can install IronOCR via the NuGet Package Manager in Visual Studio using the command: Install-Package IronOcr.
Key features of IronOCR include text reading versatility, image enhancement, barcode recognition, Tesseract OCR integration, and flexible output options.
Yes, IronOCR can read documents from PDFs. You can use IronPDF to load PDF documents and integrate with IronOCR to extract text.
To use IronOCR, you need Visual Studio or another C# development environment and the NuGet Package Manager to manage packages in your project.
Yes, IronOCR supports reading barcodes from over 20 barcode types, including QR codes.
OCR technology is used for identity verification in financial services, border control, access control, e-government services, healthcare, automated hotel check-ins, smart cities, and education administration.
IronOCR enhances low-quality images using filters such as Deskew, Denoise, Binarize, Enhance Resolution, and Dilate to improve text recognition accuracy.
Yes, a free trial is available for IronOCR. You need to provide an email ID to receive a trial license key.