Passport OCR SDK (Developer Tutorial)

Published December 15, 2024

A passport is an individual's proof of identity; we use passports to travel and to register essential aspects of our lives. However, the passport format is not always easy to read. Imagine a surge of travelers arriving during the holiday season for travel and leisure. How can immigration agents handle that volume with manual data entry and still retrieve the correct information?

Hence, many applications and enterprises are turning to optical character recognition (OCR), which allows developers to quickly extract printed or handwritten text from digital images.

Similarly, Passport OCR is a technology that uses OCR software to extract meaningful information from passports; it also uses the machine-readable zone found on passports to retrieve the information needed to identify an individual passing through immigration quickly. In scenarios where you need to recognize passport information quickly, or in a process that automates passport data extraction, Passport OCR is vital and is the cornerstone of efficiency and speed at airports and immigration borders.

Although Passport OCR software has matured over the years, many factors still affect how well a document scans. Digital images with noise, or smudges on the passport, can heavily affect the accuracy of the extracted data. Furthermore, general-purpose OCR libraries can be confusing to use on a passport, as the machine-readable zone is a uniquely structured data set: developers might be able to extract the text but then have to sort it into fields themselves. IronOCR, however, provides specialized methods optimized for reading passports; their results let developers obtain and manipulate the information quickly, which is ideal for high-volume scanning and automation.

In this article, we'll briefly discuss using IronOCR to obtain and manipulate passport information to automate data extraction and provide further details on how IronOCR interacts with the passport.

IronOCR: A C# OCR Library

Passport OCR SDK (Developer Tutorial): Figure 1 - IronOCR: A C# OCR Library

IronOCR is a C# library that offers easy-to-use methods and flexible functionality for all OCR-related needs. In addition to the standard techniques, IronOCR lets developers use and configure a customized version of Tesseract to accomplish OCR tasks.

Here's a quick rundown of its most notable features:

  1. Cross compatibility: IronOCR is compatible with most .NET platforms, including .NET 8, 7, 6, and 5, and supports .NET Framework 4.6.2 and later. It also runs on all major operating systems and environments, from Windows and macOS to Linux and Azure, so developers don't have to worry about cross-compatibility.

  2. Flexibility: OCR input comes in many formats, so a library has to handle all sorts of formats to be truly flexible. IronOCR accepts all popular image formats (jpg, png, and gif) and also supports the native `System.Drawing` objects from C#, allowing easier integration into existing codebases.

  3. Support and ease of use: IronOCR is well documented, with an extensive API reference and tutorials covering its functionality. Furthermore, 24/5 support is available, so developers are not left on their own.

  4. Multiple language support: IronOCR supports up to 125 languages as well as custom languages, making it versatile for international document processing.

Reading the Passport with IronOCR

License Key

Please remember that IronOCR requires a licensing key for operation. You can get a key as part of a free trial by visiting this link.

// Replace the placeholder with the trial key you obtained
IronOcr.License.LicenseKey = "REPLACE-WITH-YOUR-KEY";

' Replace the placeholder with the trial key you obtained
IronOcr.License.LicenseKey = "REPLACE-WITH-YOUR-KEY"

After receiving a trial key, set this variable in your project.

Code example

The code below showcases how IronOCR takes a passport image and extracts all relevant information using the library's passport OCR SDK.

Input image

Passport OCR SDK (Developer Tutorial): Figure 2 - Input image

using IronOcr;
using System;
// Instantiate OCR engine
var ocr = new IronTesseract();
using var inputPassport = new OcrInput();
inputPassport.LoadImage("Passport.jpg");
// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);
// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);

Imports IronOcr
Imports System
Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr = New IronTesseract()
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("Passport.jpg")
            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)
            ' Output passport information
            Console.WriteLine(result.PassportInfo.GivenNames)
            Console.WriteLine(result.PassportInfo.Country)
            Console.WriteLine(result.PassportInfo.PassportNumber)
            Console.WriteLine(result.PassportInfo.Surname)
            Console.WriteLine(result.PassportInfo.DateOfBirth)
            Console.WriteLine(result.PassportInfo.DateOfExpiry)
        End Using
    End Sub
End Module

Code explanation

  1. We first import IronOCR into the code base.
  2. We instantiate the OCR engine, `IronTesseract`.
  3. We create a new `OcrInput` and assign it to `inputPassport`.
  4. We load the image by providing its path.
  5. We call the specialized method for reading passports, `ReadPassport`, and pass in the input.
  6. We can then manipulate and print the extracted data through `result.PassportInfo`.

Console output

Passport OCR SDK (Developer Tutorial): Figure 3 - Console output

Machine Readable Zone

IronOCR can extract the Machine Readable Zone (MRZ) information from the bottom two rows of any passport following the International Civil Aviation Organization (ICAO) standard. The MRZ data comprises two rows, each containing unique information. For detailed information on what each position in the rows corresponds to and for any exceptions and unique identifiers, please consult the ICAO documentation standards.

The following table gives a brief overview:

Passport OCR SDK (Developer Tutorial): Figure 4 - Table of MRZ
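
To make the row layout more concrete, here is a minimal sketch that parses the second MRZ row of a passport booklet (the 44-character TD3 layout) and verifies its check digits with the ICAO Doc 9303 weighting scheme (7, 3, 1). It does not use IronOCR at all: the `MrzLine2` record and the `MrzSketch` class are hypothetical names introduced only for this illustration, and the sample row is the specimen commonly quoted from ICAO Doc 9303, not a real passport.

```csharp
using System;

// Specimen second row commonly quoted from ICAO Doc 9303 (not a real passport).
var parsed = MrzSketch.ParseLine2("L898902C36UTO7408122F1204159ZE184226B<<<<<10");
Console.WriteLine($"{parsed.PassportNumber} {parsed.Nationality} {parsed.DateOfBirth} valid={parsed.ChecksPass}");
// prints: L898902C3 UTO 740812 valid=True

// Hypothetical holder for the fields of the second MRZ row (TD3 layout).
public record MrzLine2(
    string PassportNumber, string Nationality, string DateOfBirth,
    string Sex, string DateOfExpiry, string PersonalNumber, bool ChecksPass);

public static class MrzSketch
{
    // Digits keep their value, A-Z map to 10-35, the filler '<' counts as 0.
    private static int CharValue(char c)
    {
        if (c == '<') return 0;
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        throw new FormatException($"Invalid MRZ character '{c}'");
    }

    // ICAO 9303 check digit: weights repeat 7, 3, 1; the result is the sum mod 10.
    public static int CheckDigit(string field)
    {
        int[] weights = { 7, 3, 1 };
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
            sum += CharValue(field[i]) * weights[i % 3];
        return sum % 10;
    }

    // The check digit of an unused optional field may be printed as '<'.
    private static bool Matches(string field, char digit, bool allowFiller = false)
    {
        if (allowFiller && digit == '<') return CheckDigit(field) == 0;
        return digit >= '0' && digit <= '9' && CheckDigit(field) == digit - '0';
    }

    public static MrzLine2 ParseLine2(string line)
    {
        if (line is null || line.Length != 44)
            throw new FormatException("A TD3 MRZ row must be exactly 44 characters.");

        string number = line.Substring(0, 9);     // positions 1-9
        string birth = line.Substring(13, 6);     // positions 14-19 (YYMMDD)
        string expiry = line.Substring(21, 6);    // positions 22-27 (YYMMDD)
        string personal = line.Substring(28, 14); // positions 29-42
        // The composite digit covers positions 1-10, 14-20 and 22-43.
        string composite = line.Substring(0, 10) + line.Substring(13, 7) + line.Substring(21, 22);

        bool ok = Matches(number, line[9])
               && Matches(birth, line[19])
               && Matches(expiry, line[27])
               && Matches(personal, line[42], allowFiller: true)
               && Matches(composite, line[43]);

        return new MrzLine2(number.TrimEnd('<'), line.Substring(10, 3), birth,
                            line.Substring(20, 1), expiry, personal.TrimEnd('<'), ok);
    }
}
```

A check like this can act as a sanity test on raw OCR output: if any check digit fails, at least one character in the row was probably misread. For the authoritative field positions and exceptions, the ICAO documentation remains the reference.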

Challenges for Passport OCR and debugging

Image quality is always a concern when scanning digital images. A distorted or low-quality image obscures the information and makes it harder to confirm the accuracy of the data. Furthermore, developers must consider data security and compliance when dealing with sensitive information such as passport data.
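
One way to address image quality is to clean up the input before recognition. The sketch below is only an illustration: it assumes the `Deskew` and `DeNoise` image filters on `OcrInput` are available in your IronOCR version, and whether they actually improve a given passport scan has to be checked case by case (for example, by comparing the confidence score with and without the filters).

```csharp
using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();
using var inputPassport = new OcrInput();
inputPassport.LoadImage("Passport.jpg");

// Assumed image filters; apply them after loading the image and before reading
inputPassport.Deskew();   // straighten a rotated scan
inputPassport.DeNoise();  // reduce digital noise

// Perform OCR on the cleaned-up input
OcrPassportResult result = ocr.ReadPassport(inputPassport);
Console.WriteLine(result.Confidence);
```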

IronOCR also provides a way to debug the extraction and gauge how reliable the extracted information is. The properties involved allow developers to troubleshoot and be confident in the extracted data.

Here's a brief example:

using IronOcr;
using System;
// Instantiate OCR engine
var ocr = new IronTesseract();
using var inputPassport = new OcrInput();
inputPassport.LoadImage("Passport.jpg");
// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);
// Output Confidence level and raw extracted text
Console.WriteLine(result.Confidence);
Console.WriteLine(result.Text);

Imports IronOcr
Imports System
Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr = New IronTesseract()
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("Passport.jpg")
            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)
            ' Output Confidence level and raw extracted text
            Console.WriteLine(result.Confidence)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

The code is nearly the same as in the previous example; however, the console output differs because we directly access the `Text` and `Confidence` properties instead of the individual members of `PassportInfo`.

  1. Confidence: The `Confidence` property of `OcrPassportResult` is a floating-point number representing the OCR's statistical confidence in its accuracy, calculated as an average over every character. A value of 1 represents the highest confidence level, while 0 represents the lowest. A low value indicates that the passport image may be blurry or may contain extraneous content.
  2. Text: The `Text` property of `OcrPassportResult` holds the unprocessed text extracted from the passport image. Developers can use it in unit tests to validate the extracted text with equality assertions.
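
One way to act on these two properties is to gate automated processing on the confidence score and fall back to manual review otherwise. The following is a minimal sketch under stated assumptions: the `ReviewDecision` enum, the `ConfidenceGate.Triage` helper, and the 0.9 threshold are hypothetical choices made for illustration, not part of IronOCR, and the 0-to-1 scale is taken from the description above. The helper takes plain values so it can be exercised without a passport image.

```csharp
using System;

// Example values standing in for result.Confidence and result.Text.
Console.WriteLine(ConfidenceGate.Triage(0.97, "sample raw text")); // prints: Accept
Console.WriteLine(ConfidenceGate.Triage(0.55, "sample raw text")); // prints: ManualReview
Console.WriteLine(ConfidenceGate.Triage(0.99, ""));                // prints: Reject

// Hypothetical outcomes for a scanned passport.
public enum ReviewDecision { Accept, ManualReview, Reject }

public static class ConfidenceGate
{
    // confidence: assumed to be on the 0 (lowest) to 1 (highest) scale.
    // acceptThreshold: arbitrary example value; tune it against your own scans.
    public static ReviewDecision Triage(double confidence, string rawText, double acceptThreshold = 0.9)
    {
        // No text at all means the scan is unusable, whatever the score says.
        if (string.IsNullOrWhiteSpace(rawText)) return ReviewDecision.Reject;

        return confidence >= acceptThreshold
            ? ReviewDecision.Accept
            : ReviewDecision.ManualReview;
    }
}
```

With IronOCR, the call would look like `ConfidenceGate.Triage(result.Confidence, result.Text)`, assuming `Confidence` is exposed as (or converts to) a `double`.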

Conclusion

Passport OCR SDK (Developer Tutorial): Figure 5 - IronOCR

Passport OCR technology significantly enhances document processing by automating data extraction and improving operational efficiency. It streamlines identity verification and KYC processes, ensuring high accuracy while handling sensitive personal information. Immigration borders and airports can reduce processing time and improve workflow efficiency by choosing IronOCR as their Passport OCR API.

IronOCR provides developers with flexibility and scalability through its easy-to-use methods. It lets developers access the extracted fields quickly through the `OcrPassportResult` object. Furthermore, IronOCR provides debugging aids, including confidence levels and raw, unparsed text, that developers can use in unit tests. For more advanced usage, IronOCR also lets developers reduce digital noise manually by cleaning up the passport image input before passing it to the method.

Feel free to take advantage of IronOCR's free trial license.
