
Passport OCR SDK (Developer Tutorial)

A passport establishes an individual's identity; we use passports to travel and to register essential aspects of our lives. However, passports are not always easy to read. Imagine a surge of travelers arriving during the holiday season: how can immigration agents process that volume of data through manual data entry and still retrieve the correct information?

Hence, many applications and enterprises are turning to optical character recognition (OCR), which allows developers to quickly extract text from printed documents and digital images.

Passport OCR applies optical character recognition (OCR) software to extract meaningful information from passports. It also reads the machine-readable zone (MRZ) found on standard passports, which lets systems quickly identify the individual passing through immigration. Wherever passport information must be recognized quickly, or passport data extraction must be automated, Passport OCR is a cornerstone of fast, efficient processing at airports and border crossings.

Although Passport OCR technology has matured over the years, several factors can still affect the scanning process. Noise in a digital image, or smudges on the passport itself, can significantly reduce the accuracy of the extracted data. In addition, general-purpose OCR libraries can be awkward to use on passports because the machine-readable zone is a fixed-format, structured data set: developers may be able to extract the raw characters but must then parse the individual fields themselves. IronOCR, by contrast, offers methods optimized for reading passports and returns the fields in a form developers can use immediately, which suits high-volume scanning and automation.

In this article, we'll show how to use IronOCR to extract and work with passport information to automate data extraction, and explain how IronOCR reads the passport's machine-readable zone.

IronOCR: A C# OCR Library

Passport OCR SDK (Developer Tutorial): Figure 1 - IronOCR: A C# OCR Library

IronOCR is a C# library that offers easy-to-use methods and flexible functionality for OCR-related needs. Beyond its standard methods, IronOCR lets developers use and customize its own build of the Tesseract engine (exposed through the IronTesseract class) for related tasks.

Here is a quick rundown of its most notable features:

  1. Cross-compatibility: IronOCR is compatible with most .NET platforms, including .NET 8, 7, 6, and 5, and supports .NET Framework 4.6.2 and later. Developers don't have to worry about platform compatibility either, as it runs on Windows, macOS, and Linux, as well as on Azure.
  2. Flexibility: OCR input comes in many formats, so a truly flexible library has to handle them all. IronOCR accepts all popular image formats (jpg, png, and gif) and also supports native System.Drawing objects from C#, allowing easier integration into existing codebases.
  3. Support and ease of use: IronOCR is well documented, with an extensive API reference and tutorials covering its functionality. In addition, 24/5 support is available to developers.
  4. Multiple language support: IronOCR supports up to 125 languages as well as custom languages, making it versatile for international document processing.

Reading the Passport with IronOCR

License Key

Please remember that IronOCR requires a license key to operate. You can obtain a key through IronOCR's free trial.

// Replace the license key variable with the trial key you obtained
IronOcr.License.LicenseKey = "REPLACE-WITH-YOUR-KEY";

After receiving a trial key, set this property in your project at application startup, before performing any OCR.

Code example

The code below shows how IronOCR takes a passport image and extracts the relevant passport information from it.

Input image

Passport OCR SDK (Developer Tutorial): Figure 2 - Input image

using IronOcr;
using System;

class Program {
    public static void Main() {
        // Instantiate OCR engine
        var ocr = new IronTesseract();
        using var inputPassport = new OcrInput();
        inputPassport.AddImage("Passport.jpg");

        // Perform OCR to read the passport
        OcrResult result = ocr.Read(inputPassport);

        // Output passport information
        Console.WriteLine("Given Names: " + result.Passport?.GivenNames);
        Console.WriteLine("Country: " + result.Passport?.Country);
        Console.WriteLine("Passport Number: " + result.Passport?.PassportNumber);
        Console.WriteLine("Surname: " + result.Passport?.Surname);
        Console.WriteLine("Date of Birth: " + result.Passport?.DateOfBirth.ToString("yyyy-MM-dd"));
        Console.WriteLine("Date of Expiry: " + result.Passport?.DateOfExpiry.ToString("yyyy-MM-dd"));
    }
}

Code explanation

  1. Import namespaces: We import the IronOcr namespace, plus System for console output.
  2. Instantiate the OCR engine: We create a new IronTesseract object, which initializes the OCR engine.
  3. Load the passport image: We create a new OcrInput (declared with using so that it is disposed when Main exits) and load the passport image with AddImage().
  4. Read the passport: We call Read() to run OCR on the input image and store the returned OcrResult.
  5. Output the results: We print the extracted passport fields: given names, country, passport number, surname, date of birth, and date of expiry. The null-conditional operator (?.) prevents a NullReferenceException if result.Passport is null.

Console Output

Passport OCR SDK (Developer Tutorial): Figure 3 - Console output

Machine Readable Zone

IronOCR can extract Machine Readable Zone (MRZ) information from the bottom two rows of any passport that follows the International Civil Aviation Organization (ICAO) standard. On a passport, the MRZ consists of two lines of 44 characters each, and each line carries different fields. For the exact meaning of every character position, along with any exceptions and special identifiers, consult the ICAO standard (Doc 9303).

The table below summarizes the MRZ layout:

Passport OCR SDK (Developer Tutorial): Figure 4 - Table of MRZ
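Because every MRZ line has a fixed layout, the fields and their check digits can also be validated independently of the OCR engine, which helps catch misread characters. The sketch below assumes the ICAO 9303 TD3 (passport) layout of two 44-character lines and applies the standard 7-3-1 weighted check digit. `MrzTd3`, `CheckDigit`, `ParseLine2`, and `ParseNames` are hypothetical helpers written for this illustration, not IronOCR APIs.

```csharp
using System;

// Sketch of ICAO 9303 TD3 (passport) MRZ parsing. MrzTd3 and its methods are
// hypothetical helpers for illustration; they are not part of IronOCR.
public static class MrzTd3
{
    // ICAO 9303 check digit: map '0'-'9' to 0-9, 'A'-'Z' to 10-35 and the '<'
    // filler to 0, multiply by the repeating weights 7, 3, 1, then take the sum mod 10.
    public static int CheckDigit(string field)
    {
        int[] weights = { 7, 3, 1 };
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
        {
            char c = field[i];
            int value;
            if (c >= '0' && c <= '9') value = c - '0';
            else if (c >= 'A' && c <= 'Z') value = c - 'A' + 10;
            else if (c == '<') value = 0;
            else throw new ArgumentException($"Invalid MRZ character '{c}'.");
            sum += value * weights[i % 3];
        }
        return sum % 10;
    }

    // Compares a stored check digit with the one computed over its field.
    // A '<' in the check position is treated as 0 (used when an optional field is empty).
    private static bool Verify(string field, char storedDigit)
    {
        int stored = storedDigit == '<' ? 0 : storedDigit - '0';
        return stored == CheckDigit(field);
    }

    // Line 2 layout (0-based): passport number 0-8 (check 9), nationality 10-12,
    // birth date 13-18 as YYMMDD (check 19), sex 20, expiry 21-26 (check 27),
    // personal number 28-41 (check 42), composite check 43.
    public static (string PassportNumber, string Nationality, string BirthDate,
                   string Sex, string ExpiryDate, bool ChecksValid) ParseLine2(string line2)
    {
        if (line2 == null || line2.Length != 44)
            throw new ArgumentException("A TD3 MRZ line must be exactly 44 characters.");

        string number = line2.Substring(0, 9);
        string birth = line2.Substring(13, 6);
        string expiry = line2.Substring(21, 6);
        string personal = line2.Substring(28, 14);
        // The composite check digit covers positions 0-9, 13-19 and 21-42.
        string composite = line2.Substring(0, 10) + line2.Substring(13, 7) + line2.Substring(21, 22);

        bool valid = Verify(number, line2[9])
                  && Verify(birth, line2[19])
                  && Verify(expiry, line2[27])
                  && Verify(personal, line2[42])
                  && Verify(composite, line2[43]);

        return (number.TrimEnd('<'), line2.Substring(10, 3), birth,
                line2.Substring(20, 1), expiry, valid);
    }

    // Line 1: "P" + type character + issuing state (positions 2-4) + name field (5-43).
    // The name field is SURNAME<<GIVEN<NAMES, padded with '<' fillers.
    public static (string Surname, string GivenNames) ParseNames(string line1)
    {
        string nameField = line1.Substring(5).TrimEnd('<');
        string[] parts = nameField.Split(new[] { "<<" }, 2, StringSplitOptions.None);
        string surname = parts[0].Replace('<', ' ');
        string givenNames = parts.Length > 1 ? parts[1].Replace('<', ' ') : "";
        return (surname, givenNames);
    }
}
```

For example, calling `ParseLine2` on the sample line `L898902C36UTO7408122F1204159ZE184226B<<<<<10` (UTO is the fictional country code used in ICAO examples) yields passport number `L898902C3` with every check digit valid. Note that MRZ dates use two-digit years (YYMMDD), so the century has to be inferred when converting them to full dates.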

Challenges for Passport OCR and Debugging

Image quality is always a concern when scanning digital images: a distorted or low-quality image obscures information and makes it harder to confirm that the extracted data is accurate. Developers must also consider data security and compliance when handling sensitive information such as passport data.
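One common safeguard, shown here as an assumption rather than an IronOCR feature, is to mask sensitive fields such as the passport number before they reach logs or console output. `Redaction.Mask` is a hypothetical helper that keeps only the last few characters visible.

```csharp
using System;

// Hypothetical helper, not part of IronOCR: replaces all but the last
// `visible` characters of a sensitive value with '*' before it is logged.
public static class Redaction
{
    public static string Mask(string value, int visible = 3)
    {
        if (visible < 0) throw new ArgumentOutOfRangeException(nameof(visible));
        // Null or empty input produces an empty string rather than an exception.
        if (string.IsNullOrEmpty(value)) return value ?? "";

        int hidden = Math.Max(0, value.Length - visible);
        return new string('*', hidden) + value.Substring(hidden);
    }
}
```

In the passport example, the output line could become `Console.WriteLine("Passport Number: " + Redaction.Mask(result.Passport?.PassportNumber));`, which also tolerates a null value.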

To help with this, IronOCR exposes diagnostic information on every OcrResult, such as an overall confidence score and the raw extracted text. These properties let developers troubleshoot problems and judge how far the extracted data can be trusted.

Here is a brief example:

using IronOcr;
using System;

class DebugExample {
    public static void Main() {
        // Instantiate OCR engine
        var ocr = new IronTesseract();
        using var inputPassport = new OcrInput();
        inputPassport.AddImage("Passport.jpg");

        // Perform OCR
        OcrResult result = ocr.Read(inputPassport);

        // Output Confidence level and raw extracted text
        Console.WriteLine("OCR Confidence: " + result.Confidence);
        Console.WriteLine("Extracted Text: ");
        Console.WriteLine(result.Text);
    }
}

Explanation of Debugging Code

  1. Confidence: The Confidence property of the OcrResult is a floating-point number representing the OCR's statistical accuracy confidence, averaged over every character. One represents the highest confidence and zero the lowest; a lower value suggests that the passport image may be blurry or contain extraneous content.
  2. Text: The Text property of the OcrResult holds the raw, unprocessed text extracted from the passport image. Developers can use it in unit tests to validate the extracted text with equality assertions.
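When asserting on raw text, it can help to isolate the MRZ lines first, since the rest of the passport page produces less predictable output. The following sketch assumes TD3 lines (exactly 44 characters from A-Z, 0-9, and '<') and that OCR may insert stray spaces. `MrzText.FindTd3Lines` is a hypothetical helper, not an IronOCR API, that could be applied to `result.Text` in a unit test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not an IronOCR API: picks the lines of raw OCR text that
// look like TD3 MRZ lines (exactly 44 characters drawn from A-Z, 0-9 and '<').
public static class MrzText
{
    private static readonly Regex Td3Line = new Regex("^[A-Z0-9<]{44}$");

    public static List<string> FindTd3Lines(string rawText)
    {
        return rawText
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            // OCR output can contain stray spaces inside the MRZ, so remove them before matching.
            .Select(line => line.Replace(" ", "").Trim())
            .Where(line => Td3Line.IsMatch(line))
            .ToList();
    }
}
```

A test might first assert that `MrzText.FindTd3Lines(result.Text)` returns exactly two lines and then compare them with the expected MRZ. Stripping spaces is a deliberate simplification; real OCR output may also misread individual characters, which this filter does not correct.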

Conclusion

Passport OCR SDK (Developer Tutorial): Figure 5 - IronOCR

Passport OCR technology significantly enhances document processing by automating data extraction and improving operational efficiency. It streamlines identity verification and KYC processes with high accuracy while handling sensitive personal information. Border checkpoints and airports can reduce processing time and improve workflow efficiency by choosing IronOCR as their Passport OCR API.

IronOCR provides developers with flexibility and scalability through its easy-to-use methods. It lets developers access the extracted fields quickly through the OcrResult object. Furthermore, IronOCR provides debugging aids, including confidence levels and raw, unparsed text, that developers can use in product unit tests. For more advanced scenarios, developers can also reduce digital noise by cleaning up the passport image input before passing it to the Read method.

To get started, sign up for IronOCR's free trial license.

Frequently Asked Questions

What is Passport OCR?

Passport OCR is a technology that uses optical character recognition software to extract meaningful information from passports, utilizing the machine-readable zone to quickly identify individuals. IronOCR is a useful tool for implementing this technology efficiently.

Why is Passport OCR important for immigration agents?

Passport OCR is vital because it automates the extraction of passport data, increasing efficiency and speed in processing travelers at airports and immigration borders, especially during high-traffic periods.

What advantages does using specialized OCR software offer for Passport OCR?

Using specialized OCR software, such as IronOCR, offers easy-to-use methods, flexible functionality, support for multiple languages, and cross-compatibility with various .NET platforms. It also provides debugging tools to aid developers in ensuring data accuracy.

How can different image formats be handled effectively in OCR applications?

Specialized OCR tools like IronOCR are flexible and accept all popular image formats, such as jpg, png, and gif. They also support native C# System.Drawing objects, facilitating easy integration into existing codebases.

What is required to use specialized OCR software in projects?

Specialized OCR software, such as IronOCR, requires a licensing key for operation. Developers can obtain a trial key from the provider's website to start using the library.

How is the accuracy of extracted information ensured in OCR processes?

Specialized OCR tools like IronOCR provide a 'Confidence' property in the 'OcrResult', representing the statistical accuracy of the OCR process. Developers can use this to gauge the reliability of the extracted data.

Can specialized OCR software extract the Machine Readable Zone (MRZ) from passports?

Yes, specialized OCR software like IronOCR can extract MRZ information from the bottom two rows of any passport, following the International Civil Aviation Organization (ICAO) standard.

What challenges are associated with implementing Passport OCR?

Challenges include image quality issues, such as noise or smudges, which can affect accuracy. Developers must also consider data security and compliance when handling sensitive passport information.

What languages can advanced OCR software recognize?

Advanced OCR software, such as IronOCR, supports up to 125 languages and allows for the addition of custom languages, making it versatile for international document processing.

What platforms are compatible with advanced OCR libraries?

Advanced OCR libraries like IronOCR are compatible with most .NET platforms, including .NET 8, 7, 6, and 5, as well as .NET Framework 4.6.2 and later. They also run on all major operating systems, including Windows, macOS, and Linux, as well as on Azure.

Kannaopat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.