How to Extract Passport Data with IronOCR

In applications such as airport check-in counters and immigration control, agents process a large volume of passports every day. A reliable system that accurately extracts the traveler's essential information is crucial to keeping that process efficient.

IronOCR makes extracting and reading data from a passport straightforward: a single call to the ReadPassport method does the work.

To use this function, you must also install the IronOcr.Extension.AdvancedScan package.

Extracting Passport Data Example

As an example, we will use a passport image as input to showcase the functionality of IronOCR. After loading the image into an OcrInput, call the ReadPassport method to identify and extract information from the passport. This method returns an OcrPassportResult object, whose PassportInfo member exposes properties such as GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry. All members of the PassportInfo object are strings.

Please note

  • The method currently only works for English-based passports.
  • Using advanced scan on .NET Framework requires the project to run on x64 architecture.

Passport Input

Sample image

Code

:path=/static-assets/ocr/content-code-examples/how-to/read-passport-read-passport.cs
using IronOcr;
using System;

// Read the structured fields from a passport image with IronOCR.
// Requires the IronOcr.Extension.AdvancedScan package.

var ocr = new IronTesseract(); // Instantiate the OCR engine

using (var inputPassport = new OcrInput()) // Create the OCR input container
{
    inputPassport.AddImage("passport.jpg"); // Load the passport image into the container

    // Detect the passport and parse its MRZ into structured fields
    OcrPassportResult result = ocr.ReadPassport(inputPassport);

    // Output the extracted passport information (all values are strings)
    Console.WriteLine("Given Names: " + result.PassportInfo.GivenNames);
    Console.WriteLine("Country: " + result.PassportInfo.Country);
    Console.WriteLine("Passport Number: " + result.PassportInfo.PassportNumber);
    Console.WriteLine("Surname: " + result.PassportInfo.Surname);
    Console.WriteLine("Date of Birth: " + result.PassportInfo.DateOfBirth);
    Console.WriteLine("Date of Expiry: " + result.PassportInfo.DateOfExpiry);
}

Imports IronOcr
Imports System

' Read the structured fields from a passport image with IronOCR.
' Requires the IronOcr.Extension.AdvancedScan package.
Module Program
	Sub Main()
		Dim ocr = New IronTesseract() ' Instantiate the OCR engine

		Using inputPassport = New OcrInput() ' Create the OCR input container
			inputPassport.AddImage("passport.jpg") ' Load the passport image into the container

			' Detect the passport and parse its MRZ into structured fields
			Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

			' Output the extracted passport information (all values are strings)
			Console.WriteLine("Given Names: " & result.PassportInfo.GivenNames)
			Console.WriteLine("Country: " & result.PassportInfo.Country)
			Console.WriteLine("Passport Number: " & result.PassportInfo.PassportNumber)
			Console.WriteLine("Surname: " & result.PassportInfo.Surname)
			Console.WriteLine("Date of Birth: " & result.PassportInfo.DateOfBirth)
			Console.WriteLine("Date of Expiry: " & result.PassportInfo.DateOfExpiry)
		End Using
	End Sub
End Module

Output

Result output

We then access the PassportInfo data member obtained from the OcrPassportResult object.

  • GivenNames: The given names on the passport, as a string. Read from the name field of the first MRZ row (positions 6 to 44), after the '<<' separator.
  • Country: The issuing country of the passport, as a string. Read from positions 3 to 5 of the first MRZ row. The returned string spells out the full name of the issuing country instead of the three-letter code. In our example, USA is returned as 'United States of America'.
  • PassportNumber: The passport number, as a string. Read from positions 1 to 9 of the second MRZ row.
  • Surname: The surname on the passport, as a string. Read from the name field of the first MRZ row (positions 6 to 44), before the '<<' separator.
  • DateOfBirth: The date of birth, as a string in the format YYYY-MM-DD. Read from positions 14 to 19 of the second MRZ row, where it is stored as YYMMDD.
  • DateOfExpiry: The date of expiry, as a string in the format YYYY-MM-DD. Read from positions 22 to 27 of the second MRZ row, where it is stored as YYMMDD.
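
Because every PassportInfo member is a string, an application will usually convert the two dates before comparing them. The following is a minimal sketch, assuming the YYYY-MM-DD format described above. The sample values stand in for result.PassportInfo.DateOfBirth and result.PassportInfo.DateOfExpiry, and the helpers TryParseIsoDate and IsExpired are hypothetical names, not part of IronOCR.

```csharp
using System;
using System.Globalization;

// Made-up sample values standing in for the PassportInfo strings (not real OCR output)
string dateOfBirth = "1974-08-12";
string dateOfExpiry = "2012-04-15";

if (TryParseIsoDate(dateOfBirth, out DateTime born))
{
    Console.WriteLine($"Birth year: {born.Year}");
}

// Compare against a fixed date so the example is repeatable
Console.WriteLine($"Expired: {IsExpired(dateOfExpiry, new DateTime(2024, 1, 1))}");

// Strictly parses a YYYY-MM-DD string; returns false for any other format
static bool TryParseIsoDate(string value, out DateTime date) =>
    DateTime.TryParseExact(value, "yyyy-MM-dd", CultureInfo.InvariantCulture,
        DateTimeStyles.None, out date);

// Hypothetical helper: an unreadable expiry date is treated as expired (fail-safe choice)
static bool IsExpired(string expiry, DateTime today) =>
    !TryParseIsoDate(expiry, out DateTime parsed) || parsed.Date < today.Date;
```

Treating an unparseable date as expired is a design choice for this sketch; a real workflow might route such passports to manual review instead.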

Understanding The MRZ Information

IronOCR reads the MRZ (machine-readable zone), the bottom two rows of any passport that follows the International Civil Aviation Organization (ICAO) standard. Each row is 44 characters long, and each range of positions holds a specific piece of information. The tables below summarize which positions hold which field; for exceptions and special identifiers, please refer to the ICAO documentation.

Example Input

MRZ location

First Row

Position  Field                    Description
1         Document Type            Typically 'P' for passport
2         Document Subtype         Optional type character; '<' when unused
3-5       Issuing Country          Three-letter country code (ISO 3166-1 alpha-3)
6-44      Surname and Given Names  Surname followed by '<<' and then given names separated by '<'
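
To make the first-row layout concrete, here is a minimal sketch that slices a name row by position. It does not use IronOCR; it assumes the ICAO 9303 layout for passports (issuing country in positions 3 to 5, names in positions 6 to 44) and uses the ICAO specimen name line as sample data. ParseFirstRow is a hypothetical helper name.

```csharp
using System;

// ICAO 9303 specimen name line, padded to the full 44 characters with '<' fillers
string row1 = "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<');

var parsed = ParseFirstRow(row1);
Console.WriteLine($"{parsed.DocType} | {parsed.Issuer} | {parsed.Surname} | {parsed.GivenNames}");
// prints: P | UTO | ERIKSSON | ANNA MARIA

// Hypothetical helper: splits the first MRZ row into its fields by position
static (string DocType, string Issuer, string Surname, string GivenNames) ParseFirstRow(string row)
{
    if (row.Length != 44)
        throw new ArgumentException("A passport MRZ row must be 44 characters.", nameof(row));

    string type = row.Substring(0, 1);                     // position 1
    string country = row.Substring(2, 3).Replace("<", ""); // positions 3-5
    string nameField = row.Substring(5);                   // positions 6-44

    // "<<" separates the surname from the given names; a single '<' stands for a space
    string[] parts = nameField.Split(new[] { "<<" }, 2, StringSplitOptions.None);
    string last = parts[0].Replace('<', ' ').Trim();
    string given = parts.Length > 1 ? parts[1].Replace('<', ' ').Trim() : "";

    return (type, country, last, given);
}
```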

Second Row

Position  Field                          Description
1-9       Passport Number                Unique passport number
10        Check Digit (Passport Number)  Check digit for the passport number
11-13     Nationality                    Three-letter nationality code (ISO 3166-1 alpha-3)
14-19     Date of Birth                  Date of birth in YYMMDD format
20        Check Digit (Date of Birth)    Check digit for the date of birth
21        Sex                            'M' for male, 'F' for female, '<' for unspecified
22-27     Date of Expiry                 Expiry date in YYMMDD format
28        Check Digit (Date of Expiry)   Check digit for the date of expiry
29-42     Personal Number                Optional personal number (usually national ID number)
43        Check Digit (Personal Number)  Check digit for the personal number
44        Check Digit (Composite)        Overall check digit
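
The check digits in the second row can be recomputed to catch OCR misreads. The sketch below does not use IronOCR, and this article does not describe how IronOCR validates the MRZ internally. It implements the ICAO 9303 rule (weights 7, 3, 1 repeating; digits keep their value, A to Z map to 10 to 35, '<' counts as 0; the sum is taken modulo 10) against the ICAO specimen row. CheckDigit and IsValid are hypothetical helper names.

```csharp
using System;

// Second MRZ row of the ICAO 9303 specimen passport (44 characters)
string row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

string passportNumber = row2.Substring(0, 9);   // positions 1-9
string birthDate = row2.Substring(13, 6);       // positions 14-19 (YYMMDD)
string expiryDate = row2.Substring(21, 6);      // positions 22-27 (YYMMDD)
string personalNumber = row2.Substring(28, 14); // positions 29-42
// The composite check digit covers positions 1-10, 14-20 and 22-43
string composite = row2.Substring(0, 10) + row2.Substring(13, 7) + row2.Substring(21, 22);

Console.WriteLine($"Passport number: {IsValid(passportNumber, row2[9])}");   // True
Console.WriteLine($"Date of birth:   {IsValid(birthDate, row2[19])}");       // True
Console.WriteLine($"Date of expiry:  {IsValid(expiryDate, row2[27])}");      // True
Console.WriteLine($"Personal number: {IsValid(personalNumber, row2[42])}");  // True
Console.WriteLine($"Composite:       {IsValid(composite, row2[43])}");       // True

// Weighted sum modulo 10 with weights 7, 3, 1 repeating
static int CheckDigit(string field)
{
    int[] weights = { 7, 3, 1 };
    int sum = 0;
    for (int i = 0; i < field.Length; i++)
    {
        char c = field[i];
        int value = c == '<' ? 0
            : c >= '0' && c <= '9' ? c - '0'
            : c >= 'A' && c <= 'Z' ? c - 'A' + 10
            : throw new FormatException($"Invalid MRZ character '{c}'.");
        sum += value * weights[i % 3];
    }
    return sum % 10;
}

// A '<' check character is accepted only when the field itself is all fillers
// (this can occur for an unused personal number)
static bool IsValid(string field, char check) =>
    check == '<'
        ? field.Trim('<').Length == 0
        : check >= '0' && check <= '9' && CheckDigit(field) == check - '0';
```

A mismatch usually means one character in the field or its check digit was misread, which makes this a cheap sanity check before trusting the parsed values.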

Debugging

You can also verify the results from IronOCR by inspecting the raw text extracted from the passport image together with the confidence level. Using the example from above, access the Confidence and Text properties of the OcrPassportResult object.

:path=/static-assets/ocr/content-code-examples/how-to/read-passport-debug.cs
using IronOcr;
using System;

var ocr = new IronTesseract(); // Instantiate the IronTesseract OCR engine

using var inputPassport = new OcrInput(); // Create an OCR input object for the image

// Load the image file into the OCR input; ensure the correct path is provided
inputPassport.AddImage("passport.jpg");

try
{
    // Read the passport; requires the IronOcr.Extension.AdvancedScan package
    OcrPassportResult result = ocr.ReadPassport(inputPassport);

    // Check that a result was returned before accessing its properties
    if (result != null)
    {
        // Output the OCR confidence level and the raw, unparsed text
        Console.WriteLine($"Confidence Level: {result.Confidence}");
        Console.WriteLine("Extracted Text:");
        Console.WriteLine(result.Text);
    }
    else
    {
        Console.WriteLine("No text was extracted from the image.");
    }
}
catch (Exception ex)
{
    // Handle any exceptions that may occur during OCR processing
    Console.WriteLine($"An error occurred: {ex.Message}");
}

Imports IronOcr
Imports System

Module Program
	Sub Main()
		Dim ocr = New IronTesseract() ' Instantiate the IronTesseract OCR engine

		Using inputPassport = New OcrInput() ' Create an OCR input object for the image
			' Load the image file into the OCR input; ensure the correct path is provided
			inputPassport.AddImage("passport.jpg")

			Try
				' Read the passport; requires the IronOcr.Extension.AdvancedScan package
				Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

				' Check that a result was returned before accessing its properties
				If result IsNot Nothing Then
					' Output the OCR confidence level and the raw, unparsed text
					Console.WriteLine($"Confidence Level: {result.Confidence}")
					Console.WriteLine("Extracted Text:")
					Console.WriteLine(result.Text)
				Else
					Console.WriteLine("No text was extracted from the image.")
				End If
			Catch ex As Exception
				' Handle any exceptions that may occur during OCR processing
				Console.WriteLine($"An error occurred: {ex.Message}")
			End Try
		End Using
	End Sub
End Module

Console output

Debug
  • Confidence: The Confidence property of OcrPassportResult is a float giving the OCR's statistical confidence, averaged over every character. The value is lower if the passport image is blurry or contains other information. 1 is the highest and most confident, and 0 is the lowest and least confident.
  • Text: The Text property from OcrPassportResult contains the raw, unparsed text extracted from the passport image. Developers could use this in unit tests to verify the extracted text of the passport image.
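
When using the raw text in a unit test, it helps to isolate the two MRZ rows from the rest of the page first. The sketch below does not call IronOCR: rawText is made-up sample text standing in for result.Text, and FindMrzRows is a hypothetical helper that keeps only lines of exactly 44 characters drawn from A to Z, 0 to 9, and '<'.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up stand-in for result.Text; real OCR output will differ
string rawText = string.Join("\n",
    "PASSPORT",
    "Surname ERIKSSON",
    "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<'),
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10");

string[] mrzRows = FindMrzRows(rawText);
Console.WriteLine($"MRZ rows found: {mrzRows.Length}"); // prints: MRZ rows found: 2
foreach (string mrzRow in mrzRows)
{
    Console.WriteLine(mrzRow);
}

// Hypothetical helper: drops stray spaces, then keeps lines that look like a passport MRZ row
static string[] FindMrzRows(string text) =>
    text.Split('\n')
        .Select(line => line.Replace(" ", "").Trim())
        .Where(line => Regex.IsMatch(line, "^[A-Z0-9<]{44}$"))
        .ToArray();
```

A test can then assert on these two rows directly, which keeps it independent of whatever other text appears on the passport page.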

Frequently Asked Questions

What is IronOCR used for in relation to passports?

IronOCR is used to extract and read data from passports, making the process effortless and efficient for applications such as airport security and immigration.

How can I start using IronOCR to read passport data?

To use IronOCR for reading passport data, download the C# library, import the passport image, and call the 'ReadPassport' method to extract the information.

What additional package is required to use the ReadPassport method?

You must install the IronOcr.Extension.AdvancedScan package to use the ReadPassport method effectively.

What type of passports can IronOCR read?

Currently, IronOCR can read English-based passports.

What information can be extracted from a passport using IronOCR?

IronOCR can extract the given names, country, passport number, surname, date of birth, and date of expiry from a passport.

What is the confidence level in IronOCR's results?

The confidence level indicates the accuracy of the OCR process as a float value, where 1 is the highest confidence and 0 is the lowest.

What does the MRZ stand for, and where is it located?

MRZ stands for Machine-Readable Zone, and it is located at the bottom two rows of a passport, containing crucial data that IronOCR reads.

Does IronOCR work with all programming architectures?

To use advanced scanning with IronOCR on the .NET Framework, the project must run on x64 architecture.

Can I verify the extracted text from a passport using IronOCR?

Yes, you can verify the extracted text from a passport using the 'Text' property of the OcrPassportResult object.

What is the purpose of the 'OcrPassportResult' object in IronOCR?

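The 'OcrPassportResult' object holds the result of a ReadPassport call. Its PassportInfo member exposes the extracted fields (GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry), and its Confidence and Text properties help verify the extraction.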

Curtis Chau
Technical Writer
