How to extract passport data with IronOCR

by Curtis Chau

At airport check-in counters and immigration checkpoints, agents handle a large volume of passports every day. A reliable system that accurately extracts the essential information about each traveler is crucial to keeping that process efficient and streamlined.

IronOCR makes reading and extracting data from a passport straightforward: the whole process comes down to a single call to the ReadPassport method.


To use this function, you must also install the IronOcr.Extensions.AdvancedScan package.

Extracting Passport Data Example

As an example, we will use a passport image as input to showcase this functionality. After loading the image with OcrInput, call the ReadPassport method to identify and extract information from the passport. The method returns an OcrPassportResult object whose PassportInfo member exposes properties such as GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry. All members of the PassportInfo object are strings.

Please note

  • The method currently only works for English-based passports.
  • Using advanced scan on .NET Framework requires the project to run on x64 architecture.

Passport Input

Sample image

Code

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);

Visual Basic equivalent:

Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Load the passport image; Using disposes the input when done
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output passport information
            Console.WriteLine(result.PassportInfo.GivenNames)
            Console.WriteLine(result.PassportInfo.Country)
            Console.WriteLine(result.PassportInfo.PassportNumber)
            Console.WriteLine(result.PassportInfo.Surname)
            Console.WriteLine(result.PassportInfo.DateOfBirth)
            Console.WriteLine(result.PassportInfo.DateOfExpiry)
        End Using
    End Sub
End Module

Output

Result output

The extracted fields are available through the PassportInfo member of the OcrPassportResult object:

  • GivenNames: A property of PassportInfo that returns the given names on the passport as a string. This corresponds to the first MRZ data row, positions 6 to 44, after the '<<' separator.
  • Country: A property of PassportInfo that returns the issuing country of the passport as a string. This corresponds to the first MRZ data row, positions 3 to 5. The returned string spells out the full name of the issuing country rather than the abbreviation. In our example, USA is returned as United States of America.
  • PassportNumber: A property of PassportInfo that returns the passport number as a string. This corresponds to the second MRZ data row, positions 1 to 9.
  • Surname: A property of PassportInfo that returns the surname on the passport as a string. This corresponds to the first MRZ data row, positions 6 to 44, before the '<<' separator.
  • DateOfBirth: A property of PassportInfo that returns the date of birth as a string in the format YYYY-MM-DD. This corresponds to the second MRZ data row, positions 14 to 19.
  • DateOfExpiry: A property of PassportInfo that returns the date of expiry as a string in the format YYYY-MM-DD. This corresponds to the second MRZ data row, positions 22 to 27.
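
Because the dates come back as strings, a caller will usually want to convert them before comparing them. The following is a minimal sketch, assuming the YYYY-MM-DD format described above; the sample values and the TryParsePassportDate helper are hypothetical and not part of IronOCR.

```csharp
using System;
using System.Globalization;

// Hypothetical sample values in the YYYY-MM-DD format described above;
// in real code they would come from result.PassportInfo.
string dateOfBirth = "1985-03-14";
string dateOfExpiry = "2031-07-02";

// Parse with an exact format and the invariant culture so the result
// does not depend on the machine's regional settings.
static bool TryParsePassportDate(string value, out DateTime date) =>
    DateTime.TryParseExact(value, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                           DateTimeStyles.None, out date);

if (TryParsePassportDate(dateOfBirth, out DateTime birth) &&
    TryParsePassportDate(dateOfExpiry, out DateTime expiry))
{
    // A passport is treated as expired once its expiry date is in the past.
    bool isExpired = expiry.Date < DateTime.Today;
    Console.WriteLine($"Date of birth: {birth:yyyy-MM-dd}");
    Console.WriteLine($"Expired: {isExpired}");
}
else
{
    Console.WriteLine("Could not parse one of the date strings.");
}
```

Using TryParseExact rather than Parse means a misread date is reported as a failure instead of throwing or being silently reinterpreted.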

Understanding the MRZ Information

IronOCR reads the machine-readable zone (MRZ) contained in the bottom two rows of any passport that follows the International Civil Aviation Organization (ICAO) standard. The MRZ consists of two data rows, and each range of positions within a row holds a specific piece of information. The tables below summarize which field corresponds to which positions in each row; for all exceptions and unique identifiers, please refer to the ICAO documentation standards.

Example Passport Input:

MRZ location

First Row

Position  Field                    Description
1-2       Document Type            Typically 'P' for passport, followed by '<' or an optional subtype character
3-5       Issuing Country          Three-letter country code (ISO 3166-1 alpha-3)
6-44      Surname and Given Names  Surname followed by '<<' and then given names separated by '<'
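
To make the first-row layout concrete, here is a minimal sketch that splits a row by position. It assumes the ICAO Doc 9303 layout for passports (document code in positions 1-2, issuing country in 3-5, names in 6-44); ParseFirstRow and the sample row are hypothetical and not part of IronOCR.

```csharp
using System;

// Hypothetical first MRZ row, padded to 44 characters with the '<' filler.
string row1 = "P<USADOE<<JOHN<MICHAEL".PadRight(44, '<');

static (string DocumentType, string CountryCode, string Surname, string GivenNames)
    ParseFirstRow(string row)
{
    if (row is null || row.Length != 44)
        throw new ArgumentException("A passport MRZ row must be exactly 44 characters.", nameof(row));

    string documentType = row.Substring(0, 2).TrimEnd('<'); // positions 1-2
    string countryCode = row.Substring(2, 3).TrimEnd('<');  // positions 3-5
    string nameField = row.Substring(5);                    // positions 6-44

    // "<<" separates the surname from the given names; "<" separates name parts.
    string[] parts = nameField.Split("<<", 2, StringSplitOptions.None);
    string surname = parts[0].Replace('<', ' ').Trim();
    string givenNames = parts.Length > 1 ? parts[1].Replace('<', ' ').Trim() : "";

    return (documentType, countryCode, surname, givenNames);
}

var first = ParseFirstRow(row1);
Console.WriteLine($"{first.Surname}, {first.GivenNames} ({first.CountryCode})");
// prints "DOE, JOHN MICHAEL (USA)"
```

Note that this sketch returns the three-letter code as it appears in the MRZ, whereas the Country property described earlier returns the full country name.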

Second Row

Position  Field                           Description
1-9       Passport Number                 Unique passport number
10        Check Digit (Passport Number)   Check digit for the passport number
11-13     Nationality                     Three-letter nationality code (ISO 3166-1 alpha-3)
14-19     Date of Birth                   Date of birth in YYMMDD format
20        Check Digit (Date of Birth)     Check digit for the date of birth
21        Sex                             Gender ('M' for male, 'F' for female, '<' for unspecified)
22-27     Date of Expiry                  Expiry date in YYMMDD format
28        Check Digit (Date of Expiry)    Check digit for the date of expiry
29-42     Personal Number                 Optional personal number (usually national ID number)
43        Check Digit (Personal Number)   Check digit for the personal number
44        Check Digit (Composite)         Overall check digit
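
The check digits in the second row can be recomputed to confirm that a field was read correctly. The sketch below assumes the ICAO Doc 9303 scheme (weights 7, 3, 1 repeating, digits at face value, A to Z as 10 to 35, the '<' filler as 0, sum taken modulo 10) and uses the specimen row published by ICAO. ComputeCheckDigit and FieldIsValid are hypothetical helpers, not IronOCR APIs, and the composite check digit is left out for brevity.

```csharp
using System;

// Second MRZ row of the ICAO Doc 9303 specimen passport.
string row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

// Weighted sum of the character values, modulo 10.
static int ComputeCheckDigit(string field)
{
    int[] weights = { 7, 3, 1 };
    int sum = 0;
    for (int i = 0; i < field.Length; i++)
    {
        char c = field[i];
        int value = c switch
        {
            >= '0' and <= '9' => c - '0',
            >= 'A' and <= 'Z' => c - 'A' + 10,
            '<' => 0,
            _ => throw new FormatException($"Unexpected MRZ character '{c}'.")
        };
        sum += value * weights[i % 3];
    }
    return sum % 10;
}

// Slice a field by its 1-based start position and compare the computed
// digit with the one stored at checkPosition (also 1-based).
static bool FieldIsValid(string row, int start, int length, int checkPosition)
{
    string field = row.Substring(start - 1, length);
    char expected = row[checkPosition - 1];
    return expected is >= '0' and <= '9' && ComputeCheckDigit(field) == expected - '0';
}

Console.WriteLine($"Passport number ok: {FieldIsValid(row2, 1, 9, 10)}");
Console.WriteLine($"Date of birth ok:   {FieldIsValid(row2, 14, 6, 20)}");
Console.WriteLine($"Date of expiry ok:  {FieldIsValid(row2, 22, 6, 28)}");
// All three lines print True for the specimen row.
```

A failed check usually points to a single misread character, for example a '0' recognized as 'O', which makes it a cheap sanity test before trusting the parsed values.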

Debugging

You can also verify the results from IronOCR by obtaining the raw text extracted from the passport image, together with the confidence level, to confirm whether the extracted information is accurate. Continuing the example above, access the Confidence and Text properties of the OcrPassportResult object.

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output Confidence level and raw extracted text
Console.WriteLine(result.Confidence);
Console.WriteLine(result.Text);

Visual Basic equivalent:

Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Load the passport image; Using disposes the input when done
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output Confidence level and raw extracted text
            Console.WriteLine(result.Confidence)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

Console output:

Debug
  • Confidence: The Confidence property of OcrPassportResult is a float giving the statistical OCR confidence, averaged over every character. The value ranges from 0 (least confident) to 1 (most confident) and is lower when the passport image is blurry or contains other information.

  • Text: The Text property of OcrPassportResult contains the raw, unparsed text extracted from the passport image. Developers can use it in unit tests to verify the text extracted from the passport image.
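
One way to put these two properties to work is a simple review gate: flag a scan when the confidence is low or when the raw text does not contain two well-formed MRZ rows. This is only a sketch under stated assumptions. The raw text and confidence value are hypothetical stand-ins for result.Text and result.Confidence, the 0.9 threshold is arbitrary, it is assumed that the MRZ rows appear as separate lines in the raw text, and FindMrzRows and NeedsManualReview are illustrative helpers rather than IronOCR APIs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for result.Text and result.Confidence.
string rawText =
    "PASSPORT\nUTOPIA\n" +
    "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<') + "\n" +
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10\n";
double confidence = 0.97;

// Keep only lines that look like passport MRZ rows:
// exactly 44 characters drawn from A-Z, 0-9 and the '<' filler.
static string[] FindMrzRows(string text) =>
    text.Split('\n')
        .Select(line => line.Trim().Replace(" ", ""))
        .Where(line => Regex.IsMatch(line, "^[A-Z0-9<]{44}$"))
        .ToArray();

// Flag the scan when confidence is below the threshold or when the raw
// text does not contain exactly two MRZ rows.
static bool NeedsManualReview(double score, string[] mrzRows, double minConfidence = 0.9) =>
    score < minConfidence || mrzRows.Length != 2;

string[] detectedRows = FindMrzRows(rawText);
Console.WriteLine($"MRZ rows found: {detectedRows.Length}");
Console.WriteLine($"Needs manual review: {NeedsManualReview(confidence, detectedRows)}");
```

The threshold is worth tuning against your own sample images, since acceptable confidence depends on scanner quality and lighting.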