How to Extract Passport Data with IronOCR

In high-volume settings such as airport check-in counters and immigration checkpoints, agents process a large number of passports every day. A reliable system that accurately extracts the traveler's essential details is crucial to keeping that process efficient.

IronOCR makes extracting and reading data from a passport straightforward: a single call to the ReadPassport method returns the key fields.

To use this function, you must also install the IronOcr.Extension.AdvancedScan package.

Extracting Passport Data Example

As an example, we will use a passport image as input to showcase the functionality of IronOCR. After loading the image using OcrInput, call the ReadPassport method to identify and extract information from the passport. This method returns an OcrPassportResult object whose PassportInfo member exposes properties such as GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry. All members of the PassportInfo object are strings.

Please note

  • The method currently only works for English-based passports.
  • Using advanced scan on .NET Framework requires the project to run on x64 architecture.

Passport Input

Sample image

Code

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);

' VB.NET version of the same example
Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Dispose of the input once reading is complete
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output passport information
            Console.WriteLine(result.PassportInfo.GivenNames)
            Console.WriteLine(result.PassportInfo.Country)
            Console.WriteLine(result.PassportInfo.PassportNumber)
            Console.WriteLine(result.PassportInfo.Surname)
            Console.WriteLine(result.PassportInfo.DateOfBirth)
            Console.WriteLine(result.PassportInfo.DateOfExpiry)
        End Using
    End Sub
End Module

Output

Result output

We then access the PassportInfo data member obtained from the OcrPassportResult object.

  • GivenNames: The given names on the passport, returned as a string. They are taken from the name field of the first MRZ row (positions 6 to 44), after the '<<' separator.
  • Country: The issuing country of the passport, returned as a string. This corresponds to positions 3 to 5 of the first MRZ row. The returned string spells out the full name of the issuing country instead of the abbreviation. In our example, 'USA' is returned as 'United States of America'.
  • PassportNumber: The passport number, returned as a string. This corresponds to positions 1 to 9 of the second MRZ row.
  • Surname: The surname on the passport, returned as a string. It is taken from the name field of the first MRZ row (positions 6 to 44), before the '<<' separator.
  • DateOfBirth: The date of birth, returned as a string in the format YYYY-MM-DD. This corresponds to positions 14 to 19 of the second MRZ row.
  • DateOfExpiry: The date of expiry, returned as a string in the format YYYY-MM-DD. This corresponds to positions 22 to 27 of the second MRZ row.
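
Because every PassportInfo member is a string, the date fields usually need to be parsed before they can be compared. The following is a minimal sketch of that step, assuming the YYYY-MM-DD format described above. The PassportDates helper and the sample values are hypothetical and are not part of IronOCR; in a real application the strings would come from result.PassportInfo.

```csharp
using System;
using System.Globalization;

// Hypothetical sample values standing in for result.PassportInfo.DateOfBirth
// and result.PassportInfo.DateOfExpiry (assumed to be YYYY-MM-DD strings).
string dateOfBirth = "1974-08-12";
string dateOfExpiry = "2012-04-15";

if (PassportDates.TryParseIsoDate(dateOfBirth, out DateTime born))
{
    Console.WriteLine($"Born: {born:d MMMM yyyy}");
}

if (PassportDates.TryParseIsoDate(dateOfExpiry, out DateTime expires))
{
    bool expired = PassportDates.IsExpired(expires, DateTime.Today);
    Console.WriteLine(expired ? "Passport has expired" : "Passport is still valid");
}

// Hypothetical helper, not part of IronOCR.
static class PassportDates
{
    // Returns false instead of throwing when the text is not a valid YYYY-MM-DD date.
    public static bool TryParseIsoDate(string text, out DateTime date) =>
        DateTime.TryParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out date);

    // Treats the passport as expired once the reference date is past the expiry date.
    public static bool IsExpired(DateTime expiry, DateTime today) =>
        today.Date > expiry.Date;
}
```

Using TryParseExact rather than Parse keeps a misread character in the OCR output from throwing an exception; the caller can route the record to manual review instead.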

Understanding The MRZ Information

IronOCR reads the machine-readable zone (MRZ), the bottom two rows of any passport that follows the International Civil Aviation Organization (ICAO) standard. The MRZ consists of two data rows of 44 characters each, and each range of positions holds a specific piece of information. The tables below show which information corresponds to which positions in each row.

Example Input

MRZ location

First Row

Position | Field | Description
1-2 | Document Type | 'P' for passport, followed by an optional subtype character or '<'
3-5 | Issuing Country | Three-letter country code (based on ISO 3166-1 alpha-3)
6-44 | Surname and Given Names | Surname followed by '<<' and then given names separated by '<'
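
To make the first-row layout concrete, here is a minimal sketch that splits a 44-character row by hand. It follows the ICAO Doc 9303 layout for passport-size documents (document code in positions 1 to 2, issuing country in 3 to 5, names in 6 to 44). The MrzRow1 helper is hypothetical and the sample row is an illustrative specimen; IronOCR does this parsing for you, so the code only shows what the positions mean.

```csharp
using System;

// Illustrative first MRZ row, padded to 44 characters with '<'.
string row1 = "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<');

var (documentType, issuingCountry, surname, givenNames) = MrzRow1.Parse(row1);
Console.WriteLine($"{documentType} | {issuingCountry} | {surname} | {givenNames}");
// prints "P | UTO | ERIKSSON | ANNA MARIA"

// Hypothetical helper, not part of IronOCR.
static class MrzRow1
{
    public static (string DocumentType, string IssuingCountry, string Surname, string GivenNames)
        Parse(string row)
    {
        if (row is null || row.Length != 44)
            throw new ArgumentException("A passport MRZ row must be 44 characters.", nameof(row));

        // Positions in the table are 1-based, so position 3 is index 2.
        string documentType = row.Substring(0, 2).TrimEnd('<');   // positions 1-2
        string issuingCountry = row.Substring(2, 3).TrimEnd('<'); // positions 3-5
        string nameField = row.Substring(5).TrimEnd('<');         // positions 6-44

        // "<<" separates the surname from the given names; a single "<" stands for a space.
        string[] parts = nameField.Split(new[] { "<<" }, 2, StringSplitOptions.None);
        string surname = parts[0].Replace('<', ' ');
        string givenNames = parts.Length > 1 ? parts[1].Replace('<', ' ').Trim() : string.Empty;

        return (documentType, issuingCountry, surname, givenNames);
    }
}
```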

Second Row

Position | Field | Description
1-9 | Passport Number | Unique passport number
10 | Check Digit (Passport Number) | Check digit for the passport number
11-13 | Nationality | Three-letter nationality code (based on ISO 3166-1 alpha-3)
14-19 | Date of Birth | Date of birth in YYMMDD format
20 | Check Digit (Date of Birth) | Check digit for the date of birth
21 | Sex | 'M' for male, 'F' for female, '<' for unspecified
22-27 | Date of Expiry | Expiry date in YYMMDD format
28 | Check Digit (Date of Expiry) | Check digit for the date of expiry
29-42 | Personal Number | Optional personal number (usually national ID number)
43 | Check Digit (Personal Number) | Check digit for the personal number
44 | Check Digit (Composite) | Overall check digit
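
The check digits in the second row can be recomputed to catch OCR misreads. The sketch below uses the ICAO Doc 9303 scheme: each character is given a value (digits as themselves, A to Z as 10 to 35, '<' as 0), the values are multiplied by the repeating weights 7, 3, 1, and the sum modulo 10 is the check digit. The MrzCheckDigit helper is hypothetical, the sample row is an illustrative specimen, and nothing here claims to mirror how IronOCR validates the data internally.

```csharp
using System;

// Illustrative second MRZ row (44 characters).
string row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

string passportNumber = row2.Substring(0, 9);  // positions 1-9
char numberCheck = row2[9];                    // position 10
string dateOfBirth = row2.Substring(13, 6);    // positions 14-19
char birthCheck = row2[19];                    // position 20
string dateOfExpiry = row2.Substring(21, 6);   // positions 22-27
char expiryCheck = row2[27];                   // position 28

// All three checks succeed for this sample row.
Console.WriteLine($"Passport number OK: {MrzCheckDigit.IsValid(passportNumber, numberCheck)}");
Console.WriteLine($"Date of birth OK:   {MrzCheckDigit.IsValid(dateOfBirth, birthCheck)}");
Console.WriteLine($"Date of expiry OK:  {MrzCheckDigit.IsValid(dateOfExpiry, expiryCheck)}");

// Hypothetical helper, not part of IronOCR.
static class MrzCheckDigit
{
    // Weighted sum with repeating weights 7, 3, 1, reduced modulo 10.
    public static int Compute(string field)
    {
        int[] weights = { 7, 3, 1 };
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
        {
            sum += CharValue(field[i]) * weights[i % 3];
        }
        return sum % 10;
    }

    // Compares the computed digit with the one printed in the MRZ.
    public static bool IsValid(string field, char checkDigit) =>
        checkDigit >= '0' && checkDigit <= '9' && Compute(field) == checkDigit - '0';

    // Digits map to 0-9, letters to 10-35, and the filler '<' to 0.
    private static int CharValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new ArgumentException($"Unexpected MRZ character '{c}'.");
    }
}
```

A failed check is a useful signal that a character was misread (for example 'O' versus '0'), even when the overall confidence score looks high.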

Debugging

You can also verify the results from IronOCR by inspecting the raw text extracted from the passport image together with the confidence level. Using the example above, access the Confidence and Text properties of the OcrPassportResult object.

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output Confidence level and raw extracted text
Console.WriteLine(result.Confidence);
Console.WriteLine(result.Text);

' VB.NET version of the same example
Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Dispose of the input once reading is complete
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output Confidence level and raw extracted text
            Console.WriteLine(result.Confidence)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

Console output

Debug
  • Confidence: The Confidence property of OcrPassportResult is a float indicating the statistical confidence of the OCR, averaged over every character. The value is lower if the passport image is blurry or contains other information. A value of 1 is the highest confidence and 0 is the lowest.
  • Text: The Text property of OcrPassportResult contains the raw, unparsed text extracted from the passport image. Developers can use it in unit tests to verify the text extracted from the passport image.
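
One way to act on these two properties is to gate automatic acceptance on them. The following is only a sketch under stated assumptions: the OcrSanityCheck helper is hypothetical, the 0.80 threshold is arbitrary, and it assumes the raw text keeps each MRZ row on its own line, which should be confirmed against real output. The sample values stand in for result.Confidence and result.Text.

```csharp
using System;
using System.Linq;

// Hypothetical values standing in for result.Confidence and result.Text.
double confidence = 0.87;
string rawText = "PASSPORT\n" +
                 "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<') + "\n" +
                 "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

bool accepted = OcrSanityCheck.IsAcceptable(confidence, rawText, minConfidence: 0.80);
Console.WriteLine(accepted ? "Accept result" : "Send for manual review");

// Hypothetical helper, not part of IronOCR.
static class OcrSanityCheck
{
    // Returns lines that look like passport MRZ rows: 44 characters of A-Z, 0-9 or '<'.
    public static string[] FindMrzRows(string rawText) =>
        (rawText ?? string.Empty)
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length == 44 && line.All(IsMrzChar))
            .ToArray();

    // Accepts only when confidence meets the threshold and exactly two MRZ rows were found.
    public static bool IsAcceptable(double confidence, string rawText, double minConfidence) =>
        confidence >= minConfidence && FindMrzRows(rawText).Length == 2;

    private static bool IsMrzChar(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '<';
}
```

The same FindMrzRows check can back a unit test that compares the raw text against a known passport image.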

Frequently Asked Questions

How can I extract passport data using OCR in C#?

You can use IronOCR's ReadPassport method to extract passport data in C#. First, install the IronOCR library and the IronOcr.Extension.AdvancedScan package. Import the passport image, and then call ReadPassport to retrieve the data.

What information can be extracted from a passport using IronOCR?

IronOCR can extract details such as given names, country, passport number, surname, date of birth, and expiry date from a passport.

What is the Machine-Readable Zone (MRZ) on a passport?

The MRZ is located at the bottom two rows of a passport and contains crucial data. IronOCR reads this zone to extract information according to ICAO standards.

What are the requirements for using the ReadPassport method in IronOCR?

You need to install the IronOcr.Extension.AdvancedScan package. On .NET Framework, the project must also run on x64 architecture to use the advanced scanning capabilities.

Can IronOCR be used to verify the accuracy of extracted passport data?

Yes, you can verify the accuracy of extracted data using the Confidence property of the OcrPassportResult object, where a value close to 1 indicates high confidence.

Which languages are supported by IronOCR for passport data extraction?

Currently, IronOCR supports English-based passports for data extraction.

How does IronOCR handle the text extracted from a passport?

The parsed fields are available through the PassportInfo member of the OcrPassportResult object, with each field represented as a string for easy access and manipulation.

What is the 'Text' property in the OcrPassportResult object used for?

The 'Text' property allows you to access and verify the extracted text from a passport, ensuring the information is accurate and complete.

What should I do if the passport image includes headers or footers?

Ensure the passport image is devoid of headers or footers before using IronOCR, as these can interfere with the accuracy of the data extraction process.

What are common troubleshooting tips when using IronOCR for passports?

Verify that the correct packages are installed, ensure the image is clear and properly cropped, and check the architecture compatibility (x64) for optimal performance.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.


Reviewed by
Jeff Fritz
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops and presentations and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.