How to extract passport data with IronOCR
At airport check-in counters and immigration control, agents handle a large volume of passports every day. A reliable system that accurately extracts the traveler's essential details is crucial to keeping that process efficient and streamlined.
IronOCR is a reliable tool for extracting and reading data from a passport. The process comes down to a single call to the ReadPassport method.
- Download a C# library to read passports
- Import the passport image for reading
- Ensure the document contains only the passport image, without headers or footers
- Use the ReadPassport method to extract data from the image
- Access the PassportInfo property of the returned OcrPassportResult to view and further manipulate the extracted passport data
To use this function, you must also install the IronOcr.Extensions.AdvancedScan package.
Extracting Passport Data Example
As an example, we will use a passport image as input to showcase the functionality of IronOCR. After loading the image using OcrInput, you can call the ReadPassport method to identify and extract information from the passport. This method returns an OcrPassportResult object whose PassportInfo property exposes GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry. All members of the PassportInfo object are strings.
Please note
- The method currently only works for English-based passports.
- Using advanced scan on .NET Framework requires the project to run on x64 architecture.
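The x64 requirement can be checked at startup so that a misconfigured build fails with a clear message. The following is a minimal sketch, not part of IronOCR: `ArchitectureGuard` is a hypothetical helper, and it assumes a target where `RuntimeInformation.ProcessArchitecture` is available (.NET Framework 4.7.1 or later, or modern .NET).

```csharp
using System;
using System.Runtime.InteropServices;

// Warn early instead of failing later inside the OCR call.
if (!ArchitectureGuard.IsX64(RuntimeInformation.ProcessArchitecture))
{
    Console.Error.WriteLine(
        $"Process architecture is {RuntimeInformation.ProcessArchitecture}; " +
        "advanced scan on .NET Framework expects x64.");
}

// Hypothetical helper, not part of IronOCR.
public static class ArchitectureGuard
{
    // True only when the running process is x64.
    public static bool IsX64(Architecture architecture) =>
        architecture == Architecture.X64;
}
```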
Passport Input
Code
using IronOcr;
using System;
// Instantiate OCR engine
var ocr = new IronTesseract();
using var inputPassport = new OcrInput();
inputPassport.LoadImage("passport.jpg");
// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);
// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);
' VB.NET version of the same example
Imports IronOcr
Imports System
' Instantiate OCR engine
Dim ocr As New IronTesseract()
Using inputPassport As New OcrInput()
    inputPassport.LoadImage("passport.jpg")
    ' Perform OCR
    Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)
    ' Output passport information
    Console.WriteLine(result.PassportInfo.GivenNames)
    Console.WriteLine(result.PassportInfo.Country)
    Console.WriteLine(result.PassportInfo.PassportNumber)
    Console.WriteLine(result.PassportInfo.Surname)
    Console.WriteLine(result.PassportInfo.DateOfBirth)
    Console.WriteLine(result.PassportInfo.DateOfExpiry)
End Using
Output
We then access the PassportInfo data member obtained from the OcrPassportResult object.
- GivenNames: A property of PassportInfo that returns the given names of the passport input as a string. This corresponds to the first MRZ data row, positions 6 to 44 (the part after the '<<' separator).
- Country: A property of PassportInfo that returns the country of the passport input as a string. This corresponds to the first MRZ data row, positions 3 to 5. The returned string spells out the full name of the issuing country instead of the abbreviation. In our example, USA is returned as United States of America.
- PassportNumber: A property of PassportInfo that returns the passport number of the passport input as a string. This corresponds to the second MRZ data row, positions 1 to 9.
- Surname: A property of PassportInfo that returns the passport input's surname as a string. This corresponds to the first MRZ data row, positions 6 to 44 (the part before the '<<' separator).
- DateOfBirth: A property of PassportInfo that returns the passport input's date of birth as a string in the format YYYY-MM-DD. This corresponds to the second MRZ data row, positions 14 to 19.
- DateOfExpiry: A property of PassportInfo that returns the passport input's date of expiry as a string in the format YYYY-MM-DD. This corresponds to the second MRZ data row, positions 22 to 27.
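Because every PassportInfo member is a string, date logic such as an expiry check needs a parsing step first. The following is a minimal sketch that assumes the YYYY-MM-DD format described above; `PassportDates` is a hypothetical helper rather than part of IronOCR, and the sample value stands in for `result.PassportInfo.DateOfExpiry`.

```csharp
using System;
using System.Globalization;

// Stand-in value; in practice this would come from result.PassportInfo.DateOfExpiry.
string dateOfExpiry = "2030-01-01";

if (PassportDates.TryParse(dateOfExpiry, out DateTime expiry))
{
    bool expired = PassportDates.IsExpired(expiry, DateTime.Today);
    Console.WriteLine(expired ? "Passport has expired" : "Passport is still valid");
}
else
{
    Console.WriteLine("Could not parse the expiry date; inspect the raw OCR text.");
}

// Hypothetical helper, not part of IronOCR.
public static class PassportDates
{
    // Parses a YYYY-MM-DD string; returns false instead of throwing on bad input.
    public static bool TryParse(string value, out DateTime date) =>
        DateTime.TryParseExact(value, "yyyy-MM-dd", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out date);

    // Treats the passport as expired once the reference date is past the expiry date.
    public static bool IsExpired(DateTime expiry, DateTime today) =>
        today.Date > expiry.Date;
}
```

Using TryParseExact with the invariant culture keeps the result independent of the machine's regional settings, which matters when an OCR misread produces an unexpected string.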
Understanding the MRZ Information
IronOCR reads the machine-readable zone (MRZ) printed in the bottom two rows of any passport that follows the International Civil Aviation Organization (ICAO) standard. The MRZ consists of two data rows of 44 characters, and each range of positions holds a specific piece of information. The tables below summarize which information corresponds to which positions in each row; for all exceptions and unique identifiers, please refer to the ICAO documentation standards.
Example Passport Input:
First Row
Position | Field | Description |
---|---|---|
1-2 | Document Type | 'P' for passport, optionally followed by a subtype character or the filler '<' |
3-5 | Issuing Country | Three-letter country code (ISO 3166-1 alpha-3, with some ICAO-specific codes) |
6-44 | Surname and Given Names | Surname followed by '<<' and then given names separated by '<', padded with '<' |
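To make the first-row layout concrete, here is a small sketch that splits a row by hand. It is independent of IronOCR and assumes the ICAO Doc 9303 passport layout, in which the document code occupies positions 1-2, the issuing state positions 3-5, and the name field positions 6-44. `MrzRow1` is a hypothetical helper, and the sample row is the fictional specimen holder commonly used in ICAO examples.

```csharp
using System;

// Specimen first row, padded with the '<' filler to the full 44 characters.
string row1 = "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<');

var parsed = MrzRow1.Parse(row1);
Console.WriteLine(parsed.IssuingState); // UTO
Console.WriteLine(parsed.Surname);      // ERIKSSON
Console.WriteLine(parsed.GivenNames);   // ANNA MARIA

// Hypothetical helper, not part of IronOCR.
public static class MrzRow1
{
    public static (string DocumentCode, string IssuingState, string Surname, string GivenNames)
        Parse(string row)
    {
        if (row is null || row.Length != 44)
            throw new ArgumentException("A passport MRZ row must be exactly 44 characters.", nameof(row));

        string documentCode = row.Substring(0, 2).TrimEnd('<'); // positions 1-2
        string issuingState = row.Substring(2, 3).TrimEnd('<'); // positions 3-5
        string nameField = row.Substring(5);                    // positions 6-44

        // "<<" separates the surname from the given names; a single "<" separates name parts.
        string[] parts = nameField.Split(new[] { "<<" }, 2, StringSplitOptions.None);
        string surname = parts[0].Replace('<', ' ').Trim();
        string givenNames = parts.Length > 1 ? parts[1].Replace('<', ' ').Trim() : string.Empty;

        return (documentCode, issuingState, surname, givenNames);
    }
}
```

Note that this returns the raw three-letter code, whereas the Country property described earlier returns the full country name.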
Second Row
Position | Field | Description |
---|---|---|
1-9 | Passport Number | Unique passport number |
10 | Check Digit (Passport Number) | Check digit for the passport number |
11-13 | Nationality | Three-letter nationality code (ISO 3166-1 alpha-3) |
14-19 | Date of Birth | Date of birth in YYMMDD format |
20 | Check Digit (Date of Birth) | Check digit for the date of birth |
21 | Sex | Sex ('M' for male, 'F' for female, '<' for unspecified) |
22-27 | Date of Expiry | Expiry date in YYMMDD format |
28 | Check Digit (Date of Expiry) | Check digit for the date of expiry |
29-42 | Personal Number | Optional personal number (usually national ID number) |
43 | Check Digit (Personal Number) | Check digit for the personal number |
44 | Check Digit (Composite) | Overall check digit |
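The check digits in the second row can be recomputed to catch OCR misreads. The sketch below follows the ICAO Doc 9303 rule: each character is weighted 7, 3, 1 in a repeating cycle, digits count as their value, letters A-Z as 10-35, the filler '<' as 0, and the check digit is the sum modulo 10. `MrzCheck` is a hypothetical helper, not an IronOCR API, and the sample row is the fictional ICAO specimen.

```csharp
using System;

// Specimen second row (44 characters).
string row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

string passportNumber = row2.Substring(0, 9);  // positions 1-9
string dateOfBirth = row2.Substring(13, 6);    // positions 14-19
string dateOfExpiry = row2.Substring(21, 6);   // positions 22-27

Console.WriteLine(MrzCheck.IsValid(passportNumber, row2[9]));  // True
Console.WriteLine(MrzCheck.IsValid(dateOfBirth, row2[19]));    // True
Console.WriteLine(MrzCheck.IsValid(dateOfExpiry, row2[27]));   // True

// Hypothetical helper, not part of IronOCR.
public static class MrzCheck
{
    // Weighted sum modulo 10, using the repeating weights 7, 3, 1.
    public static int Compute(string field)
    {
        int[] weights = { 7, 3, 1 };
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
            sum += CharValue(field[i]) * weights[i % 3];
        return sum % 10;
    }

    // Compares the computed value with the check digit printed in the MRZ.
    public static bool IsValid(string field, char checkDigit) =>
        checkDigit >= '0' && checkDigit <= '9' && Compute(field) == checkDigit - '0';

    private static int CharValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new ArgumentException($"Invalid MRZ character '{c}'.");
    }
}
```

A failed check usually points to a misread character, such as 'O' read as '0', so it is a useful signal for rescanning the image.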
Debugging
We can also verify the results from IronOCR by inspecting the raw text extracted from the passport image together with the confidence level, to confirm whether the extracted information is accurate. Using the example from above, we can access the Confidence and Text properties of the OcrPassportResult object.
using IronOcr;
using System;
// Instantiate OCR engine
var ocr = new IronTesseract();
using var inputPassport = new OcrInput();
inputPassport.LoadImage("passport.jpg");
// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);
// Output Confidence level and raw extracted text
Console.WriteLine(result.Confidence);
Console.WriteLine(result.Text);
' VB.NET version of the same example
Imports IronOcr
Imports System
' Instantiate OCR engine
Dim ocr As New IronTesseract()
Using inputPassport As New OcrInput()
    inputPassport.LoadImage("passport.jpg")
    ' Perform OCR
    Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)
    ' Output Confidence level and raw extracted text
    Console.WriteLine(result.Confidence)
    Console.WriteLine(result.Text)
End Using
Console output:
- Confidence: The Confidence property of OcrPassportResult is a float giving the OCR engine's statistical confidence, averaged over every character. It ranges from 0 (lowest, least confident) to 1 (highest, most confident) and will be lower if the passport image is blurry or contains other information.
- Text: The Text property of OcrPassportResult contains the raw, unparsed text extracted from the passport image. Developers can use this in unit tests to verify the text extracted from the passport image.
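The two properties can be combined into a simple acceptance check, for example in a unit test. The sketch below is one possible approach, not an IronOCR feature: `PassportSanityCheck` is a hypothetical helper, the 0.9 threshold is an arbitrary assumption to tune for your images, and the sample values stand in for `result.Confidence` and `result.Text`. It also assumes the raw text contains the two MRZ rows on separate lines, which may not hold for every scan.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in values; in practice these would come from result.Confidence and result.Text.
double confidence = 0.95;
string rawText = "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<') + "\n" +
                 "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

Console.WriteLine(PassportSanityCheck.LooksReliable(confidence, rawText)); // True

// Hypothetical helper, not part of IronOCR.
public static class PassportSanityCheck
{
    // A passport MRZ row is 44 characters drawn from A-Z, 0-9 and the filler '<'.
    private static readonly Regex MrzLine = new Regex("^[A-Z0-9<]{44}$");

    // Returns the lines of the raw text that look like MRZ rows, ignoring stray spaces.
    public static string[] FindMrzLines(string rawText) =>
        rawText.Split('\n')
               .Select(line => line.Trim().Replace(" ", ""))
               .Where(line => MrzLine.IsMatch(line))
               .ToArray();

    // Accepts a result only if confidence is high enough and exactly two MRZ rows were found.
    public static bool LooksReliable(double confidence, string rawText, double minConfidence = 0.9) =>
        confidence >= minConfidence && FindMrzLines(rawText).Length == 2;
}
```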