Extract Passport Data in C# with IronOCR

IronOCR's ReadPassport method extracts structured data from passport images including names, passport numbers, birth dates, and expiry dates in a single line of C# code, making it ideal for immigration and security applications.

In systems such as airport check-in counters and immigration checkpoints, where agents handle large volumes of passports daily, a reliable way to accurately extract mission-critical traveler information is crucial to an efficient, streamlined immigration process. The IronOCR library provides advanced OCR capabilities optimized for passport reading, built on Tesseract 5 with machine learning enhancements.

Quickstart: Extract Passport MRZ Info in One Line

This example shows how to load a passport image using OcrInput, call ReadPassport() to extract its data, and access structured fields such as names, passport number, and dates from the returned PassportInfo. No complex setup is required: the whole operation fits in one line. Unlike traditional Tesseract implementations, IronOCR provides a simplified API designed specifically for document extraction.

Get started with NuGet:

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr

  2. Copy and run this code snippet.

    var passportInfo = new IronOcr.IronTesseract().ReadPassport(new IronOcr.OcrInput("passport.jpg")).PassportInfo;

  3. Deploy to test on your live environment


How Do I Extract Passport Data in C#?

As an example, we will use a passport image as input to showcase IronOCR's functionality. After loading the image with OcrInput, call the ReadPassport method to identify and extract the passport's information. ReadPassport returns an OcrPassportResult object whose PassportInfo property holds the parsed fields: GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry. All members of PassportInfo are strings.

The ReadPassport method is part of IronOCR's specialized document reading capabilities, which also include methods for reading license plates, MICR cheques, and other structured documents. It uses computer vision techniques to locate the MRZ (Machine Readable Zone) automatically and extract its contents.

Please note

  • The method currently works only for English-based passports.
  • Using advanced scan on .NET Framework requires the project to run on x64 architecture.
  • On macOS, ReadPassport does not currently rotate the input automatically. Make sure the MRZ is at the bottom of the image; otherwise, extraction will fail.

What Passport Image Format Should I Use?

IronOCR supports various image formats, including JPG, PNG, TIFF, and BMP. For best results, make sure your passport image has adequate resolution (at least 300 DPI) and even lighting. You can adjust the DPI setting when working with lower-quality scans.
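To put the 300 DPI guideline in pixel terms, the following sketch converts a physical size into a minimum pixel size. It assumes the nominal ICAO TD3 passport data page size of 125 mm x 88 mm (from ICAO Doc 9303, not from IronOCR), and MinPixels is a hypothetical helper, not part of the IronOCR API.

```csharp
using System;

// Hypothetical helper (not part of IronOCR): the minimum pixel size needed
// to capture a region of the given physical size at the given resolution.
static (int Width, int Height) MinPixels(double widthMm, double heightMm, int dpi)
{
    const double MmPerInch = 25.4;
    int width = (int)Math.Ceiling(widthMm / MmPerInch * dpi);
    int height = (int)Math.Ceiling(heightMm / MmPerInch * dpi);
    return (width, height);
}

// Assumed nominal ICAO TD3 passport data page: 125 mm x 88 mm
var (w, h) = MinPixels(125, 88, 300);
Console.WriteLine($"At 300 DPI: at least {w} x {h} pixels");
// At 300 DPI: at least 1477 x 1040 pixels
```

In other words, a full data page scanned at 300 DPI should be roughly 1477 x 1040 pixels or larger.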

US passport data page example showing biographical fields, dates, and machine readable zone for data extraction demo

What Code Do I Need to Extract Passport Data?

The following example demonstrates the complete process of passport data extraction. For applications processing multiple passports, consider implementing multithreading support to improve performance. You can also track OCR progress for long-running operations.

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);

What Output Can I Expect from ReadPassport?

The extracted data is returned in a structured format that is easy to integrate with existing systems. The OcrPassportResult object exposes both the parsed fields and the raw OCR output.

Debug console showing extracted passport data: name, country, passport number, and dates from OCR processing

The parsed values are exposed through the PassportInfo property of the OcrPassportResult object. Because the fields come from the standardized ICAO MRZ, they follow the same layout across issuing countries, subject to the English-only limitation noted above.

  • GivenNames: the given names, as a string. Taken from the first MRZ row, positions 6 to 44, after the '<<' separator.
  • Country: the issuing country, as a string. Taken from the first MRZ row, positions 3 to 5. The returned string spells out the country's full name instead of the three-letter code; in our example, USA returns 'United States of America'.
  • PassportNumber: the passport number, as a string. Taken from the second MRZ row, positions 1 to 9.
  • Surname: the surname, as a string. Taken from the first MRZ row, positions 6 to 44, before the '<<' separator.
  • DateOfBirth: the date of birth, as a string in YYYY-MM-DD format. Taken from the second MRZ row, positions 14 to 19.
  • DateOfExpiry: the date of expiry, as a string in YYYY-MM-DD format. Taken from the second MRZ row, positions 22 to 27.
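Because every PassportInfo member is a string, you will usually convert the dates before using them. The following sketch assumes the YYYY-MM-DD format described above; ParseIsoDate and IsExpired are hypothetical helpers, and treating the expiry date itself as still valid is an assumption to check against your own business rules.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses a "YYYY-MM-DD" string (the format described for
// DateOfBirth and DateOfExpiry) and returns null if the value does not match.
static DateTime? ParseIsoDate(string value)
{
    bool ok = DateTime.TryParseExact(value, "yyyy-MM-dd", CultureInfo.InvariantCulture,
        DateTimeStyles.None, out DateTime date);
    return ok ? date : (DateTime?)null;
}

// Hypothetical helper: true when the expiry date lies before the reference date.
// Assumption: a passport is still valid on its expiry date itself.
static bool IsExpired(string dateOfExpiry, DateTime asOf)
{
    DateTime? expiry = ParseIsoDate(dateOfExpiry);
    if (expiry is null)
        throw new FormatException($"Unexpected date format: '{dateOfExpiry}'");
    return expiry.Value.Date < asOf.Date;
}

Console.WriteLine(ParseIsoDate("1985-03-12")?.Year);                 // 1985
Console.WriteLine(IsExpired("2030-01-15", new DateTime(2025, 6, 1))); // False
```

In an application you would pass result.PassportInfo.DateOfExpiry and DateTime.Today instead of the literal values.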

What MRZ Information Can I Extract from Passports?

IronOCR reads the MRZ printed in the bottom two rows of passports that follow the International Civil Aviation Organization (ICAO) standard, Doc 9303. In the passport (TD3) format, each row is 44 characters long, and each range of character positions holds a specific field. The tables below show which information corresponds to which positions in each row.

The MRZ parsing in IronOCR is designed to tolerate variations in print quality, but orientation is not always corrected automatically (see the note for Mac users above). For challenging documents, you can apply image correction filters to improve recognition accuracy.

What Does the MRZ Section Look Like?

The MRZ is typically located at the bottom of the passport page and consists of two lines of standardized text. Understanding the MRZ structure helps in troubleshooting extraction issues and validating results.

US passport page with Machine Readable Zone (MRZ) highlighted in red box showing encoded alphanumeric data lines

First Row

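Position | Field | Description
1-2 | Document Type | 'P' for passport; the second character is an optional document subtype or the filler '<'
3-5 | Issuing Country | Three-letter country code (ISO 3166-1 alpha-3)
6-44 | Surname and Given Names | Surname followed by '<<' and then given names separated by '<'; unused positions are filled with '<'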
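The name field layout in this table can be decoded with a few string operations. ReadPassport already returns Surname and GivenNames for you, so this is only a sketch for cases where you work with raw MRZ text; ParseNameField is a hypothetical helper, not an IronOCR method, and the sample value is the name field from the fictional ICAO Doc 9303 specimen passport.

```csharp
using System;

// Hypothetical helper: splits the MRZ name field (first row, positions 6-44)
// into surname and given names, following the layout in the table above.
static (string Surname, string GivenNames) ParseNameField(string field)
{
    // Trailing '<' characters are filler and carry no information.
    string trimmed = field.TrimEnd('<');

    // '<<' separates the surname from the given names.
    int separator = trimmed.IndexOf("<<", StringComparison.Ordinal);
    string surnamePart = separator >= 0 ? trimmed.Substring(0, separator) : trimmed;
    string givenPart = separator >= 0 ? trimmed.Substring(separator + 2) : string.Empty;

    // Inside each part, a single '<' stands for a space between name components.
    static string ToWords(string part) =>
        string.Join(" ", part.Split('<', StringSplitOptions.RemoveEmptyEntries));

    return (ToWords(surnamePart), ToWords(givenPart));
}

var (surname, givenNames) = ParseNameField("ERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<");
Console.WriteLine($"Surname: {surname}, given names: {givenNames}");
// Surname: ERIKSSON, given names: ANNA MARIA
```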

Second Row

Position | Field | Description
1-9 | Passport Number | Unique passport number, padded with '<' if shorter than nine characters
10 | Check Digit (Passport Number) | Check digit for the passport number
11-13 | Nationality | Three-letter nationality code (ISO 3166-1 alpha-3)
14-19 | Date of Birth | Date of birth in YYMMDD format
20 | Check Digit (Date of Birth) | Check digit for the date of birth
21 | Sex | 'M' for male, 'F' for female, '<' for unspecified
22-27 | Date of Expiry | Expiry date in YYMMDD format
28 | Check Digit (Date of Expiry) | Check digit for the date of expiry
29-42 | Personal Number | Optional personal number (often a national ID number), padded with '<'
43 | Check Digit (Personal Number) | Check digit for the personal number
44 | Check Digit (Composite) | Overall check digit computed over positions 1-10, 14-20, and 22-43
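If you are working with a raw second row rather than PassportInfo, the position table translates directly into substring slices. This is a sketch: SplitRow2 is a hypothetical helper (not an IronOCR API), it skips the check-digit positions, and the sample row is the second row of the fictional ICAO Doc 9303 specimen passport.

```csharp
using System;

// Hypothetical helper: slices a 44-character TD3 second row into the data fields
// from the table above. Table positions are 1-based; Substring is 0-based.
static (string PassportNumber, string Nationality, string DateOfBirth, char Sex,
        string DateOfExpiry, string PersonalNumber) SplitRow2(string row)
{
    if (row.Length != 44)
        throw new ArgumentException("A TD3 MRZ row has 44 characters.", nameof(row));

    // Returns the characters at 1-based positions from..to (inclusive).
    string Field(int from, int to) => row.Substring(from - 1, to - from + 1);

    return (Field(1, 9).TrimEnd('<'),    // passport number, '<' filler removed
            Field(11, 13),               // nationality
            Field(14, 19),               // date of birth, YYMMDD
            row[21 - 1],                 // sex (position 21)
            Field(22, 27),               // date of expiry, YYMMDD
            Field(29, 42).TrimEnd('<')); // personal number, '<' filler removed
}

var fields = SplitRow2("L898902C36UTO7408122F1204159ZE184226B<<<<<10");
Console.WriteLine($"{fields.PassportNumber} {fields.Nationality} {fields.DateOfBirth} " +
                  $"{fields.Sex} {fields.DateOfExpiry} {fields.PersonalNumber}");
// L898902C3 UTO 740812 F 120415 ZE184226B
```

Note that the YYMMDD dates do not encode the century; ReadPassport's YYYY-MM-DD output already resolves it, but a raw-text parser would need its own rule.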

How Can I Debug and Verify Passport Extraction Results?

To check whether the extracted information is accurate, you can also inspect the raw text IronOCR read from the passport image together with its confidence level. Using the example above, access the Confidence and Text properties of the OcrPassportResult object.

For debugging purposes, you might want to highlight text regions to visually verify what areas were recognized. This feature is particularly useful when troubleshooting extraction issues or optimizing scan regions.

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output Confidence level and raw extracted text
Console.WriteLine(result.Confidence);
Console.WriteLine(result.Text);

VS Code debug console showing passport data output with personal details, countries, and encoded strings

  • Confidence: The Confidence property of OcrPassportResult is a float giving the OCR engine's statistical confidence, averaged over every character. It ranges from 0 (least confident) to 1 (most confident) and drops when the passport image is blurry or contains extraneous content. For production applications, consider enforcing result confidence thresholds to ensure data quality.
  • Text: The Text property from OcrPassportResult contains the raw, unparsed text extracted from the passport image. Developers can use this in unit tests to verify the extracted text of the passport image. For advanced scenarios, you can export results in hOCR format for further analysis.
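The two properties can be combined into a simple acceptance rule before the parsed data is trusted. In this sketch, FindMrzRows and AcceptAutomatically are hypothetical helpers, the 0.9 threshold is an arbitrary example value, and it assumes the raw Text contains each MRZ row on its own line (spaces inserted by OCR are stripped first). The sample rows come from the fictional ICAO Doc 9303 specimen.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns lines of raw OCR text that look like TD3 MRZ rows,
// i.e. exactly 44 characters from A-Z, 0-9 and '<' once spaces are removed.
static string[] FindMrzRows(string rawText) =>
    rawText.Split('\n')
           .Select(line => line.Trim().Replace(" ", ""))
           .Where(line => Regex.IsMatch(line, "^[A-Z0-9<]{44}$"))
           .ToArray();

// Hypothetical acceptance rule: accept automatically only when the confidence
// meets the threshold (0.9 is an arbitrary example) and exactly two MRZ rows were read.
static bool AcceptAutomatically(double confidence, string rawText, double threshold = 0.9) =>
    confidence >= threshold && FindMrzRows(rawText).Length == 2;

string row1 = "P<UTOERIKSSON<<ANNA<MARIA" + new string('<', 19);
string row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";
string sample = "PASSPORT\n" + row1 + "\n" + row2 + "\n";

Console.WriteLine(AcceptAutomatically(0.95, sample)); // True
Console.WriteLine(AcceptAutomatically(0.60, sample)); // False
```

With a real result, you would call AcceptAutomatically(result.Confidence, result.Text) and route rejected passports to manual review.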

Best Practices for Passport Scanning Applications

When implementing passport scanning in production environments, consider these additional factors:

  1. Image Quality: Ensure input images meet minimum quality standards. The Filter Wizard can help optimize images for better recognition.

  2. Performance: For high-volume processing, implement async support and consider batch processing multiple passports.

  3. Security: Since passport data is sensitive, ensure proper data handling and consider integrating with secure document management systems.

  4. Validation: Implement check digit validation for the extracted MRZ data to ensure accuracy. The MRZ format includes multiple check digits that can be used to verify the integrity of the extracted information.

  5. Error Handling: Implement robust error handling for cases where passport images may be damaged, poorly lit, or contain non-standard formats.
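The check digits mentioned under Validation follow a fixed algorithm defined in ICAO Doc 9303: each character maps to a number, the numbers are weighted 7, 3, 1 in a repeating cycle, and the sum is taken modulo 10. The sketch below implements that algorithm and applies it to all five check digits of a TD3 second row. CheckDigit and Row2IsValid are hypothetical helpers, not IronOCR APIs, and the sample row is from the fictional ICAO specimen passport.

```csharp
using System;

// ICAO 9303 check digit: map each character to a value (digits 0-9 keep their
// value, A-Z become 10-35, the filler '<' becomes 0), multiply by the repeating
// weights 7, 3, 1, sum the products and take the sum modulo 10.
static int CheckDigit(string data)
{
    int[] weights = { 7, 3, 1 };
    int sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        char c = data[i];
        int value = c switch
        {
            >= '0' and <= '9' => c - '0',
            >= 'A' and <= 'Z' => c - 'A' + 10,
            '<' => 0,
            _ => throw new ArgumentException($"Invalid MRZ character '{c}'.", nameof(data))
        };
        sum += value * weights[i % 3];
    }
    return sum % 10;
}

// Hypothetical helper: verifies the five check digits of a TD3 second row.
// Throws if the row contains characters outside the MRZ alphabet.
static bool Row2IsValid(string row)
{
    if (row.Length != 44) return false;

    bool Matches(string data, char check) =>
        check is >= '0' and <= '9' && CheckDigit(data) == check - '0';

    string personal = row.Substring(28, 14);
    bool personalOk = Matches(personal, row[42])
        // An unused personal number field may carry '<' as its check digit.
        || (row[42] == '<' && personal.Trim('<').Length == 0);

    return Matches(row.Substring(0, 9), row[9])        // passport number
        && Matches(row.Substring(13, 6), row[19])      // date of birth
        && Matches(row.Substring(21, 6), row[27])      // date of expiry
        && personalOk                                  // personal number
        && Matches(row.Substring(0, 10) + row.Substring(13, 7)
                   + row.Substring(21, 22), row[43]);  // composite
}

Console.WriteLine(CheckDigit("L898902C3")); // 6
Console.WriteLine(Row2IsValid("L898902C36UTO7408122F1204159ZE184226B<<<<<10")); // True
```

A failed check usually points to a misread character, which makes a mismatch a good trigger for a rescan or manual review.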

Frequently Asked Questions

How do I extract passport data from images in C#?

You can extract passport data using IronOCR's ReadPassport method. Simply load your passport image with OcrInput and call ReadPassport() to get structured data including names, passport numbers, birth dates, and expiry dates in a single line of code.

What passport information can be extracted automatically?

IronOCR's ReadPassport method extracts GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry from passport images. All data is returned as strings in a structured PassportInfo object.

Do I need complex setup to read passport MRZ data?

No complex setup is required. IronOCR provides a simplified API that extracts passport MRZ data in just one line of code, unlike traditional Tesseract implementations which require more configuration.

What technology powers the passport reading capability?

IronOCR leverages Tesseract 5 under the hood with machine learning enhancements and advanced computer vision techniques to automatically locate and extract the MRZ (Machine Readable Zone) area from passport images.

Can this be used for airport immigration systems?

Yes, IronOCR is ideal for immigration and security applications at airports where agents handle large volumes of passports daily. It provides reliable extraction of mission-critical traveler information to ensure efficient immigration processes.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

...

Reviewed by
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops, presentations, and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.