How to Extract Passport Data with IronOCR

Curtis Chau

August 7, 2024

Updated February 16, 2025

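In applications and systems such as airport check-in counters and immigration security checkpoints, agents process large volumes of passports every day. A reliable system that accurately extracts mission-critical information about each traveler is essential to an efficient, streamlined immigration process.

IronOCR is a reliable tool for extracting and reading passport data. A single call to the ReadPassport method handles the process.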

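How to Extract Passport Data with IronOCR

Download the C# library for reading passports
Import the passport image to be read
Ensure the document contains only the passport image, with no headers or footers
Use the ReadPassport method to extract data from the image
Access the OcrPassportResult properties to view and further manipulate the extracted passport data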

To use this feature, you must also install the IronOcr.Extensions.AdvancedScan package.

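Extract Passport Data Example

As an example, we will use a passport image as input to demonstrate IronOCR's functionality. After loading the image with OcrInput, call the ReadPassport method to identify and extract the information in the passport. The method returns an OcrPassportResult object whose PassportInfo member exposes properties such as GivenNames, Country, PassportNumber, Surname, DateOfBirth, and DateOfExpiry. All members of the PassportInfo object are strings.

Please note

The method currently works only for English-based passports.
Using advanced scan on .NET Framework requires the project to run on the x64 architecture.

Passport Input

Code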

:path=/static-assets/ocr/content-code-examples/how-to/read-passport-read-passport.cs

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);

Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Load the passport image; Using disposes the input when done
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output passport information
            Console.WriteLine(result.PassportInfo.GivenNames)
            Console.WriteLine(result.PassportInfo.Country)
            Console.WriteLine(result.PassportInfo.PassportNumber)
            Console.WriteLine(result.PassportInfo.Surname)
            Console.WriteLine(result.PassportInfo.DateOfBirth)
            Console.WriteLine(result.PassportInfo.DateOfExpiry)
        End Using
    End Sub
End Module

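Output

We then access the PassportInfo data member of the OcrPassportResult object.

GivenNames: A property of PassportInfo that returns the given names from the passport input as a string. This corresponds to the first MRZ data row, positions 6 to 44, where the given names follow the '<<' separator.
Country: A property of PassportInfo that returns the issuing country from the passport input as a string. This corresponds to the first MRZ data row, positions 3 to 5. The returned string spells out the full name of the issuing country rather than the abbreviation; in our example, the code USA is returned as the country's full name.
PassportNumber: A property of PassportInfo that returns the passport number from the passport input as a string. This corresponds to the second MRZ data row, positions 1 to 9.
Surname: A property of PassportInfo that returns the surname from the passport input as a string. This corresponds to the first MRZ data row, positions 6 to 44, where the surname precedes the '<<' separator.
DateOfBirth: A property of PassportInfo that returns the date of birth from the passport input as a string in YYYY-MM-DD format. This corresponds to the second MRZ data row, positions 14 to 19.
DateOfExpiry: A property of PassportInfo that returns the expiry date from the passport input as a string in YYYY-MM-DD format. This corresponds to the second MRZ data row, positions 22 to 27.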
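Because every PassportInfo member is a string, date comparisons need a parsing step first. The following is a minimal sketch, assuming the YYYY-MM-DD format described above. The sample values stand in for a real ReadPassport result, and the helper names ParsePassportDate and IsExpired are hypothetical, not part of IronOCR.

```csharp
using System;
using System.Globalization;

// Hypothetical values standing in for result.PassportInfo.DateOfBirth / DateOfExpiry.
string dateOfBirth = "1990-01-15";
string dateOfExpiry = "2030-06-30";

// Parse a YYYY-MM-DD string strictly; returns null when the text does not match.
static DateTime? ParsePassportDate(string text)
{
    bool ok = DateTime.TryParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                                     DateTimeStyles.None, out DateTime value);
    return ok ? value : (DateTime?)null;
}

// A passport counts as expired once the reference date is past the expiry date.
static bool IsExpired(string expiryText, DateTime today)
{
    DateTime? expiry = ParsePassportDate(expiryText);
    // Treat unparseable input as expired so that it gets flagged for review.
    return expiry is null || today.Date > expiry.Value.Date;
}

Console.WriteLine(ParsePassportDate(dateOfBirth)?.ToString("yyyy-MM-dd") ?? "unreadable");
Console.WriteLine(IsExpired(dateOfExpiry, new DateTime(2025, 2, 16)));
```

Passing the reference date in explicitly, rather than reading DateTime.Today inside the helper, keeps the check deterministic in unit tests.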

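Understanding the MRZ Information

IronOCR reads the MRZ (machine-readable zone) information contained in the bottom two rows of any passport that follows the International Civil Aviation Organization (ICAO) standard. The MRZ consists of two rows of data, and each set of positions holds a specific piece of information. The tables below briefly map each piece of information to its position in the row; for all exceptions and unique identifiers, refer to the ICAO documentation standard.

Example input:

First Row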

Position	Field	Description
1-2	Document Type	'P' for passport, optionally followed by a subtype character ('<' if unused)
3-5	Issuing Country	Three-letter country code (based on ISO 3166-1 alpha-3)
6-44	Surname and Given Names	Surname followed by '<<' and then given names separated by '<'
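To make the first-row layout concrete, here is a minimal sketch that slices a 44-character row by position, assuming the ICAO 9303 passport layout (document type in positions 1-2, issuing country in 3-5, names in 6-44). The sample row is a fictitious specimen-style value, and ParseMrzLine1 is a hypothetical helper, not an IronOCR API.

```csharp
using System;

// Fictitious first MRZ row, padded with '<' to the required 44 characters.
string line1 = "P<UTOERIKSSON<<ANNA<MARIA".PadRight(44, '<');

static (string DocumentType, string IssuingCountry, string Surname, string GivenNames)
    ParseMrzLine1(string line)
{
    if (line is null || line.Length != 44)
        throw new ArgumentException("A passport MRZ row must be exactly 44 characters.", nameof(line));

    string documentType = line.Substring(0, 2).TrimEnd('<');   // positions 1-2
    string issuingCountry = line.Substring(2, 3).TrimEnd('<'); // positions 3-5
    string nameField = line.Substring(5);                      // positions 6-44

    // "<<" separates the surname from the given names; a single "<" stands for a space.
    int split = nameField.IndexOf("<<", StringComparison.Ordinal);
    string surname = split >= 0 ? nameField.Substring(0, split) : nameField;
    string given = split >= 0 ? nameField.Substring(split + 2) : string.Empty;

    return (documentType, issuingCountry,
            surname.Replace('<', ' ').Trim(),
            given.Replace('<', ' ').Trim());
}

var parsed = ParseMrzLine1(line1);
Console.WriteLine($"{parsed.Surname}, {parsed.GivenNames} ({parsed.IssuingCountry})");
// prints "ERIKSSON, ANNA MARIA (UTO)"
```

A sketch like this can be useful for cross-checking the parsed PassportInfo values against the raw MRZ text in a unit test.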

Second Row

Position	Field	Description
1-9	Passport Number	Unique passport number
10	Check Digit (Passport Number)	Check digit for the passport number
11-13	Nationality	Three-letter nationality code (ISO 3166-1 alpha-3)
14-19	Date of Birth	Date of birth in YYMMDD format
20	Check Digit (Date of Birth)	Check digit for the date of birth
21	Sex	Sex ('M' for male, 'F' for female, '<' for unspecified)
22-27	Date of Expiry	Expiry date in YYMMDD format
28	Check Digit (Date of Expiry)	Check digit for the date of expiry
29-42	Personal Number	Optional personal number (usually national ID number)
43	Check Digit (Personal Number)	Check digit for the personal number
44	Check Digit (Composite)	Overall check digit
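The check digits in the second row follow the ICAO 9303 scheme: each character is multiplied by a repeating weight of 7, 3, 1 (digits count as themselves, A-Z as 10-35, '<' as 0), and the sum modulo 10 is the check digit. The sketch below applies that rule to the publicly documented ICAO specimen row; ComputeCheckDigit and FieldIsValid are hypothetical helper names, not IronOCR APIs.

```csharp
using System;

// ICAO 9303 specimen second row (44 characters); not a real passport.
string line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

// Weighted sum with repeating weights 7, 3, 1, reduced modulo 10.
static int ComputeCheckDigit(string field)
{
    int[] weights = { 7, 3, 1 };
    int sum = 0;
    for (int i = 0; i < field.Length; i++)
    {
        char c = field[i];
        int value;
        if (c >= '0' && c <= '9') value = c - '0';
        else if (c >= 'A' && c <= 'Z') value = c - 'A' + 10;
        else if (c == '<') value = 0;
        else throw new FormatException($"Unexpected MRZ character '{c}'.");
        sum += value * weights[i % 3];
    }
    return sum % 10;
}

// Compare a field against the check digit stored at a zero-based index in the row.
static bool FieldIsValid(string line, int start, int length, int checkIndex)
    => ComputeCheckDigit(line.Substring(start, length)) == line[checkIndex] - '0';

Console.WriteLine($"Passport number valid: {FieldIsValid(line2, 0, 9, 9)}");   // positions 1-9, check at 10
Console.WriteLine($"Date of birth valid:   {FieldIsValid(line2, 13, 6, 19)}"); // positions 14-19, check at 20
Console.WriteLine($"Date of expiry valid:  {FieldIsValid(line2, 21, 6, 27)}"); // positions 22-27, check at 28
```

The table positions are 1-based, so each start index in the sketch is the table position minus one.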

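Debugging

We can also verify IronOCR's results by retrieving the raw extracted text and the confidence level from the passport image, confirming whether the extracted information is accurate. Continuing the example above, we access the Confidence and Text properties of the OcrPassportResult object.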

:path=/static-assets/ocr/content-code-examples/how-to/read-passport-debug.cs

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output Confidence level and raw extracted text
Console.WriteLine(result.Confidence);
Console.WriteLine(result.Text);

Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Load the passport image; Using disposes the input when done
        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output Confidence level and raw extracted text
            Console.WriteLine(result.Confidence)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

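Console output:

Confidence: The Confidence property of OcrPassportResult is a floating-point number that indicates the statistical confidence in the OCR accuracy, averaged over every character. The value is lower if the passport image is blurry or contains other information. A value of 1 is the highest and most confident; 0 is the lowest and least confident.
Text: The Text property of OcrPassportResult contains the raw, unparsed text extracted from the passport image. Developers can use it in unit tests to verify the text extracted from the passport image.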
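One way to act on these two values is to route low-confidence or empty reads to a person. The sketch below is a minimal illustration, assuming the 0-to-1 confidence scale described above. The 0.9 threshold and the NeedsManualReview helper are hypothetical choices, and the literal arguments stand in for result.Confidence and result.Text.

```csharp
using System;

// Decide whether a read should go to manual review.
// threshold is a hypothetical cut-off on the 0-to-1 scale; tune it against real scans.
static bool NeedsManualReview(double confidence, string rawText, double threshold = 0.9)
{
    // An empty read or a low score both warrant a second look.
    return string.IsNullOrWhiteSpace(rawText) || confidence < threshold;
}

// Hypothetical values standing in for result.Confidence and result.Text.
Console.WriteLine(NeedsManualReview(0.97, "P<UTOERIKSSON<<ANNA<MARIA")); // False
Console.WriteLine(NeedsManualReview(0.62, "P<UTOERIKSS0N<<ANNA<MAR1A")); // True
```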