How to Read Identity Documents Using OCR in C#

Optical Character Recognition (OCR) technology has become an invaluable tool for automating the extraction of text from images, enabling efficient data retrieval and analysis while reducing human error. It can read driving licenses, passports, official institutional documents, ID cards, residence permit cards, and travel documents in multiple languages and from different countries, extracting fields such as the expiration date, nationality, and date of birth. The extracted data can then be fed to machine learning and artificial intelligence software.

In this article, we will explore how to use IronOCR, a powerful C# OCR library from Iron Software, to read and extract information from identity documents. IronOCR exposes a straightforward and flexible API for OCR tasks, making it an excellent choice for developers looking to integrate OCR capabilities into their applications.

IronOCR enables computers to recognize and extract text from images, scanned documents, or any other visual representation of text. Extracting the data involves a series of complex processes that mimic the way humans perceive and interpret text visually: image pre-processing, text detection, character segmentation, feature extraction, character recognition, and post-processing to correct errors.

How to Read Identity Documents Using OCR in C#

  1. Create a new C# project in Visual Studio
  2. Install the IronOCR .NET library and add it to your project.
  3. Read Identity document images using the IronOCR library.
  4. Read the identity documents from PDFs.

IronOCR, crafted and maintained by Iron Software, serves as a powerful library for C# Software Engineers, facilitating OCR, Barcode Scanning, and Text Extraction within .NET projects.

Key Features of IronOCR

Text Reading Versatility

Capable of reading relevant data from various formats, including images (JPEG, PNG, GIF, TIFF, BMP), Streams, and PDFs.

Image Enhancement

Corrects low-quality scans and photos through an array of filters such as Deskew, Denoise, Binarize, Enhance Resolution, Dilate, and more.

Barcode Recognition

Supports reading barcodes from a wide range of formats, encompassing over 20 barcode types, with added QR code recognition.

Tesseract OCR Integration

Utilizes the latest build of Tesseract OCR, finely tuned for optimal performance in extracting text from images.

Flexible Output Options

Allows the export of searchable PDFs, HTML, and text content from image files, offering flexibility in managing extracted information.

Now, let's delve into the development of a demo application that utilizes IronOCR to read ID documents.

Prerequisites

  1. Visual Studio: Ensure you have Visual Studio or any other C# development environment installed.
  2. NuGet Package Manager: Make sure you can use NuGet to manage packages in your project.

Step 1: Create a New C# Project in Visual Studio

Begin by creating a new C# console application in Visual Studio, or use an existing project. Select "Add New Project" from the menu, then select the console application template from the list.

How to Read Identity Documents Using OCR in C#: Figure 1

Provide a project name and location in the window below.

How to Read Identity Documents Using OCR in C#: Figure 2

Select the required .NET Version.

How to Read Identity Documents Using OCR in C#: Figure 3

Click the Create button to create the new project.

Step 2: Install the IronOCR library and add it to your project.

IronOCR can be found in the NuGet package manager and can be installed using the package manager console with the following command:

Install-Package IronOcr

IronOCR can also be installed from within Visual Studio. Open the NuGet Package Manager, search for IronOCR as shown below, and click Install.

How to Read Identity Documents Using OCR in C#: Figure 5

Once installed, the application is ready to make use of IronOCR to read any identity document for data extraction and identity verification, reducing manual data entry work.

Step 3: Read Identity Document Images using the IronOCR library

Using OCR for processing ID documents involves many steps, which are detailed below.

Image Pre-processing

The OCR ID document processing begins with acquiring an image containing text. This image could be scanned ID documents, a photograph of ID cards, or any other visual representation of text. Identity card pre-processing steps may include resizing, noise reduction, and enhancement to improve the quality and clarity of the image.
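
As a rough illustration of one enhancement step, the sketch below binarizes a grayscale image with a fixed threshold. It is a toy example, not IronOCR's implementation: the `PreprocessingSketch` class, the `Binarize` method, and the default threshold of 128 are all assumptions made for this example.

```csharp
using System;

public static class PreprocessingSketch
{
    // Converts a grayscale image (0 = black, 255 = white) into a mask in which
    // true marks a dark "ink" pixel. The default threshold is a hypothetical
    // value; real scans usually need a threshold chosen per image.
    public static bool[,] Binarize(byte[,] gray, byte threshold = 128)
    {
        int rows = gray.GetLength(0);
        int cols = gray.GetLength(1);
        var ink = new bool[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                // Pixels strictly darker than the threshold count as ink.
                ink[r, c] = gray[r, c] < threshold;
            }
        }

        return ink;
    }
}
```

A fixed threshold is the simplest possible choice; unevenly lit photographs of ID cards generally benefit from adaptive approaches.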

Text Detection

OCR algorithms need to locate the specific data areas within the image where text is present. This step involves identifying text regions or bounding boxes.

Character Segmentation

Once text regions or data fields are identified, the image is further analyzed to segment individual characters. This step matters most for scripts written as discrete characters, such as printed English or Chinese.
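
One simple way to picture segmentation is a vertical projection: scan the columns of a binarized text line and split wherever a column contains no ink. The sketch below assumes a `bool[,]` mask in which `true` marks ink; the `SegmentationSketch` class and `SplitColumns` method are hypothetical names for this illustration and do not reflect how IronOCR or Tesseract segment text internally.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentationSketch
{
    // Returns inclusive (Start, End) column ranges, one per run of
    // consecutive columns that contain at least one ink pixel.
    public static List<(int Start, int End)> SplitColumns(bool[,] ink)
    {
        int rows = ink.GetLength(0);
        int cols = ink.GetLength(1);
        var ranges = new List<(int Start, int End)>();
        int start = -1; // -1 means we are currently between characters

        for (int c = 0; c < cols; c++)
        {
            bool hasInk = false;
            for (int r = 0; r < rows && !hasInk; r++)
            {
                hasInk = ink[r, c];
            }

            if (hasInk && start < 0)
            {
                start = c; // a new run of ink columns begins
            }
            else if (!hasInk && start >= 0)
            {
                ranges.Add((start, c - 1)); // an empty column closes the run
                start = -1;
            }
        }

        if (start >= 0)
        {
            ranges.Add((start, cols - 1)); // run reaches the right edge
        }

        return ranges;
    }
}
```

This approach breaks down when characters touch or overlap, which is one reason production OCR engines rely on more robust methods.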

Feature Extraction

OCR algorithms analyze the segmented characters to extract features that help in differentiating between different characters. These features might include stroke patterns, shape, and spatial relationships between elements.

Character Recognition

Based on the extracted features, OCR algorithms classify each segmented character and assign it a corresponding textual representation. Machine learning models, such as neural networks, are often employed in this step.

Post-Processing

The recognized characters may undergo post-processing to correct errors or enhance accuracy. This step may involve dictionary-based corrections, context analysis, or language modeling.
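
A small dictionary-based correction can be sketched as follows. It assumes the caller already knows a field should hold only digits and separators (for example a date or document number) and replaces letters that OCR commonly confuses with digits. The `PostProcessingSketch` class, the `FixDigitField` method, and the lookalike table are illustrative assumptions, not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PostProcessingSketch
{
    // Letters that are often misread in place of digits (assumed list).
    private static readonly Dictionary<char, char> DigitLookalikes =
        new Dictionary<char, char>
        {
            ['O'] = '0', ['o'] = '0',
            ['I'] = '1', ['l'] = '1',
            ['S'] = '5', ['B'] = '8'
        };

    // Replaces lookalike letters with digits and leaves every other
    // character unchanged. Apply only to fields expected to be numeric.
    public static string FixDigitField(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        foreach (char ch in raw)
        {
            sb.Append(DigitLookalikes.TryGetValue(ch, out char digit) ? digit : ch);
        }
        return sb.ToString();
    }
}
```

For instance, `PostProcessingSketch.FixDigitField("2O24-O1-l5")` returns `"2024-01-15"`. Applying the same mapping to a name field would corrupt it, so the correction has to be scoped per field.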

The IronOCR library takes care of all of the above steps and lets us perform OCR in just a few lines of code, sparing us these tedious, time-consuming tasks.

using System;
using IronOcr;

class Program
{
    public static void Main()
    {
        // Configure IronTesseract with language and other settings
        var ocrTesseract = new IronTesseract()
        {
            Language = OcrLanguage.EnglishBest,
            Configuration = new TesseractConfiguration()
            {
                ReadBarCodes = false, // Disable reading of barcodes
                BlackListCharacters = "`ë|^", // Blacklist specific characters
                PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd, // Set page segmentation mode
            }
        };

        // Define the OCR input image
        using var ocrInput = new OcrInput("id1.png");

        // Perform OCR on the input image
        var ocrResult = ocrTesseract.Read(ocrInput);

        // Display the extracted text
        Console.WriteLine(ocrResult.Text);
    }
}

The equivalent VB.NET program:

Imports IronOcr

Friend Class Program
	Public Shared Sub Main()
		' Configure IronTesseract with language and other settings
		Dim ocrTesseract = New IronTesseract() With {
			.Language = OcrLanguage.EnglishBest,
			.Configuration = New TesseractConfiguration() With {
				.ReadBarCodes = False,
				.BlackListCharacters = "`ë|^",
				.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
			}
		}

		' Define the OCR input image
		Dim ocrInput As New OcrInput("id1.png")

		' Perform OCR on the input image
		Dim ocrResult = ocrTesseract.Read(ocrInput)

		' Display the extracted text
		Console.WriteLine(ocrResult.Text)
	End Sub
End Class

Input Image

Below is a sample image used as input for the program.

How to Read Identity Documents Using OCR in C#: Figure 6

Output

How to Read Identity Documents Using OCR in C#: Figure 7

Code Explanation

The code above uses the IronOCR library to read all the text fields from the ID document. We create an instance of the IronTesseract class and configure it to use the English language, disable barcode reading, blacklist a few characters, and use the AutoOsd page segmentation mode. We then declare the OCR input with the OcrInput class and read the text from the image. The extracted text is printed to the console.
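
Once the raw text is available, individual fields such as the date of birth or expiration date still have to be located in it. The sketch below shows one possible approach using a regular expression on the text returned by OCR. The labels (`DOB`, `EXP`), the `dd/MM/yyyy` date format, and the `IdFieldParserSketch` class are assumptions for illustration; real documents vary by country and issuer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class IdFieldParserSketch
{
    // Looks for "<label>", then any non-digit characters on the same line,
    // then a date written as dd/MM/yyyy. Returns null when the label is
    // missing or the date is not a valid calendar date.
    public static DateTime? FindDate(string ocrText, string label)
    {
        string pattern = Regex.Escape(label) + @"[^\d\r\n]*(\d{2}/\d{2}/\d{4})";
        Match match = Regex.Match(ocrText, pattern, RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return null;
        }

        bool parsed = DateTime.TryParseExact(
            match.Groups[1].Value,
            "dd/MM/yyyy",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out DateTime date);

        return parsed ? date : (DateTime?)null;
    }
}
```

In the program above, this could be called as `IdFieldParserSketch.FindDate(ocrResult.Text, "DOB")`. Returning null rather than throwing lets the caller decide how to handle documents where a field was not recognized.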

Step 4: Read Identity documents from PDFs.

We can also read identity documents from PDF files. For this, we can use the IronPDF library from Iron Software. First, install the library with the following command:

Install-Package IronPdf

using System;
using IronOcr;
using IronPdf;

class Program
{
    public static void Main()
    {
        // Load the PDF document
        var pdfReader = new PdfDocument("id1.pdf");

        // Initialize IronTesseract for OCR
        var ocrTesseract = new IronTesseract();

        // Create OCR input from the PDF stream
        using var ocrInput = new OcrInput();
        ocrInput.AddPdf(pdfReader.Stream);

        // Perform OCR on the PDF input
        var ocrResult = ocrTesseract.Read(ocrInput);

        // Display the extracted text
        Console.WriteLine(ocrResult.Text);
    }
}

The equivalent VB.NET program:

Imports IronOcr
Imports IronPdf

Friend Class Program
	Public Shared Sub Main()
		' Load the PDF document
		Dim pdfReader = New PdfDocument("id1.pdf")

		' Initialize IronTesseract for OCR
		Dim ocrTesseract = New IronTesseract()

		' Create OCR input from the PDF stream
		Dim ocrInput As New OcrInput()
		ocrInput.AddPdf(pdfReader.Stream)

		' Perform OCR on the PDF input
		Dim ocrResult = ocrTesseract.Read(ocrInput)

		' Display the extracted text
		Console.WriteLine(ocrResult.Text)
	End Sub
End Class

The code above uses IronPDF to load the id1.pdf document. The PDF is then passed as a stream to OcrInput, which IronTesseract reads to extract the text.

Output

How to Read Identity Documents Using OCR in C#: Figure 9

Licensing (Free Trial Available)

To use IronOCR, you'll need a license key. This key needs to be placed in appsettings.json.

{
    "IRONOCR-LICENSE-KEY": "your license key"
}

Provide your email address to get a trial license.

How to Read Identity Documents Using OCR in C#: Figure 10

Use cases

1. Identity Verification in Financial Services:

  • Use Case: Banks and financial institutions utilize OCR to read identity documents such as passports, driver's licenses, and ID cards during the customer onboarding and KYC process.
  • Benefits: Ensures accurate and efficient identity verification for account creation, loan applications, and other financial transactions.

2. Border Control and Immigration:

  • Use Case: Immigration authorities employ OCR technology to read and authenticate information from passports and visas at border checkpoints.
  • Benefits: Streamlines the immigration process, enhances security, and reduces manual data entry errors.

3. Access Control and Security:

  • Use Case: OCR is used in access control systems to read information from ID cards or employee badges for secure entry into buildings or restricted areas.
  • Benefits: Enhances security by ensuring only authorized individuals gain access and provides a digital record of entries.

4. E-Government Services:

  • Use Case: Government agencies use OCR to process and verify ID documents submitted online for services such as driver's license renewals, tax filings, and permit applications.
  • Benefits: Improves efficiency, reduces paperwork, and enhances the overall citizen experience.

5. Healthcare Identity Verification:

  • Use Case: Healthcare providers use OCR to read information from patient IDs, insurance cards, and other identity documents for accurate patient record-keeping.
  • Benefits: Facilitates precise patient identification, ensures proper medical record management, and supports billing processes.

6. Automated Hotel Check-In:

  • Use Case: Hotels implement OCR for automated check-in processes by scanning guests' identity documents, streamlining the registration process.
  • Benefits: Enhances guest experience, reduces check-in time, and minimizes errors in capturing guest information.

7. Smart Cities and Public Services:

  • Use Case: OCR is applied in smart city initiatives to read identity documents for services like public transportation access, library memberships, and city event registrations.
  • Benefits: Improves the efficiency of public services, facilitates seamless access, and enhances urban living experiences.

8. Education Administration:

  • Use Case: Educational institutions use OCR to process and verify ID documents during student admissions, examinations, and issuance of academic credentials.
  • Benefits: Ensures accurate student records, reduces administrative burden, and enhances the integrity of academic processes.

Conclusion

Integrating OCR technology into your C# application using IronOCR allows you to efficiently extract information from ID documents. This comprehensive guide provides the necessary steps to set up your project and use IronOCR to read and process identity document images. Experiment with the code examples to tailor the extraction process to your specific requirements, providing a seamless and automated solution for handling identity document data.

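Frequently Asked Questions

How can I extract text from identity documents using C#?

Using IronOCR, a dedicated OCR library from Iron Software, you can extract text from identity documents such as passports, ID cards, and driver's licenses. Install IronOCR through the NuGet Package Manager in Visual Studio and use its methods to read text from images and PDFs.

What are the benefits of using OCR for identity documents?

OCR technology such as Iron Software's IronOCR automates text extraction from identity documents, reducing human error and making data retrieval more efficient. It supports multiple languages and document formats, which makes it well suited to applications in finance, healthcare, and border control.

What steps are involved in setting up OCR in a C# project?

To set up OCR in a C# project, create a new project in Visual Studio, install IronOCR through the NuGet Package Manager, and use its API to read text from documents. IronOCR provides comprehensive documentation and examples to help you integrate OCR functionality.

How can I enhance image quality for better OCR results?

IronOCR includes filters such as Deskew, Denoise, Binarize, Enhance Resolution, and Dilate to improve image quality. These filters increase the accuracy of text recognition on low-quality images and make data extraction more reliable.

Can OCR technology read barcodes in identity documents?

Yes, IronOCR supports barcode recognition in identity documents. It can read more than 20 barcode types, including QR codes, which is useful in applications that need to extract both text and barcode data.

What are some specific use cases of OCR in identity verification?

OCR is widely used in identity verification, for example in automated check-in, access control, and e-government services. IronOCR provides the tools needed to extract and verify text from identity documents, enhancing security and streamlining processes.

How do I handle multilingual text extraction in OCR?

IronOCR offers multi-language support, allowing you to extract text from documents in a variety of languages. This is particularly useful for international applications that need to process documents in different languages efficiently.

Is a trial version of the OCR library available?

Iron Software's IronOCR offers a free trial. You can obtain a trial license key by providing your email address, which lets you explore the library's features before purchasing.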

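Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developers who write most of the IronPDF code. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or revisiting The Last of Us.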