How to Read Specific Documents Using OCR in C#

How to Read Specialized Documents


Accurately reading specific document types, such as standard text documents, license plates, passports, and photos, with a single general-purpose method is very hard. The difficulty comes from the diverse formats, layouts, and content of each document type, as well as from variations in image quality, distortion, and specialized content. Achieving contextual understanding while balancing performance and efficiency also becomes harder as the range of document types grows.

IronOCR introduces specific methods for performing OCR on particular documents such as standard text documents, license plates, passports, and photos to achieve optimal accuracy and performance.

Quickstart: Read a Passport

Use IronOCR's ReadPassport extension method to extract key passport details with minimal setup. With a few lines of code, and assuming you have installed both IronOCR and the IronOcr.Extensions.AdvancedScan package, you get structured result data such as names, passport number, and country.

Get started with IronOCR via NuGet:

  1. Install IronOCR and the AdvancedScan extension with the NuGet Package Manager

    PM > Install-Package IronOcr
    PM > Install-Package IronOcr.Extensions.AdvancedScan

  2. Copy and run this code snippet.

    using var input = new OcrInput();
    input.LoadImage("passport.jpg");
    var result = new IronTesseract().ReadPassport(input);


About The Package

The methods ReadLicensePlate, ReadPassport, ReadPhoto, and ReadScreenShot are extension methods to the base IronOCR package and require the IronOcr.Extensions.AdvancedScan package to be installed.

These methods respect OCR engine configuration such as character blacklists and whitelists. Multiple languages, including Chinese, Japanese, Korean, and LatinAlphabet, are supported by all methods except ReadPassport. Note that each language requires its own additional IronOcr.Languages package.
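As a minimal sketch of combining engine settings with the specialized methods, the snippet below restricts a license plate read to uppercase letters and digits and sets a non-Latin language for a photo read. It assumes the IronOcr.Extensions.AdvancedScan and IronOcr.Languages.Japanese packages are installed. The `Configuration.WhiteListCharacters` and `Language` members and `OcrLanguage.Japanese` are not shown elsewhere in this guide, so verify them against your IronOCR version. The character set and the image file name are illustrative assumptions.

```csharp
using IronOcr;
using System;

// Engine for license plates. Assumption: plates contain only
// uppercase Latin letters and digits, so everything else is excluded.
var plateOcr = new IronTesseract();
plateOcr.Configuration.WhiteListCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

using var plateInput = new OcrInput();
plateInput.LoadImage("LicensePlate.jpeg");
OcrLicensePlateResult plateResult = plateOcr.ReadLicensePlate(plateInput);
Console.WriteLine(plateResult.Text);

// Separate engine for a photo containing Japanese text.
// Requires the IronOcr.Languages.Japanese package to be installed.
var photoOcr = new IronTesseract();
photoOcr.Language = OcrLanguage.Japanese;

using var photoInput = new OcrInput();
photoInput.LoadImage("japanese-sign.jpg"); // hypothetical file name
OcrPhotoResult photoResult = photoOcr.ReadPhoto(photoInput);
Console.WriteLine(photoResult.Text);
```

Using two engine instances keeps the whitelist from accidentally filtering the Japanese read.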

Using advanced scan on .NET Framework requires the project to run on the x64 architecture. To achieve this, open the project's build configuration and uncheck the "Prefer 32-bit" option. Learn more in the troubleshooting guide "Advanced Scan on .NET Framework."
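To catch a misconfigured build early, an application can check the process bitness at startup before calling any advanced scan method. This sketch uses only the standard `Environment.Is64BitProcess` property; the helper name `EnsureX64ForAdvancedScan` and the error message are hypothetical.

```csharp
using System;

// Hypothetical guard: throw a clear error instead of failing later inside
// advanced scan when the process is running as 32-bit.
static void EnsureX64ForAdvancedScan()
{
    if (!Environment.Is64BitProcess)
    {
        throw new PlatformNotSupportedException(
            "Advanced scan requires a 64-bit process. Uncheck \"Prefer 32-bit\" in the project's build settings.");
    }
}

EnsureX64ForAdvancedScan();
Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
```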

Read Document Example

The ReadDocument method is a robust document-reading method that specializes in scanned documents or photos of paper documents containing a lot of text. The PageSegmentationMode configuration is important when reading text documents with different layouts.

For example, the SingleBlock and SparseText modes can retrieve much of the information in a table layout. SingleBlock assumes the text forms a single uniform block, whereas SparseText assumes the text is scattered throughout the document and tries to find as much of it as possible in no particular order.
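Because the best mode depends on the layout, one practical approach is to run the same document through both modes and compare the output. The sketch below reuses the `Five.pdf` sample from this guide. It assumes `OcrResult.Confidence` is available on the ReadDocument result and treats it only as a rough comparison signal, not as proof that one mode is correct.

```csharp
using IronOcr;
using System;
using System.Collections.Generic;

var modes = new[] { TesseractPageSegmentationMode.SingleBlock, TesseractPageSegmentationMode.SparseText };
var textByMode = new Dictionary<TesseractPageSegmentationMode, string>();

foreach (var mode in modes)
{
    // Fresh engine per mode so each read uses exactly one layout assumption
    var ocr = new IronTesseract();
    ocr.Configuration.PageSegmentationMode = mode;

    using var input = new OcrInput();
    input.LoadPdf("Five.pdf");

    OcrResult result = ocr.ReadDocument(input);
    textByMode[mode] = result.Text;

    // Character count and confidence give a quick, rough comparison
    Console.WriteLine($"{mode}: {result.Text.Length} chars, confidence {result.Confidence:F1}");
}
```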

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

// Configure OCR engine
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock;

using var input = new OcrInput();

input.LoadPdf("Five.pdf");

// Perform OCR
OcrResult result = ocr.ReadDocument(input);

Console.WriteLine(result.Text);

VB.NET:
Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        ' Configure OCR engine
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock

        Using input As New OcrInput()
            input.LoadPdf("Five.pdf")

            ' Perform OCR
            Dim result As OcrResult = ocr.ReadDocument(input)

            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

The methods below are extension methods to the base IronOCR package and require the IronOcr.Extensions.AdvancedScan package to be installed.

Read License Plate Example

The ReadLicensePlate method is optimized for reading license plates from photos. Besides the recognized text, it returns the Licenseplate property, a Rectangle that gives the location of the license plate in the input image.
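Because one engine instance can be reused for several reads, a folder of plate photos can be processed in a loop. This is a sketch only. The `plates` folder and the `*.jpg` pattern are hypothetical, and the code uses only the members shown in this guide (`ReadLicensePlate`, `Text`, `Licenseplate`) plus the standard `X`, `Y`, `Width`, and `Height` of the returned rectangle.

```csharp
using IronOcr;
using System;
using System.IO;

var ocr = new IronTesseract();
int processed = 0;

// Hypothetical input folder containing one plate photo per file
foreach (string path in Directory.GetFiles("plates", "*.jpg"))
{
    using var input = new OcrInput();
    input.LoadImage(path);

    OcrLicensePlateResult result = ocr.ReadLicensePlate(input);
    var box = result.Licenseplate; // plate location in the image

    Console.WriteLine($"{Path.GetFileName(path)}: {result.Text} at ({box.X}, {box.Y}), {box.Width}x{box.Height}");
    processed++;
}
```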

using IronOcr;
using IronSoftware.Drawing;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputLicensePlate = new OcrInput();

inputLicensePlate.LoadImage("LicensePlate.jpeg");

// Perform OCR
OcrLicensePlateResult result = ocr.ReadLicensePlate(inputLicensePlate);

// Retrieve license plate coordinates
Rectangle rectangle = result.Licenseplate;

// Retrieve license plate value
string output = result.Text;

VB.NET:
Imports IronOcr
Imports IronSoftware.Drawing
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        Using inputLicensePlate As New OcrInput()
            inputLicensePlate.LoadImage("LicensePlate.jpeg")

            ' Perform OCR
            Dim result As OcrLicensePlateResult = ocr.ReadLicensePlate(inputLicensePlate)

            ' Retrieve license plate coordinates
            Dim plateArea As Rectangle = result.Licenseplate

            ' Retrieve license plate value
            Dim output As String = result.Text
        End Using
    End Sub
End Module

Read Passport Example

The ReadPassport method is optimized for reading passport photos and extracts passport information by scanning the contents of the machine-readable zone (MRZ). An MRZ is a specially defined zone on official documents such as passports, ID cards, and visas. It typically contains essential personal information, such as the holder's name, date of birth, nationality, and document number. Currently, this method supports only the English language.
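To illustrate how MRZ fields guard against misreads, the sketch below computes the check digit defined in ICAO Doc 9303. Each character is mapped to a value (digits as themselves, A to Z as 10 to 35, the `<` filler as 0), multiplied by the repeating weights 7, 3, 1, summed, and reduced modulo 10. This is plain C# and not part of IronOCR. This guide does not show OcrPassportResult exposing the raw check digits, so treat this as background on the MRZ format. The sample fields come from the ICAO specimen passport.

```csharp
using System;

// ICAO 9303 check digit for one MRZ field (hypothetical helper name)
static int MrzCheckDigit(string field)
{
    int[] weights = { 7, 3, 1 };
    int sum = 0;
    for (int i = 0; i < field.Length; i++)
    {
        char c = field[i];
        int value = c switch
        {
            >= '0' and <= '9' => c - '0',
            >= 'A' and <= 'Z' => c - 'A' + 10,
            '<' => 0,
            _ => throw new ArgumentException($"Invalid MRZ character '{c}'", nameof(field))
        };
        sum += value * weights[i % 3];
    }
    return sum % 10;
}

// Fields from the ICAO 9303 specimen MRZ
Console.WriteLine(MrzCheckDigit("L898902C3")); // document number -> 6
Console.WriteLine(MrzCheckDigit("740812"));    // date of birth   -> 2
Console.WriteLine(MrzCheckDigit("120415"));    // date of expiry  -> 9
```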

using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPassport = new OcrInput();

inputPassport.LoadImage("Passport.jpg");

// Perform OCR
OcrPassportResult result = ocr.ReadPassport(inputPassport);

// Output passport information
Console.WriteLine(result.PassportInfo.GivenNames);
Console.WriteLine(result.PassportInfo.Country);
Console.WriteLine(result.PassportInfo.PassportNumber);
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.DateOfBirth);
Console.WriteLine(result.PassportInfo.DateOfExpiry);

VB.NET:
Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        Using inputPassport As New OcrInput()
            inputPassport.LoadImage("Passport.jpg")

            ' Perform OCR
            Dim result As OcrPassportResult = ocr.ReadPassport(inputPassport)

            ' Output passport information
            Console.WriteLine(result.PassportInfo.GivenNames)
            Console.WriteLine(result.PassportInfo.Country)
            Console.WriteLine(result.PassportInfo.PassportNumber)
            Console.WriteLine(result.PassportInfo.Surname)
            Console.WriteLine(result.PassportInfo.DateOfBirth)
            Console.WriteLine(result.PassportInfo.DateOfExpiry)
        End Using
    End Sub
End Module


Make sure the document contains only the passport image. Any header or footer text can confuse the method and produce unexpected output.
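If the passport cannot be cropped beforehand, one option is to restrict OCR to the passport area when loading the image. This sketch assumes an `OcrInput.LoadImage` overload that accepts a content-area `IronSoftware.Drawing.Rectangle`, along with that type's `(x, y, width, height)` constructor. Neither appears elsewhere in this guide, so check both against your IronOCR version. The file name and coordinates are placeholders.

```csharp
using IronOcr;
using IronSoftware.Drawing;
using System;

var ocr = new IronTesseract();

// Placeholder pixel area (x, y, width, height) of the passport within a larger scan
var passportArea = new Rectangle(0, 200, 1200, 850);

using var input = new OcrInput();

// Assumed overload: only the given area is loaded, so header and
// footer text outside it never reaches ReadPassport
input.LoadImage("ScanWithHeader.jpg", passportArea); // hypothetical file name

OcrPassportResult result = ocr.ReadPassport(input);
Console.WriteLine(result.PassportInfo.PassportNumber);
```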

Read Photo Example

The ReadPhoto method is optimized for reading images that contain hard-to-read text. This method returns the TextRegions property, which contains useful information about the detected text, such as Region, TextInRegion, and FrameNumber.
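The example that follows reads only the first region. To walk every detected region, a short loop is enough. This sketch uses only `TextRegions`, `Region`, and `TextInRegion` as named above, and `photo.jpg` is a hypothetical file name.

```csharp
using IronOcr;
using System;

var ocr = new IronTesseract();

using var inputPhoto = new OcrInput();
inputPhoto.LoadImage("photo.jpg"); // hypothetical file name

OcrPhotoResult result = ocr.ReadPhoto(inputPhoto);
int regionCount = 0;

// Print each region's bounding box followed by its recognized text
foreach (var textRegion in result.TextRegions)
{
    var r = textRegion.Region;
    Console.WriteLine($"[{r.X}, {r.Y}, {r.Width}x{r.Height}] {textRegion.TextInRegion}");
    regionCount++;
}
```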

using IronOcr;
using IronSoftware.Drawing;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPhoto = new OcrInput();
inputPhoto.LoadImageFrame("photo.tif", 2);

// Perform OCR
OcrPhotoResult result = ocr.ReadPhoto(inputPhoto);

// The index refers to the region's order on the page
int number = result.TextRegions[0].PageNumber;
string textInRegion = result.TextRegions[0].TextInRegion;
Rectangle region = result.TextRegions[0].Region;

VB.NET:
Imports IronOcr
Imports IronSoftware.Drawing

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        Using inputPhoto As New OcrInput()
            inputPhoto.LoadImageFrame("photo.tif", 2)

            ' Perform OCR
            Dim result As OcrPhotoResult = ocr.ReadPhoto(inputPhoto)

            ' The index refers to the region's order on the page
            Dim number As Integer = result.TextRegions(0).PageNumber
            Dim textInRegion As String = result.TextRegions(0).TextInRegion
            Dim region As Rectangle = result.TextRegions(0).Region
        End Using
    End Sub
End Module

Read Screenshot Example

The ReadScreenShot method is optimized for reading screenshots that contain hard-to-read text. Similar to the ReadPhoto method, it also returns the TextRegions property.
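The TextRegions collection can also be post-processed with standard LINQ. The sketch below orders regions top to bottom and then left to right to approximate reading order. It relies only on the members used in this guide. The ordering rule is an illustrative heuristic and may not suit multi-column layouts.

```csharp
using IronOcr;
using System;
using System.Linq;

var ocr = new IronTesseract();

using var inputScreenshot = new OcrInput();
inputScreenshot.LoadImage("screenshot.png");

OcrPhotoResult result = ocr.ReadScreenShot(inputScreenshot);

// Approximate reading order from each region's position on screen
var ordered = result.TextRegions
    .OrderBy(region => region.Region.Y)
    .ThenBy(region => region.Region.X)
    .Select(region => region.TextInRegion);

Console.WriteLine(string.Join(Environment.NewLine, ordered));
```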

using IronOcr;
using System;
using System.Linq;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputScreenshot = new OcrInput();
inputScreenshot.LoadImage("screenshot.png");

// Perform OCR
OcrPhotoResult result = ocr.ReadScreenShot(inputScreenshot);

// Output screenshot information
Console.WriteLine(result.Text);
Console.WriteLine(result.TextRegions.First().Region.X);
Console.WriteLine(result.TextRegions.Last().Region.Width);
Console.WriteLine(result.Confidence);

VB.NET:
Imports IronOcr
Imports System
Imports System.Linq

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr As New IronTesseract()

        Using inputScreenshot As New OcrInput()
            inputScreenshot.LoadImage("screenshot.png")

            ' Perform OCR
            Dim result As OcrPhotoResult = ocr.ReadScreenShot(inputScreenshot)

            ' Output screenshot information
            Console.WriteLine(result.Text)
            Console.WriteLine(result.TextRegions.First().Region.X)
            Console.WriteLine(result.TextRegions.Last().Region.Width)
            Console.WriteLine(result.Confidence)
        End Using
    End Sub
End Module

Frequently Asked Questions

How can I read license plates using OCR in C#?

You can use the ReadLicensePlate method provided by IronOCR to accurately read license plates from photos. This method returns the license plate text along with its location details.

What is the best way to extract information from passport photos?

IronOCR's ReadPassport method is designed to scan the machine-readable zone (MRZ) in passport photos, extracting essential information such as the name, date of birth, and document number.

Can IronOCR read text from photos that contain hard-to-read text?

Yes. The ReadPhoto method in IronOCR is optimized for reading images with hard-to-read text, providing detailed data about the detected text and its regions.

Is it possible to use IronOCR to read text from screenshots?

Absolutely. IronOCR's ReadScreenShot method is specifically optimized for processing text in screenshots and provides detailed information about text regions.

How can I improve OCR accuracy for documents with complex layouts?

To improve OCR accuracy for complex document layouts, configure the PageSegmentationMode in IronOCR. Options such as SingleBlock and SparseText are particularly useful for extracting information from table layouts.

What should I do if IronOCR's advanced scan features do not work in my .NET Framework project?

Make sure your project is configured to run on the x64 architecture by unchecking the 'Prefer 32-bit' option in your project settings. This addresses problems with IronOCR's advanced scan features on .NET Framework.

Are there language support limitations in IronOCR?

IronOCR supports multiple languages, including Chinese, Japanese, Korean, and the Latin alphabet. However, the ReadPassport method currently supports only English-language documents.

What do I need to use the advanced scan features in IronOCR?

To use the advanced scan features in IronOCR, you need the IronOcr.Extensions.AdvancedScan package, which is available only on Windows.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor's degree in Computer Science (Carleton University) and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about creating intuitive and aesthetically pleasing user interfaces, he enjoys working with modern frameworks and creating well ...
