How to Read Specific Documents
Accurately reading documents as varied as standard text documents, license plates, passports, and photos with a single general-purpose method is very hard. The difficulty stems from the diverse formats, layouts, and content of each document type, as well as variations in image quality, distortion, and specialized content. Contextual understanding and the balance between performance and efficiency also become harder to achieve as the range of document types grows.
IronOCR introduces specific methods for performing OCR on particular documents such as standard text documents, license plates, passports, and photos to achieve optimal accuracy and performance.
How to Read Specific Documents
- Download a C# library to read license plates, passports, and photos
- Prepare the image and PDF document for OCR
- Use the ReadLicensePlate method to read a license plate
- Use the ReadPassport method to retrieve information from a passport
- Leverage the ReadPhoto and ReadScreenShot methods to read images that contain hard-to-read text
About The Package
The methods ReadLicensePlate, ReadPassport, ReadPhoto, and ReadScreenShot are extension methods to the base IronOCR package and require the IronOcr.Extensions.AdvancedScan package to be installed. Currently, this extension is only available on Windows.
The methods work with OCR engine configurations such as blacklist and whitelist. Multiple languages, including Chinese, Japanese, Korean, and LatinAlphabet, are supported in all methods except for the ReadPassport method. Please note that each language requires an additional language package, IronOcr.Languages.
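As a minimal sketch of how an engine configuration combines with one of these methods, the snippet below restricts recognition to uppercase letters and digits before reading a license plate. It assumes the WhiteListCharacters and BlackListCharacters properties on ocr.Configuration and the LoadImage method on OcrInput; the file name and both character sets are placeholders chosen for illustration.

```csharp
using IronOcr;
using System;

var ocr = new IronTesseract();

// Assumption: only whitelisted characters are considered during recognition.
ocr.Configuration.WhiteListCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Assumption: blacklisted characters are never returned in the result.
ocr.Configuration.BlackListCharacters = "|_~`";

using var input = new OcrInput();
input.LoadImage("LicensePlate.jpeg"); // placeholder file name

// The configuration above applies to the advanced scan call as well.
var result = ocr.ReadLicensePlate(input);
Console.WriteLine(result.Text);
```

A whitelist this strict drops separators such as hyphens, so add any characters your plate format needs.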
Using advanced scan on .NET Framework requires the project to run on x64 architecture. Navigate to the project configuration and uncheck the "Prefer 32-bit" option to achieve this. Learn more in the following troubleshooting guide: "Advanced Scan on .NET Framework."
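The architecture requirement can also be checked at startup. The following sketch uses only the standard Environment.Is64BitProcess property; the class name and messages are illustrative, and the check does not replace the project setting itself.

```csharp
using System;

public static class ArchitectureCheck
{
    public static int Main()
    {
        // Is64BitProcess is false when the process runs as 32-bit, which is
        // what "Prefer 32-bit" causes for an AnyCPU build on a 64-bit machine.
        if (!Environment.Is64BitProcess)
        {
            Console.Error.WriteLine(
                "32-bit process detected. Uncheck \"Prefer 32-bit\" or target x64 " +
                "before calling the advanced scan methods.");
            return 1;
        }

        Console.WriteLine("64-bit process: the architecture requirement is met.");
        return 0;
    }
}
```

Failing fast with a clear message is usually easier to diagnose than a native library load error later in the run.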
Read Document Example
The ReadDocument method is a robust document reading method that specializes in scanned documents or photos of paper documents containing a lot of text. The PageSegmentationMode configuration is very important in reading text documents with different layouts.
For example, the SingleBlock and SparseText modes can retrieve much of the information from a table layout. SingleBlock assumes that the text forms a single block, whereas SparseText assumes that the text is scattered throughout the document.
:path=/static-assets/ocr/content-code-examples/how-to/read-specific-document-document.cs
// Import necessary namespaces for IronOcr and System
using IronOcr;
using System;
// Initialize an IronTesseract OCR engine instance
var ocr = new IronTesseract();
// Configure the OCR engine for processing the input as a single block of text.
// This is helpful in scenarios where the text is not scattered across the page.
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock;
// Create an instance of OcrInput within a 'using' statement.
// This ensures proper resource management by disposing of the object when it goes out of scope.
using var input = new OcrInput();
// Load a PDF document into the OcrInput object.
// This prepares the document for OCR processing.
input.LoadPdf("Five.pdf");
// Perform OCR with ReadDocument, the method specialized for text-heavy documents,
// then store the recognition result in 'result'.
OcrResult result = ocr.ReadDocument(input);
// Output the recognized text to the console for review.
Console.WriteLine(result.Text);
' Import necessary namespaces for IronOcr and System
Imports IronOcr
Imports System
' Initialize an IronTesseract OCR engine instance
Dim ocr = New IronTesseract()
' Configure the OCR engine for processing the input as a single block of text.
' This is helpful in scenarios where the text is not scattered across the page.
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock
' Create an instance of OcrInput within a 'Using' block.
' This ensures proper resource management by disposing of the object when the block ends.
Using input = New OcrInput()
    ' Load a PDF document into the OcrInput object.
    ' This prepares the document for OCR processing.
    input.LoadPdf("Five.pdf")
    ' Perform OCR with ReadDocument, the method specialized for text-heavy documents,
    ' then store the recognition result in 'result'.
    Dim result As OcrResult = ocr.ReadDocument(input)
    ' Output the recognized text to the console for review.
    Console.WriteLine(result.Text)
End Using
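To see how much the page segmentation mode matters for a particular document, it can help to run the same input through both modes and compare the output. The sketch below does that for SingleBlock and SparseText. It assumes that the same OcrInput can be read more than once and that ReadDocument is called as described above; "Five.pdf" is the sample file from the example.

```csharp
using IronOcr;
using System;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadPdf("Five.pdf");

// The two modes discussed above: one text block versus scattered text.
var modes = new[]
{
    TesseractPageSegmentationMode.SingleBlock,
    TesseractPageSegmentationMode.SparseText
};

foreach (var mode in modes)
{
    ocr.Configuration.PageSegmentationMode = mode;

    // Assumption: reading the same input again with a new configuration is allowed.
    var result = ocr.ReadDocument(input);

    Console.WriteLine($"--- {mode}: {result.Text.Length} characters recognized ---");
    Console.WriteLine(result.Text);
}
```

Character count is only a rough signal; inspect the text itself to judge which mode preserves the table content better.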
Read License Plate Example
The ReadLicensePlate method is optimized for reading license plates from photos. In addition to the recognized text, its result exposes the Licenseplate property, which contains the location of the license plate in the provided document.
:path=/static-assets/ocr/content-code-examples/how-to/read-specific-document-license-plate.cs
using IronOcr;
using System;
// Instantiate the OCR engine
var ocr = new IronTesseract();
// Create an instance for OCR input; 'using' disposes it when it goes out of scope
using var inputLicensePlate = new OcrInput();
// Load the image of the license plate
inputLicensePlate.LoadImage("LicensePlate.jpeg");
// Perform OCR with the method optimized for license plates
var result = ocr.ReadLicensePlate(inputLicensePlate);
// Retrieve the detected text value from the license plate
string output = result.Text;
// Retrieve the location of the license plate in the input image
var rectangle = result.Licenseplate;
// Check whether any text was recognized
if (!string.IsNullOrWhiteSpace(output))
{
    // Output the text and coordinates of the detected license plate
    Console.WriteLine($"License Plate Text: {output}");
    Console.WriteLine($"License Plate Coordinates: {rectangle}");
}
else
{
    // Notify if no license plate is detected
    Console.WriteLine("No license plate detected.");
}
// Note: ensure the input image file "LicensePlate.jpeg" is located at the expected path.
Imports IronOcr
Imports System
' Instantiate the OCR engine
Dim ocr = New IronTesseract()
' Create an instance for OCR input; the Using block disposes it when the block ends
Using inputLicensePlate = New OcrInput()
    ' Load the image of the license plate
    inputLicensePlate.LoadImage("LicensePlate.jpeg")
    ' Perform OCR with the method optimized for license plates
    Dim result = ocr.ReadLicensePlate(inputLicensePlate)
    ' Retrieve the detected text value from the license plate
    Dim output As String = result.Text
    ' Retrieve the location of the license plate in the input image
    Dim rectangle = result.Licenseplate
    ' Check whether any text was recognized
    If Not String.IsNullOrWhiteSpace(output) Then
        ' Output the text and coordinates of the detected license plate
        Console.WriteLine($"License Plate Text: {output}")
        Console.WriteLine($"License Plate Coordinates: {rectangle}")
    Else
        ' Notify if no license plate is detected
        Console.WriteLine("No license plate detected.")
    End If
End Using
' Note: ensure the input image file "LicensePlate.jpeg" is located at the expected path.
Read Passport Example
The ReadPassport method is optimized for reading passport photos and extracts passport information by scanning the machine-readable zone (MRZ). An MRZ is a specially defined zone in official documents such as passports, ID cards, and visas. It typically contains essential personal information, such as the holder’s name, date of birth, nationality, and document number. Currently, this method only supports the English language.
:path=/static-assets/ocr/content-code-examples/how-to/read-specific-document-passport.cs
using IronOcr;
using System;
// This example uses the IronOCR library to extract text data from a passport image.
// Ensure you have references to IronOcr and IronOcr.Extensions.AdvancedScan to run this code.
// Instantiate the IronTesseract engine
var ocr = new IronTesseract();
// Create an instance of OcrInput to store the image for OCR processing
using var inputPassport = new OcrInput();
// Load the image of the passport into the OcrInput instance
inputPassport.LoadImage("Passport.jpg");
// Perform OCR with ReadPassport, which scans the machine-readable zone (MRZ)
var result = ocr.ReadPassport(inputPassport);
// Retrieve the text extracted from the passport
string passportText = result.Text;
// Check whether any text was extracted
if (!string.IsNullOrWhiteSpace(passportText))
{
    // Output the extracted passport information
    Console.WriteLine("Extracted Passport Information:");
    Console.WriteLine(passportText);
}
else
{
    // Inform the user that no data could be extracted from the image
    Console.WriteLine("No data could be extracted from the passport image.");
}
Imports IronOcr
Imports System
' This example uses the IronOCR library to extract text data from a passport image.
' Ensure you have references to IronOcr and IronOcr.Extensions.AdvancedScan to run this code.
' Instantiate the IronTesseract engine
Dim ocr = New IronTesseract()
' Create an instance of OcrInput to store the image for OCR processing
Using inputPassport = New OcrInput()
    ' Load the image of the passport into the OcrInput instance
    inputPassport.LoadImage("Passport.jpg")
    ' Perform OCR with ReadPassport, which scans the machine-readable zone (MRZ)
    Dim result = ocr.ReadPassport(inputPassport)
    ' Retrieve the text extracted from the passport
    Dim passportText As String = result.Text
    ' Check whether any text was extracted
    If Not String.IsNullOrWhiteSpace(passportText) Then
        ' Output the extracted passport information
        Console.WriteLine("Extracted Passport Information:")
        Console.WriteLine(passportText)
    Else
        ' Inform the user that no data could be extracted from the image
        Console.WriteLine("No data could be extracted from the passport image.")
    End If
End Using
Please make sure that the document only contains the passport image. Any header and footer text could confuse the method and result in an unexpected output.
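To make the MRZ contents more concrete, the following standalone sketch decodes part of the second MRZ line of a passport and verifies two check digits using the ICAO 9303 scheme (repeating weights 7, 3, 1, modulo 10). It does not use IronOCR and is not a description of how ReadPassport works internally; the class and method names are hypothetical, and the sample line is the specimen from the ICAO 9303 documentation.

```csharp
using System;

public static class MrzSketch
{
    // ICAO 9303 character values: digits keep their value,
    // A-Z map to 10-35, and the filler '<' counts as 0.
    private static int CharValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new ArgumentException($"Invalid MRZ character: {c}");
    }

    // Check digit: weighted sum with repeating weights 7, 3, 1, taken modulo 10.
    public static int CheckDigit(string field)
    {
        int[] weights = { 7, 3, 1 };
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
        {
            sum += CharValue(field[i]) * weights[i % 3];
        }
        return sum % 10;
    }

    public static void Main()
    {
        // Second MRZ line of the ICAO specimen passport (44 characters).
        string line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";

        string documentNumber = line2.Substring(0, 9);  // positions 1-9
        string nationality = line2.Substring(10, 3);    // positions 11-13
        string dateOfBirth = line2.Substring(13, 6);    // positions 14-19, YYMMDD

        bool numberValid = CheckDigit(documentNumber) == line2[9] - '0';
        bool birthValid = CheckDigit(dateOfBirth) == line2[19] - '0';

        Console.WriteLine($"Document number: {documentNumber} (check digit valid: {numberValid})");
        Console.WriteLine($"Nationality: {nationality}");
        Console.WriteLine($"Date of birth: {dateOfBirth} (check digit valid: {birthValid})");
    }
}
```

Both check digits validate for the specimen line. A failed check on real OCR output usually points to a misread character, which is one reason stray header or footer text around the passport image is worth cropping out.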
Read Photo Example
The ReadPhoto method is optimized for reading images that contain hard-to-read text. This method returns the TextRegions property, which contains useful information about the detected text, such as Region, TextInRegion, and FrameNumber.
:path=/static-assets/ocr/content-code-examples/how-to/read-specific-document-photo.cs
using IronOcr;
using System;
using System.Linq;
// Instantiate the Tesseract OCR engine
var ocr = new IronTesseract();
// Create a new OcrInput object for the image
using var inputPhoto = new OcrInput();
// Load a specific frame of the image file "photo.tif" into the OcrInput object
// The second parameter (2) indicates the frame number to be loaded if the image is multi-frame
inputPhoto.LoadImageFrame("photo.tif", 2);
// Perform OCR with ReadPhoto and store the result
var result = ocr.ReadPhoto(inputPhoto);
// Ensure there is at least one text region detected before proceeding
if (result.TextRegions.Any())
{
    // Retrieve the data from the first detected text region
    var firstRegion = result.TextRegions.First();
    int frameNumber = firstRegion.FrameNumber;      // Frame the region was found in
    string textInRegion = firstRegion.TextInRegion; // Text contained in the region
    var region = firstRegion.Region;                // Bounds of the region
    // Optionally, print or use the extracted data
    Console.WriteLine($"Detected text in frame {frameNumber}:\n{textInRegion}");
    Console.WriteLine($"Region located at: {region}");
}
else
{
    // Print a message if no text regions are detected
    Console.WriteLine("No text regions detected in the image.");
}
Imports Microsoft.VisualBasic
Imports IronOcr
Imports System
Imports System.Linq
' Instantiate the Tesseract OCR engine
Dim ocr = New IronTesseract()
' Create a new OcrInput object for the image; the Using block disposes it when the block ends
Using inputPhoto = New OcrInput()
    ' Load a specific frame of the image file "photo.tif" into the OcrInput object
    ' The second parameter (2) indicates the frame number to be loaded if the image is multi-frame
    inputPhoto.LoadImageFrame("photo.tif", 2)
    ' Perform OCR with ReadPhoto and store the result
    Dim result = ocr.ReadPhoto(inputPhoto)
    ' Ensure there is at least one text region detected before proceeding
    If result.TextRegions.Any() Then
        ' Retrieve the data from the first detected text region
        Dim firstRegion = result.TextRegions.First()
        Dim frameNumber As Integer = firstRegion.FrameNumber ' Frame the region was found in
        Dim textInRegion As String = firstRegion.TextInRegion ' Text contained in the region
        Dim region = firstRegion.Region ' Bounds of the region
        ' Optionally, print or use the extracted data
        Console.WriteLine($"Detected text in frame {frameNumber}:" & vbLf & textInRegion)
        Console.WriteLine($"Region located at: {region}")
    Else
        ' Print a message if no text regions are detected
        Console.WriteLine("No text regions detected in the image.")
    End If
End Using
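The example above inspects only the first region. When a photo contains several separate pieces of text, iterating over every entry in TextRegions is often more useful. The sketch below assumes the Region, TextInRegion, and FrameNumber members described earlier and the LoadImage method on OcrInput; "photo.jpg" is a placeholder file name.

```csharp
using IronOcr;
using System;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage("photo.jpg"); // placeholder file name

var result = ocr.ReadPhoto(input);

// Print every detected region with its frame, location, and text.
foreach (var textRegion in result.TextRegions)
{
    Console.WriteLine(
        $"Frame {textRegion.FrameNumber}, region {textRegion.Region}: {textRegion.TextInRegion}");
}
```

Because each entry carries its own bounds, the output can be filtered or sorted by position when only part of the image matters.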
Read Screenshot Example
The ReadScreenShot method is optimized for reading screenshots that contain hard-to-read text. Similar to the ReadPhoto method, it also returns the TextRegions property.
:path=/static-assets/ocr/content-code-examples/how-to/read-specific-document-screenshot.cs
using IronOcr;
using System;
using System.Linq;
// Initialize the IronTesseract class, which provides OCR functionalities.
var ocr = new IronTesseract();
// Using statement for automatic disposal of the OcrInput object.
using var inputScreenshot = new OcrInput();
// Load an image file to prepare it for OCR processing.
inputScreenshot.LoadImage("screenshot.png");
// Perform OCR with ReadScreenShot, the method optimized for screenshots.
var result = ocr.ReadScreenShot(inputScreenshot);
// Display the text recognized from the image to the console.
Console.WriteLine("Recognized Text: ");
Console.WriteLine(result.Text);
// Check if there are any text regions detected.
if (result.TextRegions.Any())
{
    // Output details about the position of the first detected text region.
    Console.WriteLine("First text region X coordinate: " + result.TextRegions.First().Region.X);
    // Output details about the width of the last detected text region.
    Console.WriteLine("Last text region width: " + result.TextRegions.Last().Region.Width);
}
// Output the confidence level of the OCR result as an indicator of accuracy.
Console.WriteLine("OCR Confidence Level: " + result.Confidence);
Imports IronOcr
Imports System
Imports System.Linq
' Initialize the IronTesseract class, which provides OCR functionalities.
Dim ocr = New IronTesseract()
' Using block for automatic disposal of the OcrInput object.
Using inputScreenshot = New OcrInput()
    ' Load an image file to prepare it for OCR processing.
    inputScreenshot.LoadImage("screenshot.png")
    ' Perform OCR with ReadScreenShot, the method optimized for screenshots.
    Dim result = ocr.ReadScreenShot(inputScreenshot)
    ' Display the text recognized from the image to the console.
    Console.WriteLine("Recognized Text: ")
    Console.WriteLine(result.Text)
    ' Check if there are any text regions detected.
    If result.TextRegions.Any() Then
        ' Output details about the position of the first detected text region.
        Console.WriteLine("First text region X coordinate: " & result.TextRegions.First().Region.X)
        ' Output details about the width of the last detected text region.
        Console.WriteLine("Last text region width: " & result.TextRegions.Last().Region.Width)
    End If
    ' Output the confidence level of the OCR result as an indicator of accuracy.
    Console.WriteLine("OCR Confidence Level: " & result.Confidence)
End Using
Frequently Asked Questions
What is this OCR library?
IronOCR is a C# library used for performing Optical Character Recognition (OCR) on various document types such as text documents, license plates, passports, and photos.
How can I start using this library for reading documents?
To start using IronOCR, download the library from NuGet, prepare your images or PDF documents for OCR, and use specific methods like ReadLicensePlate, ReadPassport, ReadPhoto, or ReadScreenShot to perform OCR on your documents.
What additional package is required for using advanced scanning features?
The IronOcr.Extensions.AdvancedScan package is required for using advanced scanning features, and it is currently available only on Windows.
Which languages are supported by this OCR tool?
IronOCR supports multiple languages including Chinese, Japanese, Korean, and LatinAlphabet. However, the ReadPassport method currently only supports the English language.
How do I configure the library to read documents with different layouts?
You can configure IronOCR to read documents with different layouts by setting the PageSegmentationMode in the OCR configuration. Options like SingleBlock and SparseText can help in retrieving information from table layouts.
What is the method for reading license plates used for?
The ReadLicensePlate method is specifically optimized for reading license plates from photos, and it returns details such as the license plate text and its location.
How does the method for reading passports work?
The ReadPassport method extracts information from passport photos by scanning the machine-readable zone (MRZ), which contains essential personal information like name, date of birth, and document number.
What is the purpose of the method for reading photos?
The ReadPhoto method is designed to read images that contain hard-to-read text. It returns the TextRegions property, which includes information about detected text and its regions.
Can this tool read text from screenshots?
Yes, IronOCR can read text from screenshots using the ReadScreenShot method, which is optimized for processing text in screenshots and provides the TextRegions property.
What should I do if I encounter issues with advanced scan on .NET Framework?
If you encounter issues with advanced scan on .NET Framework, ensure your project is configured to run on x64 architecture by unchecking the 'Prefer 32-bit' option in project settings.