Invoice data processing means receiving, validating, and managing invoices from suppliers or vendors and ensuring that payments are made correctly and on time. It involves steps designed to ensure accuracy, compliance, and efficiency in handling business transactions, and it reduces reliance on paper invoices. Automated invoice processing can significantly reduce manual data entry errors and improve efficiency. IronOCR is an Optical Character Recognition (OCR) library that can extract text and data from invoices stored as digital files, making it a good fit for automating invoice OCR processing in C# applications.
Optical Character Recognition is a technology that recognizes text in different types of documents, such as scanned pages, PDFs, or images, and converts it into editable and searchable data. OCR technology processes images of text and extracts the characters, making them machine-readable. Advanced OCR invoice software supports financial management tools and invoice automation.
OCR technology has evolved significantly. It is now accurate enough to extract data from documents across many different invoice formats, which reduces manual data entry, replaces manual invoice processing, and can enhance data security.
IronOCR is an OCR library for .NET (C#) that allows developers to extract text from images, PDFs, and other document formats, build OCR invoice software, and implement accounts payable workflows. It provides an easy-to-use API for integrating OCR capabilities into an accounts payable or accounting system.
Before you start, ensure you have the following:
Open Visual Studio and click on Create a new project.
Select Console App in the options.
Provide project name and path.
Select the .NET version.
In your project in Visual Studio go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution. Click on the Browse tab and search for IronOCR. Select IronOCR and click Install.
Alternatively, install the package from the command line with the following command.
dotnet add package IronOcr --version 2024.12.2
Sample digital invoice image with the invoice number.
Use the following code to extract the text from the invoice image.
using IronOcr;

License.LicenseKey = "Your License";
string filePath = "sample1.jpg"; // image for invoice OCR

// Create an instance of IronTesseract
var ocr = new IronTesseract();

// Load the image file
using (var ocrInput = new OcrInput())
{
    ocrInput.LoadImage(filePath);

    // Optionally apply filters if needed
    ocrInput.Deskew();
    // ocrInput.DeNoise();

    // Read the text from the image
    var ocrResult = ocr.Read(ocrInput);

    // Output the extracted text
    Console.WriteLine("Extracted Text:");
    Console.WriteLine(ocrResult.Text);

    // Next steps: parse and validate the extracted fields, such as the invoice date
}
The code above shows how to use the IronOCR library in C# to extract text from an image, such as an invoice, using OCR. Each part of the code is explained below:
License Key Setup:
The code begins by setting the license key for IronOCR. This key is required to use the full functionality of the library. If you have a valid license, you replace "Your License" with your actual license key.
Specifying the Input File:
The filePath variable holds the location of the image that contains the invoice (in this case, "sample1.jpg"). This is the file that will be processed for text extraction.
Creating an OCR Instance:
An instance of IronTesseract is created. IronTesseract is the class responsible for performing the OCR operation on the input data (image or PDF).
Loading the Image:
The code then creates an OcrInput object, which is used to load the image (in this case, a JPG file specified by filePath). The LoadImage method is used to read the image file and prepare it for OCR.
Applying Image Filters:
The code contains a filter step where optional image processing methods, such as Deskew (correcting skewed images) and DeNoise (removing noise from the image), can be applied to improve OCR accuracy. In this case, only the Deskew method is active.
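Performing OCR:
The Read method of the IronTesseract instance processes the loaded input and returns a result object that holds the recognized text.
Displaying the Extracted Text:
The Text property of the result is written to the console. From there, the text can be parsed and validated, for example to pick out the invoice date.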
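Once the text has been extracted, the next step is usually to pull individual fields out of it. The following is a minimal sketch of that idea using only standard .NET classes. The sample text, the label wording ("Invoice Number", "Invoice Date"), and the month/day/year date format are all assumptions made for illustration; real invoices will need patterns tuned to their own layout.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical OCR output; in a real run this would be ocrResult.Text.
string extractedText =
    "INVOICE\nInvoice Number: INV-10025\nInvoice Date: 12/05/2024\nTotal Due: $1,250.00";

// Capture the token after an "Invoice Number" style label (assumed label wording).
Match numberMatch = Regex.Match(
    extractedText,
    @"Invoice\s*(?:Number|No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    RegexOptions.IgnoreCase);
string invoiceNumber = numberMatch.Success ? numberMatch.Groups[1].Value : string.Empty;

// Capture a date after the "Invoice Date" label.
Match dateMatch = Regex.Match(
    extractedText,
    @"Invoice\s*Date\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{4})",
    RegexOptions.IgnoreCase);

// Parse as month/day/year (an assumption; adjust the format for other locales).
DateTime invoiceDate = default;
bool dateParsed = dateMatch.Success && DateTime.TryParseExact(
    dateMatch.Groups[1].Value,
    "M/d/yyyy",
    CultureInfo.InvariantCulture,
    DateTimeStyles.None,
    out invoiceDate);

Console.WriteLine($"Invoice number: {invoiceNumber}");
Console.WriteLine(dateParsed
    ? $"Invoice date: {invoiceDate:yyyy-MM-dd}"
    : "Invoice date: not found");
```

Parsing the date with TryParseExact rather than keeping it as a string means a misread character from OCR shows up as a failed parse instead of silently flowing into later processing.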
To improve efficiency, OCR can be limited to a specific region of the image.
using IronOcr;
using IronSoftware.Drawing;

License.LicenseKey = "Your Key";
string filePath = "sample1.jpg";

// Create an instance of IronTesseract
var ocr = new IronTesseract();

// Load the image file
using (var ocrInput = new OcrInput())
{
    // Region of interest in pixels, measured from the top-left corner
    var contentArea = new Rectangle(x: 0, y: 0, width: 1000, height: 250);
    ocrInput.LoadImage(filePath, contentArea);

    // Optionally apply filters if needed
    ocrInput.Deskew();
    // ocrInput.DeNoise();

    // Read the text from the selected region
    var ocrResult = ocr.Read(ocrInput);

    // Output the extracted text
    Console.WriteLine("Extracted Text:");
    Console.WriteLine(ocrResult.Text);
}
This code extracts text from a specific region of an image using IronOCR and applies a deskew filter to improve accuracy. The extracted text is then displayed and is ready for further use.
The first part of the code involves setting the license key for IronOCR. This is required to use the OCR functionality in the library. The license key should be replaced with the actual key you obtain from IronOCR, allowing you to access the full features of the library.
The file path of the image that you wish to process is specified. This image (in this case, a JPG file) contains the document or content from which the OCR will extract text. The path can point to an image file on the local system or other accessible storage.
An instance of the IronTesseract class is created. This object is the core engine that will perform the optical character recognition on the image.
A rectangle (area of interest) is defined within the image. This rectangle specifies the portion of the image that the OCR engine will focus on. In this example, the rectangle starts at the top-left corner (x=0, y=0) and has a width of 1000 pixels and a height of 250 pixels. This step helps the OCR process only the relevant section of the image, improving accuracy and speed.
The image is loaded into the OCR engine, but only the defined rectangle (the content area) is processed. This allows you to narrow the scope of OCR to a specific part of the image, which is especially useful when the image contains irrelevant areas, such as backgrounds or logos, that you don’t want to process.
The code optionally applies a deskewing filter to the image. Deskewing is the process of straightening an image if it has any tilt or rotation, improving the accuracy of OCR. Another filter, denoise, is available but commented out. If enabled, it would remove noise (unwanted marks) from the image, which might further enhance OCR accuracy.
The OCR engine reads the image (or the specified area of it) and extracts any text it recognizes. The result is stored in an object that holds the recognized text.
Finally, the extracted text is printed to the console. This text is the result of the OCR process, and it can be further processed, validated, or used in applications such as data entry or document management.
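One simple form of validation is checking that the amounts read from the invoice are internally consistent. The sketch below assumes the extracted text contains "Subtotal", "Tax", and "Total" lines with amounts written in a US-style format; the sample text and the ReadAmount helper are hypothetical and use only standard .NET classes.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical OCR output from the totals area of an invoice.
string extractedText = "Subtotal: $1,000.00\nTax: $250.00\nTotal: $1,250.00";

// Finds a line of the form "<label>: <amount>" and parses the amount.
// Returns null when the label is missing or the amount is unreadable.
decimal? ReadAmount(string text, string label)
{
    Match match = Regex.Match(
        text,
        @"^\s*" + Regex.Escape(label) + @"\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})\s*$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);
    if (!match.Success)
    {
        return null;
    }
    if (decimal.TryParse(match.Groups[1].Value, NumberStyles.Number,
            CultureInfo.InvariantCulture, out decimal amount))
    {
        return amount;
    }
    return null;
}

decimal? subtotal = ReadAmount(extractedText, "Subtotal");
decimal? tax = ReadAmount(extractedText, "Tax");
decimal? total = ReadAmount(extractedText, "Total");

// The check passes only if all three amounts were read and they add up.
bool totalsAgree = subtotal is decimal s && tax is decimal t && total is decimal g
    && s + t == g;

Console.WriteLine(totalsAgree
    ? "Totals are consistent."
    : "Totals need manual review.");
```

The pattern is anchored to the start of a line so that the "Total" lookup does not match the "Subtotal" line. Using decimal rather than double avoids rounding surprises when comparing currency amounts.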
IronOCR requires a license key to extract data from invoices. You can get a developer trial key from the licensing page.
using IronOcr;
License.LicenseKey = "Your Key";
This article provided a basic example of how to get started with IronOCR for invoice processing. You can further customize and expand this code to fit your specific requirements.
IronOCR provides an efficient and easy-to-integrate solution for extracting text from images and PDFs, making it ideal for invoice processing. By using IronOCR in combination with C# string manipulation or regular expressions, you can quickly process and extract important data from invoices.
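For invoices that print each field as a "Label: value" line, plain string manipulation may be enough. The following sketch, which assumes that layout and uses hypothetical sample text, splits the extracted text into lines and collects the pairs into a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical OCR output; in a real run this would be ocrResult.Text.
string extractedText =
    "Invoice Number: INV-10025\nInvoice Date: 12/05/2024\nBill To: Contoso Ltd\nTotal Due: $1,250.00";

// Case-insensitive keys tolerate differences in label capitalization.
var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

foreach (string rawLine in extractedText.Split(
             new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    // Split on the first colon only, so values may contain colons themselves.
    int separator = rawLine.IndexOf(':');
    if (separator <= 0)
    {
        continue; // not a "Label: value" line
    }

    string label = rawLine.Substring(0, separator).Trim();
    string value = rawLine.Substring(separator + 1).Trim();
    if (value.Length > 0)
    {
        fields[label] = value;
    }
}

foreach (KeyValuePair<string, string> field in fields)
{
    Console.WriteLine($"{field.Key} = {field.Value}");
}
```

This approach is easier to read than a set of regular expressions, but it only works when labels and values share a line, so it is best suited to invoices with a consistent layout.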
With more advanced configurations, such as language selection or multi-page PDF processing, you can fine-tune the OCR results to improve accuracy for your specific use case.
IronOCR's API is flexible, and it can be used for a wide variety of OCR tasks beyond invoice processing, including receipt scanning, document conversion, and data entry automation.