using IronOcr;
using IronSoftware.Drawing;

// We can delve deep into OCR results as an object model of
// Pages, Barcodes, Paragraphs, Lines, Words and Characters.
// This allows us to explore, export and draw OCR content using other APIs.

var ocrTesseract = new IronTesseract();
ocrTesseract.Configuration.ReadBarCodes = true;

using var ocrInput = new OcrInput();
var pages = new int[] { 1, 2 };
ocrInput.LoadImageFrames("example.tiff", pages);

OcrResult ocrResult = ocrTesseract.Read(ocrInput);

foreach (var page in ocrResult.Pages)
{
    // Page object
    int PageNumber = page.PageNumber;
    string PageText = page.Text;
    int PageWordCount = page.WordCount;

    // null if we don't set Ocr.Configuration.ReadBarCodes = true;
    OcrResult.Barcode[] Barcodes = page.Barcodes;

    AnyBitmap PageImage = page.ToBitmap(ocrInput);
    int PageWidth = page.Width;
    int PageHeight = page.Height;
    double PageRotation = page.Rotation; // angular correction in degrees from OcrInput.Deskew()

    foreach (var paragraph in page.Paragraphs)
    {
        // Pages -> Paragraphs
        int ParagraphNumber = paragraph.ParagraphNumber;
        string ParagraphText = paragraph.Text;
        AnyBitmap ParagraphImage = paragraph.ToBitmap(ocrInput);
        int ParagraphX_location = paragraph.X;
        int ParagraphY_location = paragraph.Y;
        int ParagraphWidth = paragraph.Width;
        int ParagraphHeight = paragraph.Height;
        double ParagraphOcrAccuracy = paragraph.Confidence;
        OcrResult.TextFlow paragraphText_direction = paragraph.TextDirection;

        foreach (var line in paragraph.Lines)
        {
            // Pages -> Paragraphs -> Lines
            int LineNumber = line.LineNumber;
            string LineText = line.Text;
            AnyBitmap LineImage = line.ToBitmap(ocrInput);
            int LineX_location = line.X;
            int LineY_location = line.Y;
            int LineWidth = line.Width;
            int LineHeight = line.Height;
            double LineOcrAccuracy = line.Confidence;
            double LineSkew = line.BaselineAngle;
            double LineOffset = line.BaselineOffset;

            foreach (var word in line.Words)
            {
                // Pages -> Paragraphs -> Lines -> Words
                int WordNumber = word.WordNumber;
                string WordText = word.Text;
                AnyBitmap WordImage = word.ToBitmap(ocrInput);
                int WordX_location = word.X;
                int WordY_location = word.Y;
                int WordWidth = word.Width;
                int WordHeight = word.Height;
                double WordOcrAccuracy = word.Confidence;

                foreach (var character in word.Characters)
                {
                    // Pages -> Paragraphs -> Lines -> Words -> Characters
                    int CharacterNumber = character.CharacterNumber;
                    string CharacterText = character.Text;
                    AnyBitmap CharacterImage = character.ToBitmap(ocrInput);
                    int CharacterX_location = character.X;
                    int CharacterY_location = character.Y;
                    int CharacterWidth = character.Width;
                    int CharacterHeight = character.Height;
                    double CharacterOcrAccuracy = character.Confidence;

                    // Output alternative symbol choices and their probability.
                    // Very useful for spellchecking
                    OcrResult.Choice[] Choices = character.Choices;
                }
            }
        }
    }
}

Published September 27, 2023

Invoice OCR Machine Learning (Step-By-Step-Tutorial)

In today's fast-paced business environment, automating repetitive tasks and the handling of unstructured data has become a key strategy for improving efficiency and reducing manual errors. One such task is the extraction of information from invoices or purchase orders, a process that traditionally required significant manual effort. Thanks to advancements in machine learning, deep learning models, and optical character recognition (OCR) software, businesses can now streamline invoice information extraction using tools like IronOCR. In this article, we will explore how machine learning and IronOCR can be used to automate the way invoices are processed.

Understanding Invoice OCR Tool

OCR technology has been around for some time, but its application to invoice processing and data extraction has seen a significant boost with the advent of machine learning. OCR, short for Optical Character Recognition, is a technology that converts different types of documents, such as scanned paper invoices, PDF files, financial documents, or images captured by a digital camera, into editable and searchable data. It essentially converts the text in images into machine-readable text, typically after an image pre-processing step.

IronOCR is a powerful OCR library built on top of machine learning algorithms. It can be integrated into various .NET applications and languages such as C# and VB.NET, making it a versatile tool for invoice processing. By using IronOCR, businesses can automate the extraction of invoice data, such as invoice number, date, vendor details, and line items, with remarkable accuracy.

The Benefits of Using IronOCR for Invoice OCR

Using IronOCR for invoice processing offers numerous benefits that can significantly improve efficiency and accuracy in your organization's financial operations such as accounts payable. Let's delve into these benefits in more detail:

1. Accuracy and Reduced Errors

IronOCR utilizes advanced machine learning algorithms to recognize and extract text from invoices accurately. This minimizes the chances of human errors in data entry, ensuring that critical financial information is recorded correctly.

2. Time and Cost Savings

Automating invoice processing with IronOCR significantly reduces the time and resources required for manual data entry. This can lead to substantial cost savings by optimizing staff time and reducing the need for manual labor.

3. Improved Efficiency

IronOCR can process a large volume of invoices quickly and efficiently. It eliminates the need for employees to manually input data from each invoice, allowing them to focus on more strategic tasks.

4. Scalability

IronOCR is scalable and can handle a growing volume of invoices as your business expands. You don't need to worry about increased workloads overwhelming your invoice processing system.

5. Global Reach

IronOCR supports 125+ languages, which allows businesses to process invoices from vendors and clients around the world, whatever language an invoice is written in.

6. Multi-format Support

IronOCR can process invoices in various formats, including scanned images, image-based PDFs, and text-based PDFs. This versatility ensures that you can handle invoices from different sources and formats with ease.

7. Customization and Data Extraction

You can customize IronOCR to extract specific data fields from invoices, such as invoice numbers, dates, vendor details, and line item information. This level of customization allows you to tailor the solution to your specific business needs.

8. Compliance and Audit Trail

Automated invoice processing with IronOCR helps maintain accurate records and provides an audit trail. This is crucial for compliance with financial regulations and for simplifying the auditing process.

9. Reduced Invoice Processing Cycle

The streamlined and automated nature of IronOCR reduces the time it takes to process invoices, which, in turn, shortens the invoice processing cycle. This can lead to faster payments to vendors and improved relationships.

10. Enhanced Data Analysis

By having invoice data in a structured digital format, you can perform more in-depth data analysis. This can help identify trends, optimize spending, and make informed financial decisions.

Implementing IronOCR for Invoice Processing

To implement IronOCR for invoice processing, follow these general steps:

Step 1: Create a New C# Project

Start by creating a new C# project or opening an existing project in your preferred development environment (e.g., Visual Studio or Visual Studio Code). I am using the Visual Studio 2022 IDE and a Console Application for this demonstration. You can use the same implementation in any other project type, such as ASP.NET Web APIs, ASP.NET MVC, ASP.NET Web Forms, or any other .NET application.

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 1 - C# Project

Step 2: Install IronOCR via NuGet Package Manager

To use IronOCR in your project, you'll need to install the IronOCR NuGet package. Here's how to do it:

Open the NuGet Package Manager Console. In Visual Studio, you can find this under "Tools" > "NuGet Package Manager" > "Package Manager Console."
Run the following command to install the IronOCR package:
```
Install-Package IronOcr
```
Wait for the package to be installed. Once completed, you can start using IronOCR in your project.

Step 3: Implement OCR in Your C# Code

Now, let's write the C# code to perform OCR on an invoice using IronOCR. We will use the following sample invoice for this example.

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 4 - Sample Invoice Template

The following sample code takes the invoice image as input and extracts its text, which includes details such as the invoice number and purchase order information.

using System;
using IronOcr;

string invoicePath = @"D:\Invoices\SampleInvoice.png";
IronTesseract ocr = new IronTesseract();
using (OcrInput input = new OcrInput())
{
    // Add the invoice image to the OCR input
    input.AddImage(invoicePath);
    // Run OCR and print all recognized text
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}

The same example in VB.NET:

Dim invoicePath As String = "D:\Invoices\SampleInvoice.png"
Dim ocr As New IronTesseract()
Using input As New OcrInput()
	' Add the invoice image to the OCR input
	input.AddImage(invoicePath)
	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

The above code is a concise C# example that uses IronOCR to perform OCR on a single invoice image (SampleInvoice.png) and then prints the extracted text to the console. Make sure to replace the value of the invoicePath variable with the path to your own invoice image file.

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 5 - Invoice OCR Output
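
The OCR result above is plain text, so specific fields such as the invoice number or date still have to be located in it. The following is a minimal sketch of one way to do that with regular expressions. It is not part of IronOCR: the sample text, the FirstMatch helper, and the patterns are all hypothetical and would need adjusting to the layout of your own invoices. In a real project, ocrText would be the result.Text value from the example above. The sketch assumes a C# 9 or later console project (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical OCR output; in practice this would be result.Text.
string ocrText = "INVOICE\nInvoice No: INV-1001\nDate: 27/09/2023\nTotal: $1,250.00";

// Hypothetical helper: returns the first capture group of the pattern,
// or "(not found)" when the pattern does not match.
static string FirstMatch(string text, string pattern)
{
    Match m = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value.Trim() : "(not found)";
}

// Illustrative patterns only; real invoices vary widely in wording and layout.
string invoiceNumber = FirstMatch(ocrText, @"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)");
string invoiceDate = FirstMatch(ocrText, @"\bDate\s*[:\-]?\s*(\d{1,2}[/\-.]\d{1,2}[/\-.]\d{2,4})");
string total = FirstMatch(ocrText, @"\bTotal\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})");

Console.WriteLine($"Invoice number: {invoiceNumber}");
Console.WriteLine($"Invoice date:   {invoiceDate}");
Console.WriteLine($"Total:          {total}");
```

For the sample text, this prints INV-1001, 27/09/2023, and 1,250.00. Because OCR output can contain misread characters, patterns like these are best treated as a starting point and validated against real invoices.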

Next, let's process multiple invoices at once and extract their data. The following Invoices directory is used as input.

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 6 - Invoices directory

The following sample code will perform text extraction from multiple invoices at once.

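using System;
using System.IO;
using IronOcr;

// Collect every PNG invoice in the folder
string[] fileArray = Directory.GetFiles(@"D:\Invoices\", "*.png");
IronTesseract ocr = new IronTesseract();
using (OcrInput input = new OcrInput())
{
    // Add all invoice images to a single OCR input
    foreach (string file in fileArray)
    {
        input.AddImage(file);
    }
    // Read all images in one pass and print the combined text
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}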

The same example in VB.NET:

Dim fileArray() As String = Directory.GetFiles("D:\Invoices\", "*.png")
Dim ocr As New IronTesseract()
Using input As New OcrInput()
	For Each file As String In fileArray
		input.AddImage(file)
	Next file
	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

The above code collects all the PNG images in the folder, runs OCR on them, and prints the combined text of all the invoices to the console.

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 7 - Extracted Data
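
Because the example above adds every image to one OcrInput, the printed text is a single combined string. If you need to know which text came from which invoice, one option is to read each file separately. The sketch below is a variation on the example above, not a prescribed approach: it uses only the IronOCR calls already shown in this article (IronTesseract, OcrInput, AddImage, Read), and the textByFile dictionary is a name introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using IronOcr;

// Assumption: the same folder as in the example above; adjust to your environment.
string[] fileArray = Directory.GetFiles(@"D:\Invoices\", "*.png");
IronTesseract ocr = new IronTesseract();

// Maps each invoice file name to the text recognized in that file.
var textByFile = new Dictionary<string, string>();

foreach (string file in fileArray)
{
    // One OcrInput per file, so each result belongs to exactly one invoice.
    using (OcrInput input = new OcrInput())
    {
        input.AddImage(file);
        OcrResult result = ocr.Read(input);
        textByFile[Path.GetFileName(file)] = result.Text;
    }
}

// Print the text of each invoice under its file name.
foreach (KeyValuePair<string, string> entry in textByFile)
{
    Console.WriteLine($"===== {entry.Key} =====");
    Console.WriteLine(entry.Value);
}
```

Reading files one at a time trades the single combined read for a separate result per invoice, which is usually easier to work with when each invoice has to be stored or validated individually.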

Save Extracted Data as a Searchable PDF Invoice

The following code reads all the images from the folder, performs OCR on them, and saves the result as a single searchable PDF.

using System.IO;
using IronOcr;

// Collect every PNG invoice in the folder
string[] fileArray = Directory.GetFiles(@"D:\Invoices\", "*.png");
IronTesseract ocr = new IronTesseract();
using (OcrInput input = new OcrInput())
{
    foreach (string file in fileArray)
    {
        input.AddImage(file);
    }
    OcrResult result = ocr.Read(input);
    // Save all pages, with their recognized text, as one searchable PDF
    result.SaveAsSearchablePdf(@"D:\Invoices\Searchable.pdf");
}

The same example in VB.NET:

Dim fileArray() As String = Directory.GetFiles("D:\Invoices\", "*.png")
Dim ocr As New IronTesseract()
Using input As New OcrInput()
	For Each file As String In fileArray
		input.AddImage(file)
	Next file
	Dim result As OcrResult = ocr.Read(input)
	result.SaveAsSearchablePdf("D:\Invoices\Searchable.pdf")
End Using

The code is nearly identical across these examples; only small changes are made to demonstrate different use cases. The output PDF is shown below:

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 8 - PDF Output

In this way, IronOCR provides an easy way to automate invoice processing and document processing.

Extract Invoice Data from PDF Invoices

To extract data from PDF invoices using IronOCR, you can follow a similar approach as in the previous code example. IronOCR is capable of handling both image-based and text-based PDFs. Here's a brief example of how to extract data from a PDF invoice:

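using System;
using System.IO;
using IronOcr;

// Collect every PDF invoice in the folder
string[] fileArray = Directory.GetFiles(@"D:\Invoices\", "*.pdf");
IronTesseract ocr = new IronTesseract();
using (OcrInput input = new OcrInput())
{
    // Add all PDF invoices to a single OCR input
    foreach (string file in fileArray)
    {
        input.AddPdf(file);
    }
    // Read all PDFs in one pass and print the combined text
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}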

The same example in VB.NET:

Dim fileArray() As String = Directory.GetFiles("D:\Invoices\", "*.pdf")
Dim ocr As New IronTesseract()
Using input As New OcrInput()
	For Each file As String In fileArray
		input.AddPdf(file)
	Next file
	Dim result As OcrResult = ocr.Read(input)
	Console.WriteLine(result.Text)
End Using

The above code batch processes all the PDF invoices located in a directory (D:\Invoices\) using IronOCR. It retrieves the file paths, adds each PDF to a single OcrInput, reads them in one pass, and prints the combined text to the console. This approach streamlines invoice data extraction for organizations dealing with a substantial number of invoices and reduces manual effort.

Invoice OCR Machine Learning (Step-By-Step-Tutorial): Figure 9 - Extract Output

Conclusion

In summary, the fusion of machine learning and advanced OCR technology, like IronOCR, is reshaping how invoices are handled. This article walked you through the process of using IronOCR, showing its remarkable advantages. By adopting IronOCR, businesses can achieve greater accuracy, save time and money, and effortlessly handle invoices in various formats and languages. The elimination of manual data entry not only boosts efficiency but also reduces the likelihood of costly errors in financial transactions. IronOCR simplifies and improves the invoice processing workflow, making it a smart choice for businesses aiming to enhance their financial operations in today's competitive environment. Moreover, IronOCR offers a suite of powerful features, including support for 125+ languages, customizable data extraction, and compatibility with image-based and text-based PDFs.

While IronOCR's feature set is impressive, it's also noteworthy that IronOCR's pricing model is designed to accommodate a wide range of business needs, offering flexible options with a free trial for both small enterprises and larger corporations. Whether you're processing a few invoices or managing a high volume of financial documents, IronOCR stands as a dependable and cost-effective solution.
