How To Create an OCR Receipt Scanner In C#

Updated February 18, 2024

This tutorial is designed to help beginners create an OCR receipt scanner using IronOCR, an OCR library for C#. By the end of this guide, you will understand how to use optical character recognition (OCR) to convert receipt files into editable, searchable text. This technology can be a game-changer for businesses looking to automate expense management and minimize manual data entry. Let's get started!

How To Create an OCR Receipt Scanner In C#

1. Create a C# Console project in Visual Studio
2. Install the OCR library using NuGet Package Manager
3. Load the receipt into the program using the OcrInput class
4. Extract the text using the Read method
5. Show extracted text on the console

Prerequisites

Before we dive into the coding part, make sure you have the following:

Visual Studio: This will be our Integrated Development Environment (IDE), where we will write and run our C# code.
IronOCR Library: We will use IronOCR, an advanced OCR library that can be easily integrated into C# applications.
Sample Receipt: A receipt image file named Sample_Receipt.jpg, which we will use to test our OCR implementation.

How To Create an OCR Receipt Scanner In C#: Figure 1 - Image of sample receipt

Step 1: Setting Up the Project

Open Visual Studio: Locate the Visual Studio icon on your desktop or in your applications menu and double-click it to open the program.

Create a New Project: Once Visual Studio is open, you’ll find a launch window. Click on the "Create a new project" button. If you have already opened Visual Studio and don’t see the launch window, you can access this by clicking File > New > Project from the top menu.

Select Project Type: In the “Create a new project” window, you’ll see a variety of project templates. In the search box, type “Console App” to filter the options, then select Console App (.NET Core) or Console App (.NET Framework), depending on your preference and compatibility. Then click the Next button.

Configure Your New Project: Now, you’ll see a screen titled "Configure your new project".

In the Project name field, enter OCRReceiptScanner as the name of your project.
Choose or confirm the location where your project will be saved in the location field.
Optionally, you can also specify a solution name if you want it to be different from the project name.
Click the Next button after filling in these details.

Additional Information: You might be asked to select the target .NET Framework. Choose the most recent version (unless you have specific compatibility requirements) and click Create.

Step 2: Integrating IronOCR

Before we can use the IronOCR library, we need to include it in our project. Follow these steps:

Right-click on your project in the Solution Explorer.
Choose "Manage NuGet Packages".
In the NuGet Package Manager window, you will see several tabs like Browse, Installed, Updates, and Consolidate. Click on the Browse tab.
In the search box, type IronOcr. This is the name of the library we wish to add to our project. Press enter to search.
The search results will show the IronOCR library package. It should be one of the first results you see. Click on it to select it.
After selecting the IronOCR package, you will notice a panel on the right side displaying the package's information, including its description and version. There is also an Install button in this panel.
Click the Install button. This action might prompt you to review changes and may show a list of dependencies that will be included along with IronOcr. Review the changes and dependencies, and if everything looks correct, confirm and proceed with the installation.

Step 3: Configuring the Project

After installing IronOCR, your next step is to configure your project. Here's how:

Add Namespaces: At the top of your Program.cs file, include the following namespaces:

using IronOcr;
using System;

Imports IronOcr
Imports System

Configuration Settings: If you have any configuration settings like an API key or a license key, make sure to include them. For IronOCR, you'll need to set the license key as shown in the provided code:

License.LicenseKey = "License-Key"; // replace 'License-Key' with your key

License.LicenseKey = "License-Key" ' replace 'License-Key' with your key
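
Hardcoding a license key makes it easy to commit the key to source control by accident. The following is a minimal sketch of one alternative, reading the key from an environment variable. The class name LicenseKeyLoader and the variable name IRONOCR_LICENSE_KEY are assumptions made for this example and are not part of IronOCR.

```csharp
using System;

public static class LicenseKeyLoader
{
    // Hypothetical variable name chosen for this sketch; use whatever name suits your setup.
    public const string VariableName = "IRONOCR_LICENSE_KEY";

    // Returns the trimmed key, or throws if the variable is missing or blank.
    public static string Load()
    {
        string key = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"Set the {VariableName} environment variable before running the scanner.");
        }
        return key.Trim();
    }
}
```

With this helper in place, the assignment shown above would become License.LicenseKey = LicenseKeyLoader.Load();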

Step 4: Reading the Receipt

Now, let's write the code to read the receipt.

Define the Path to Your Receipt: Specify the path to the receipt file you want to scan.

string pdfFilePath = "Sample_Receipt.jpg";

Dim pdfFilePath As String = "Sample_Receipt.jpg"
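
A relative path such as "Sample_Receipt.jpg" is resolved against the current working directory, which depends on how the program is launched. The sketch below shows one way to resolve the file name against the folder of the running executable and to fail early when the file is missing. The ReceiptPath class is a hypothetical helper written for this tutorial, not an IronOCR type.

```csharp
using System;
using System.IO;

public static class ReceiptPath
{
    // Resolves a file name against the folder that contains the running executable.
    public static string Resolve(string fileName)
    {
        return Path.GetFullPath(Path.Combine(AppContext.BaseDirectory, fileName));
    }

    // Resolves the path and fails early with a clear message if the file is missing.
    public static string ResolveExisting(string fileName)
    {
        string fullPath = Resolve(fileName);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("Receipt image not found.", fullPath);
        }
        return fullPath;
    }
}
```

For this to work, the image must be present in the build output folder, for example by setting the file's "Copy to Output Directory" property in Visual Studio.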

Try-Catch Block: Implement error handling using a try-catch block. This will help you manage any exceptions that occur during the OCR process.

try
{
    // OCR code will go here
}
catch (Exception ex)
{
    // Handle exceptions here
    Console.WriteLine($"An error occurred: {ex.Message}");
}

Try
	' OCR code will go here
Catch ex As Exception
	' Handle exceptions here
	Console.WriteLine($"An error occurred: {ex.Message}")
End Try
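
A single catch-all handler works, but separating common failure cases gives more useful messages. The sketch below wraps the OCR work in a delegate and maps standard .NET file exceptions to short messages. OcrRunner is a hypothetical helper, and which exceptions IronOCR itself raises is not covered here; the sketch only assumes the usual System.IO exception types.

```csharp
using System;
using System.IO;

public static class OcrRunner
{
    // Runs the supplied work and maps failures to a short, user-facing message.
    // Returns null when the work completes without throwing.
    public static string TryRun(Action ocrWork)
    {
        try
        {
            ocrWork();
            return null;
        }
        catch (FileNotFoundException ex)
        {
            // Most specific handler first: FileNotFoundException derives from IOException.
            return $"Receipt file not found: {ex.FileName}";
        }
        catch (IOException ex)
        {
            return $"Could not read the receipt file: {ex.Message}";
        }
        catch (Exception ex)
        {
            // Fallback for anything else, matching the generic handler above.
            return $"An error occurred: {ex.Message}";
        }
    }
}
```

The OCR code from the following steps would go inside the delegate, for example string error = OcrRunner.TryRun(() => { /* OCR code */ });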

Step 5: Implementing OCR

In Step 5, we delve into the core functionality of our application: implementing OCR to read and interpret the data from our receipt. This involves initializing the OCR engine, configuring the input, performing the OCR operation, and displaying the results.

Initialize IronTesseract

The first part of the code creates an instance of the IronTesseract class:

var ocr = new IronTesseract();

Dim ocr = New IronTesseract()

By creating an instance of IronTesseract, we are essentially setting up our OCR tool, gearing it up to perform the text recognition tasks. It's like starting the engine of a car before you can drive it. This object will be used to control the OCR process, including reading the input and extracting text from it.

Configure OCR Input

Next, we define the input for our OCR process:

using (var input = new OcrInput(pdfFilePath))
{
    // OCR processing will go here
}

Using input = New OcrInput(pdfFilePath)
	' OCR processing will go here
End Using

In this segment, OcrInput is used to specify the file we want to process. pdfFilePath is a variable that contains the path to our receipt file (despite its name, it points to a JPEG image in this tutorial). By passing this variable to OcrInput, we are telling the OCR engine, "Here's the file I want you to read." The using statement is a C# construct that ensures the resources held by OcrInput (like file handles) are released once processing is done, even if an exception is thrown inside the block. It's a way to manage resources efficiently and keep your application from holding on to memory it no longer needs.
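
To make the behaviour of the using statement concrete, here is a small sketch that does not involve IronOCR at all. TrackedResource is a stand-in for any disposable object and simply records when it is opened and disposed.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for any disposable resource; it is not an IronOCR type.
public sealed class TrackedResource : IDisposable
{
    private readonly List<string> _log;

    public TrackedResource(List<string> log)
    {
        _log = log;
        _log.Add("opened");
    }

    public void Dispose()
    {
        _log.Add("disposed");
    }
}

public static class UsingDemo
{
    // Records the order of events; Dispose runs when the using block exits,
    // whether it exits normally or because of an exception.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var resource = new TrackedResource(log))
            {
                log.Add("working");
                if (fail)
                {
                    throw new InvalidOperationException("simulated OCR failure");
                }
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }
}
```

UsingDemo.Run(false) records opened, working, disposed. UsingDemo.Run(true) records opened, working, disposed, caught, which shows that the resource is released before the exception reaches the catch block.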

Perform OCR

Within the using block, we call the Read method on our ocr instance:

var result = ocr.Read(input);

Dim result = ocr.Read(input)

The Read method takes the OcrInput object (not the file path itself) as its parameter. This line starts the receipt scan: it performs OCR on the loaded input, extracts the data, and stores it in the variable result. We can then use the extracted text for any further text operation.
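
As an example of such a text operation, the sketch below splits raw OCR text into trimmed, non-empty lines, which is often a convenient first step before searching for specific values. OcrText is a hypothetical helper that works on any string; it would be called with result.Text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OcrText
{
    // Splits raw OCR text into trimmed, non-empty lines.
    public static List<string> ToLines(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return new List<string>();
        }

        return text
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList();
    }
}
```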

Output the Results

Finally, we output the text that was recognized by the OCR process:

Console.WriteLine(result.Text);

Console.WriteLine(result.Text)

The result variable contains the output of the OCR process and result.Text contains the actual text extracted from the receipt. The Console.WriteLine function then takes this text and displays it on the console. This allows you to see and verify the results of the OCR process. Here is the complete Program.cs file code:

using IronOcr;
using System;

class Program
{
    static void Main(string[] args)
    {
        // Replace with your own license key.
        License.LicenseKey = "Your-License-Key";

        // Path to the receipt image (a JPEG, despite the variable name).
        string pdfFilePath = "Sample_Receipt.jpg";

        try
        {
            // Create the OCR engine.
            var ocr = new IronTesseract();

            // Load the receipt; the using block releases the input when done.
            using (var input = new OcrInput(pdfFilePath))
            {
                // Run OCR and print the recognized text.
                var result = ocr.Read(input);
                Console.WriteLine(result.Text);
            }
        }
        catch (Exception ex)
        {
            // Handle exceptions (e.g., file not found, OCR errors) and log them if necessary.
            Console.WriteLine($"An error occurred: {ex.Message}");
        }
    }
}

Imports IronOcr
Imports System

Friend Class Program
	Shared Sub Main(ByVal args() As String)
		' Replace with your own license key.
		License.LicenseKey = "Your-License-Key"
		' Path to the receipt image (a JPEG, despite the variable name).
		Dim pdfFilePath As String = "Sample_Receipt.jpg"
		Try
			Dim ocr = New IronTesseract()
			Using input = New OcrInput(pdfFilePath)
				Dim result = ocr.Read(input)
				Console.WriteLine(result.Text)
			End Using
		Catch ex As Exception
			' Handle exceptions (e.g., file not found, OCR errors) and log them if necessary.
			Console.WriteLine($"An error occurred: {ex.Message}")
		End Try
	End Sub
End Class

Step 6: Running Your Application

Build the Project: Click on the 'Build' menu and then select 'Build Solution'.
Run the Project: Press F5 or click on the 'Start' button to run your application.

You should now see the text from your receipt printed to the console. This text is the data extracted from your receipt image, and this is how receipts are scanned with IronOCR. It is a simple, generic example of using OCR to extract data from paper receipts; you can adapt the code to match the layout of your own receipt images.

How To Create an OCR Receipt Scanner In C#: Figure 3 - Outputted text from the previous code example

From here, you can work with the unstructured text returned by the scan. You can pull important information from a particular section of the receipt, or present the receipt data in a more organized way. Building on IronOCR in this way, a fuller receipt scanning application can extract the individual fields of a receipt.
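
As a sketch of pulling one value out of the unstructured text, the helper below looks for a line that starts with "Total" and parses the amount. The receipt layout (a "Total" label followed by an amount with two decimals) is an assumption, and ReceiptFields is a hypothetical class; real receipts will likely need a pattern tuned to their layout.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ReceiptFields
{
    // Matches lines such as "TOTAL 12.50" or "Total: $12.50".
    // Lines such as "SUBTOTAL 10.00" do not match because "total" must start the line.
    private static readonly Regex TotalPattern = new Regex(
        @"^[ \t]*total[ \t]*:?[ \t]*\$?[ \t]*(\d+\.\d{2})[ \t]*\r?$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Returns the total if a matching line exists, otherwise null.
    public static decimal? FindTotal(string ocrText)
    {
        Match match = TotalPattern.Match(ocrText ?? string.Empty);
        if (!match.Success)
        {
            return null;
        }
        return decimal.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
    }
}
```

In the program above, this would be called as ReceiptFields.FindTotal(result.Text).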

Conclusion

Congratulations! You've successfully built an OCR receipt scanner using C# and IronOCR. A scanner like this can speed up data extraction for business needs such as expense tracking and supply chain management, and it reduces the need to review scanned receipts and extract data manually.

IronOCR offers a free trial, allowing users to explore and assess its capabilities at no initial cost. For those seeking to integrate and leverage the full spectrum of features in a professional setting, licenses begin at $599, providing a comprehensive solution for robust OCR receipt scanning and data extraction needs.

Remember, this is just the beginning. You can expand this application to support various file types, improve data privacy, or integrate additional features like receipt recognition for specific fields such as tax amount, date, line items, and more. With OCR technology, the possibilities are vast, paving the way for more efficient and intelligent business processes. Happy coding!
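
As one possible starting point for line items, the sketch below parses lines of the form "name price" from OCR text and skips summary rows. The layout, the LineItem type, and the list of summary labels are all assumptions made for illustration; they are not IronOCR features.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public sealed class LineItem
{
    public string Name { get; }
    public decimal Price { get; }

    public LineItem(string name, decimal price)
    {
        Name = name;
        Price = price;
    }
}

public static class LineItemParser
{
    // Assumed layout: item name, spaces or tabs, then a price with two decimals at line end.
    private static readonly Regex ItemPattern = new Regex(
        @"^[ \t]*(?<name>\S.*?)[ \t]+\$?(?<price>\d+\.\d{2})[ \t]*\r?$",
        RegexOptions.Multiline);

    // Rows that look like items but are summary lines.
    private static readonly HashSet<string> SummaryLabels =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "subtotal", "tax", "total" };

    public static List<LineItem> Parse(string ocrText)
    {
        var items = new List<LineItem>();
        foreach (Match match in ItemPattern.Matches(ocrText ?? string.Empty))
        {
            string name = match.Groups["name"].Value.Trim().TrimEnd(':');
            if (SummaryLabels.Contains(name))
            {
                continue; // skip subtotal, tax, and total rows
            }
            decimal price = decimal.Parse(
                match.Groups["price"].Value, CultureInfo.InvariantCulture);
            items.Add(new LineItem(name, price));
        }
        return items;
    }
}
```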
