How to Get Text From Invoice in C# Tutorial

OCR (Optical Character Recognition) software can convert scanned paper receipts and receipt images into structured data that other programs can process. A receipt OCR API helps developers turn printed or handwritten text into machine-encoded text, in many languages. This includes line items, taxes, and totals on receipts and invoices. Such APIs rely on technologies like machine learning, computer vision, and deep learning. Automating this step saves time and manual data-entry work in tasks such as accounting. It also replaces the slow process of transcribing physical receipts into digital records by hand.

1. IronOCR, An Optical Character Recognition API

IronOCR is a .NET OCR library that recognizes text in images, which makes it suitable for information extraction tasks such as receipt OCR. It is built on the Tesseract OCR engine, which is widely regarded as one of the most accurate open-source OCR engines available. IronOCR can read text from several input formats, including PNG, JPG, TIFF, and PDF, and can recognize text in multiple languages.

One IronOCR feature that is particularly useful for receipt OCR is automatic detection of text orientation. This works even when the image has been rotated or skewed. It matters for receipts because they are often photographed at an angle or are folded and crumpled. Either one skews the printed text, and recognition and data extraction become less accurate if the skew is not corrected.

2. IronOCR Features

  • IronOCR uses deep learning to scan and recognize text in pictures, scanned documents, and PDFs.
  • IronOCR supports more than 127 languages.
  • IronOCR can read text from images in many file formats, including PNG, JPG, TIFF, and PDF.
  • Extracted information can be output as plain text, structured data, JSON, or searchable PDFs.
  • IronOCR supports .NET 5, 6, and 7, as well as .NET Core, .NET Framework, and .NET Standard.
  • IronOCR uses computer vision to locate the regions of an image that contain text and splits the input into those regions for recognition.
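
IronOCR's own export options are not shown here. As a library-independent sketch of the structured-data and JSON bullet, fields that have already been pulled out of the OCR text can be serialized with .NET's built-in System.Text.Json (in-box on .NET Core 3.0 and later). The field names and values below are illustrative assumptions, copied from the sample receipt text later in this tutorial.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative receipt fields (hypothetical names; values taken from the sample receipt text).
var receipt = new Dictionary<string, object>
{
    ["merchant"] = "LOGO SHOP",
    ["total"] = 49.80m,
    ["cash"] = 50.00m,
};

// Serialize the fields to indented JSON that other programs can consume.
string json = JsonSerializer.Serialize(receipt, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```

A dictionary keeps the sketch short. A typed class with one property per field would give compile-time checking once the set of extracted fields is settled.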

3. Creating a New Project in Visual Studio

Open Visual Studio and go to the File menu. Select "New Project" and then choose Console Application.

Enter the project name and choose a location in the appropriate text boxes, select the required .NET Framework, and then click the Create button, as shown in the screenshot below:

Receipt OCR in C#, Figure 1: New Project

Creating a New Project in Visual Studio

Visual Studio will now generate the structure of the console application. Once it finishes, it opens the Program.cs file, where you can write and run your source code.

Receipt OCR in C#, Figure 2: Program.cs

The Program.cs file generated by the Visual Studio New Project wizard

4. Install IronOCR

IronOCR can be added to a C# .NET project in several ways. This tutorial installs it with the NuGet Package Manager in Visual Studio.

In Visual Studio, go to Tools > NuGet Package Manager > Package Manager Console.

Receipt OCR in C#, Figure 3: Package Manager Console

The Visual Studio NuGet Package Manager Console

A console will appear at the bottom of the Visual Studio window. Type the following command in the console and press Enter.

Install-Package IronOcr

IronOCR will be installed within a few seconds.

5. Data Extraction from Receipts Using IronOCR

IronOCR can extract detailed data from receipts. It converts a picture of a receipt into machine-readable text that can then be analyzed and processed without compromising data privacy.

The following example uses IronOCR to extract all of the text from a receipt image, which shows the basic receipt OCR workflow:

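using IronOcr;
using System;

// Create the OCR engine (IronOCR's Tesseract-based reader)
IronTesseract ocrTesseract = new IronTesseract();

// Load the receipt image and run OCR on it
using (OcrInput ocrInput = new OcrInput("ocr.png"))
{
    OcrResult ocrResult = ocrTesseract.Read(ocrInput);
    string recognizedText = ocrResult.Text;

    // Print all of the recognized text
    Console.WriteLine(recognizedText);
}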

For more details on how IronOCR reads text from images in C#, see the Reading Text from Image tutorial.

The output of the code presented above is below:

- LOGO SHOP
- LOREM IPSUM
- DOLOR SITAMET CONSECTETUR
- ADIPISCING ELIT
- 1 LOREM IPSUM $3.20
- 2 ORNARE MALESUADA $9.50
- 3 PORTA FERMENTUM $5.90
- 4 SODALES ARCU $6.00
- 5 ELEIFEND $9.00
- 6 SEMNISIMASSA $0.50
- 7 DUIS FAMES DIS $7.60
- 8 FACILISIRISUS $810
- TOTAL AMOUNT $49.80
- CASH $50.00
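
The raw text is only the first step. The snippet below is a minimal sketch in plain .NET with no IronOCR calls. It assumes every line item follows the `<number> <description> $<price>` layout seen in the sample output. It parses the line items with a regular expression and checks their sum against the printed `TOTAL AMOUNT`. In the sample, item 8 is read as `$810`. The other items plus $8.10 would add up to exactly $49.80, so the mismatch most likely comes from a dropped decimal point, and a check like this flags it. The text is hard-coded here; in a real program it would come from `ocrResult.Text`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text copied from the output above; in a real program use ocrResult.Text.
string recognizedText = string.Join("\n", new[]
{
    "LOGO SHOP",
    "LOREM IPSUM",
    "DOLOR SITAMET CONSECTETUR",
    "ADIPISCING ELIT",
    "1 LOREM IPSUM $3.20",
    "2 ORNARE MALESUADA $9.50",
    "3 PORTA FERMENTUM $5.90",
    "4 SODALES ARCU $6.00",
    "5 ELEIFEND $9.00",
    "6 SEMNISIMASSA $0.50",
    "7 DUIS FAMES DIS $7.60",
    "8 FACILISIRISUS $810",
    "TOTAL AMOUNT $49.80",
    "CASH $50.00",
});

// Assumed line-item layout: "<number> <description> $<price>".
var itemPattern = new Regex(@"^(\d+)\s+(.+?)\s+\$(\d+(?:\.\d+)?)\s*$", RegexOptions.Multiline);
// Assumed total layout: "TOTAL AMOUNT $<price>".
var totalPattern = new Regex(@"^TOTAL AMOUNT\s+\$(\d+(?:\.\d+)?)\s*$",
    RegexOptions.Multiline | RegexOptions.IgnoreCase);

// Collect (index, description, price) for every line that looks like an item.
var items = new List<(int Index, string Description, decimal Price)>();
foreach (Match m in itemPattern.Matches(recognizedText))
{
    items.Add((
        int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture),
        m.Groups[2].Value,
        decimal.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture)));
}

decimal itemsSum = items.Sum(i => i.Price);
Match totalMatch = totalPattern.Match(recognizedText);
decimal? printedTotal = totalMatch.Success
    ? decimal.Parse(totalMatch.Groups[1].Value, CultureInfo.InvariantCulture)
    : (decimal?)null;

foreach (var item in items)
{
    Console.WriteLine($"{item.Index}. {item.Description}: {item.Price.ToString(CultureInfo.InvariantCulture)}");
}
Console.WriteLine($"Sum of line items: {itemsSum.ToString(CultureInfo.InvariantCulture)}");
Console.WriteLine($"Printed total: {printedTotal?.ToString(CultureInfo.InvariantCulture) ?? "not found"}");

// A mismatch usually means at least one amount was misread by OCR.
if (printedTotal != itemsSum)
{
    Console.WriteLine("Warning: the line items do not add up to the printed total.");
}
```

`decimal` is used instead of `double` so that sums of prices such as 3.20 and 5.90 are exact and can be compared with `!=` safely.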

6. Specific Data Extraction From Receipt Image Using IronOCR

IronOCR also lets developers extract specific pieces of information from scanned receipts, such as the tax amount and the merchant name.

Here is an example of how you might use IronOCR to extract the total amount value from a receipt image:

using IronOcr;
using System;

IronTesseract ocrTesseract = new IronTesseract();

// Recognize English text
ocrTesseract.Language = OcrLanguage.English;

// Load the receipt image
using (OcrInput ocrInput = new OcrInput("ocr.png"))
{
    // Optimize the input image for OCR
    ocrInput.DeNoise(true);
    ocrInput.Contrast();
    ocrInput.EnhanceResolution();
    ocrInput.ToGrayScale();

    OcrResult ocrResult = ocrTesseract.Read(ocrInput);

    // Take the text after "Total:" up to the end of that line (case-sensitive match)
    var totalPrice = ocrResult.Text.Contains("Total:") ? ocrResult.Text.Split("Total:")[1].Split("\n")[0].Trim() : "";
    Console.WriteLine("Total Price: " + totalPrice);
}

Input

Receipt OCR in C#, Figure 4: Input

The input image used for demonstrating extraction of specific data from receipts

Output

- Total 16.5
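
The `Split("Total:")` lookup in the example above only works when the receipt prints exactly `Total:`, with that capitalization and a colon. The receipt in the previous section prints `TOTAL AMOUNT $49.80` instead. A case-insensitive regular expression is more tolerant. This is a sketch that assumes the label starts its line and amounts use a decimal point without thousands separators. `FindAmount` is a hypothetical helper written for this illustration, not part of IronOCR. The same helper can also pick up other labeled values, such as the tax amount mentioned above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first amount on a line that starts with `label`
// (case-insensitive), allowing extra words, a colon, or a currency symbol in between.
decimal? FindAmount(string ocrText, string label)
{
    string pattern = @"^\s*" + Regex.Escape(label) + @"\b[^\d\r\n]*(\d+(?:\.\d+)?)";
    Match match = Regex.Match(ocrText, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    return match.Success
        ? decimal.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture)
        : (decimal?)null;
}

// Illustrative OCR text (made up for this example).
string sample = "Subtotal 15.00\nTax: 1.50\nTOTAL AMOUNT $16.50";

Console.WriteLine(FindAmount(sample, "total")?.ToString(CultureInfo.InvariantCulture) ?? "not found"); // prints 16.50
Console.WriteLine(FindAmount(sample, "tax")?.ToString(CultureInfo.InvariantCulture) ?? "not found");   // prints 1.50
```

Anchoring the label to the start of the line keeps `Subtotal` from being mistaken for the total.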

7. Read Barcodes on Receipts

Besides text, IronOCR can read barcodes printed on receipts. To do so, set the ReadBarCodes property of the IronTesseract Configuration to true before calling Read. The decoded barcodes are then available in the Barcodes collection of the OcrResult.

Here's an example of how you can use IronOCR to read barcodes on a receipt image.

using IronOcr;
using System;

var ocrTesseract = new IronTesseract();
// Enable barcode reading in addition to text recognition
ocrTesseract.Configuration.ReadBarCodes = true;

using (var ocrInput = new OcrInput("b.png"))
{
    var ocrResult = ocrTesseract.Read(ocrInput);

    // Print the decoded value of each barcode found in the image
    foreach (var barcode in ocrResult.Barcodes)
    {
        Console.WriteLine(barcode.Value);
    }
}

Input Image

Receipt OCR in C#, Figure 5: Input for Reading Barcode

A sample barcode image

Output Text

Receipt OCR in C#, Figure 6: Output

The result of processing the barcode image

8. Conclusion

This article explained how to install IronOCR in a C# project and use it to extract text, specific values, and barcodes from receipts, with example code for each step.

Please read the tutorial on reading text from images.

IronOCR is a part of the Iron Suite, which includes five different .NET libraries for manipulating documents and images. You can buy the entire Iron Suite for the price of just two IronOCR licenses.

Try IronOCR in your production apps with a free 30-day trial.

You can download the software product from this link.