How to Get Text From Invoices in C# (Tutorial)
How to OCR Receipts with Tesseract
- Install a C# library to OCR receipts with Tesseract
- Explore a feature-rich C# library for performing OCR on receipts
- Extract data from receipts with Tesseract
- Search the extracted text for specific data
- Read barcode values from the input receipt image
1. IronOCR, an Optical Character Recognition API
IronOCR is an OCR library that recognizes text in images, which makes it suitable for information extraction tasks such as receipt OCR. It is built on the Tesseract OCR engine, one of the most widely used and accurate OCR engines available. IronOCR can read key information from several input formats, including PNG, JPG, TIFF, and PDF, and it can recognize text in multiple languages.
One feature of IronOCR that is particularly useful for receipt OCR is its ability to detect text orientation automatically, even when the image has been rotated or skewed. Receipts are often folded or crumpled, which skews the printed text, so correcting the orientation is essential for accurate text recognition and data extraction.
2. IronOCR Features
- IronOCR uses deep learning to scan and recognize text in images, scanned documents, and PDFs.
- IronOCR supports more than 125 languages.
- IronOCR can read text from many file formats, including PNG, JPG, TIFF, and PDF.
- Extracted information can be output as plain text, structured data, JSON, or searchable PDFs.
- IronOCR supports .NET 5, 6, and 7, as well as .NET Core, .NET Framework, and .NET Standard.
- IronOCR uses computer vision to detect the regions of the input that contain text and splits the input into those regions before recognition.
3. Creating a New Project in Visual Studio
Open Visual Studio and go to the File menu. Select "New Project" and then choose Console Application.
Enter the project name and location, select the required .NET version, and then click the Create button.
The project structure for the Console Application will now be generated. Once finished, it will open the Program.cs file, in which you can write and execute source code.
4. Install IronOCR
IronOCR can be added to a C# .NET project in several ways. This tutorial installs it with the NuGet Package Manager.
In Visual Studio, go to Tools > NuGet Package Manager > Package Manager Console.
A console opens at the bottom of the Visual Studio window. Type the following command and press Enter:
Install-Package IronOcr
IronOCR will be installed in just a few seconds.
5. Data Extraction from Receipts Using IronOCR
IronOCR can extract detailed data from receipts. It converts a picture of a receipt into machine-readable text that can then be analyzed and processed, and because the library runs inside your own application, the receipt images do not have to be sent to an external service.
Here's an example of how you can use IronOCR to extract text from a receipt:
```csharp
using IronOcr;
using System;

class Program
{
    static void Main()
    {
        IronTesseract ocrTesseract = new IronTesseract();
        // Load the receipt image
        using (OcrInput ocrInput = new OcrInput("ocr.png"))
        {
            // Read the OCR result
            OcrResult ocrResult = ocrTesseract.Read(ocrInput);
            string recognizedText = ocrResult.Text;
            // Output the recognized text to the console
            Console.WriteLine(recognizedText);
        }
    }
}
```
The equivalent VB.NET code:
```vb
Imports IronOcr
Imports System

Friend Class Program
    Shared Sub Main()
        Dim ocrTesseract As New IronTesseract()
        ' Load the receipt image
        Using ocrInput As New OcrInput("ocr.png")
            ' Read the OCR result
            Dim ocrResult As OcrResult = ocrTesseract.Read(ocrInput)
            Dim recognizedText As String = ocrResult.Text
            ' Output the recognized text to the console
            Console.WriteLine(recognizedText)
        End Using
    End Sub
End Class
```
Refer to the Reading Text from Image tutorial for further details on how IronOCR reads text from images using C#.
The output of the code above:
- LOGO SHOP
- LOREM IPSUM
- DOLOR SIT AMET CONSECTETUR
- ADIPISCING ELIT
- 1 LOREM IPSUM $3.20
- 2 ORNARE MALESUADA $9.50
- 3 PORTA FERMENTUM $5.90
- 4 SODALES ARCU $6.00
- 5 ELEIFEND $9.00
- 6 SEM NISIMASSA $0.50
- 7 DUIS FAMES DIS $7.60
- 8 FACILISI RISUS $810
- TOTAL AMOUNT $49.80
- CASH $50.00
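Once the receipt is plain text, line items can be pulled out with pattern matching. The sketch below is a hypothetical helper (`ReceiptParser` is not part of IronOCR) that assumes item lines of the form number, description, dollar amount, as in the sample output above; real receipts will need patterns tuned to their layout. It also adds up the item amounts and compares them with the printed total, a cheap way to catch misreads. In the sample output the eighth item reads $810, so the items sum to $851.70 instead of the $49.80 total (with $8.10 the two would agree).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronOCR: parses OCR text whose item lines look
// like "1 LOREM IPSUM $3.20" and whose total line looks like "TOTAL AMOUNT $49.80".
public static class ReceiptParser
{
    // Item number, description, then a dollar amount at the end of the line.
    private static readonly Regex ItemLine = new Regex(
        @"^[ \t]*\d+[ \t]+(?<name>.+?)[ \t]+\$(?<amount>\d+(?:\.\d{2})?)[ \t]*\r?$",
        RegexOptions.Multiline);

    // "TOTAL" or "TOTAL AMOUNT" (any case) followed by a dollar amount.
    private static readonly Regex TotalLine = new Regex(
        @"^[ \t]*TOTAL(?:[ \t]+AMOUNT)?[ \t]+\$(?<amount>\d+(?:\.\d{2})?)[ \t]*\r?$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    public static List<(string Name, decimal Amount)> ParseItems(string text)
    {
        var items = new List<(string Name, decimal Amount)>();
        foreach (Match match in ItemLine.Matches(text))
        {
            decimal amount = decimal.Parse(match.Groups["amount"].Value, CultureInfo.InvariantCulture);
            items.Add((match.Groups["name"].Value, amount));
        }
        return items;
    }

    // Returns the printed total, or null when no total line is found.
    public static decimal? ParseTotal(string text)
    {
        Match match = TotalLine.Match(text);
        if (!match.Success)
        {
            return null;
        }
        return decimal.Parse(match.Groups["amount"].Value, CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        // Copy of the OCR output shown above; in practice this would be ocrResult.Text.
        string text = string.Join("\n",
            "LOGO SHOP",
            "1 LOREM IPSUM $3.20",
            "2 ORNARE MALESUADA $9.50",
            "3 PORTA FERMENTUM $5.90",
            "4 SODALES ARCU $6.00",
            "5 ELEIFEND $9.00",
            "6 SEM NISIMASSA $0.50",
            "7 DUIS FAMES DIS $7.60",
            "8 FACILISI RISUS $810",
            "TOTAL AMOUNT $49.80",
            "CASH $50.00");

        var items = ReceiptParser.ParseItems(text);
        foreach (var item in items)
        {
            Console.WriteLine(item.Name + ": " + item.Amount.ToString(CultureInfo.InvariantCulture));
        }

        // Cross-check: the item amounts should add up to the printed total.
        decimal sum = items.Sum(item => item.Amount);
        decimal? total = ReceiptParser.ParseTotal(text);
        if (total.HasValue && total.Value != sum)
        {
            Console.WriteLine("Items sum to " + sum.ToString(CultureInfo.InvariantCulture)
                + " but the total reads " + total.Value.ToString(CultureInfo.InvariantCulture)
                + "; check for misread amounts.");
        }
    }
}
```

Using `decimal` rather than `double` keeps currency sums exact, so the comparison with the printed total is not thrown off by floating-point rounding.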
6. Specific Data Extraction From Receipt Image Using IronOCR
IronOCR allows developers to retrieve crucial information from scanned receipts, such as tax amounts and merchant names.
Here is an example demonstrating how to extract the total amount value from a receipt image:
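```csharp
using IronOcr;
using System;

class Program
{
    static void Main()
    {
        IronTesseract ocrTesseract = new IronTesseract();
        // Set the language for OCR
        ocrTesseract.Language = OcrLanguage.English;
        // Load the receipt image
        using (OcrInput ocrInput = new OcrInput("ocr.png"))
        {
            // Optimize the input image for OCR
            ocrInput.DeNoise(true);
            ocrInput.Contrast();
            ocrInput.EnhanceResolution();
            ocrInput.ToGrayScale();
            OcrResult ocrResult = ocrTesseract.Read(ocrInput);
            // Find the "Total:" label (case-insensitive) and take the rest of that line.
            // Adjust the label to match the wording printed on your receipts.
            string ocrText = ocrResult.Text;
            string totalAmount = "";
            int index = ocrText.IndexOf("Total:", StringComparison.OrdinalIgnoreCase);
            if (index >= 0)
            {
                string rest = ocrText.Substring(index + "Total:".Length);
                int lineEnd = rest.IndexOf('\n');
                // Trim removes surrounding spaces and a trailing '\r' on Windows line endings
                totalAmount = (lineEnd >= 0 ? rest.Substring(0, lineEnd) : rest).Trim();
            }
            Console.WriteLine("Total Amount: " + totalAmount);
        }
    }
}
```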
The equivalent VB.NET code:
```vb
Imports Microsoft.VisualBasic
Imports IronOcr
Imports System

Friend Class Program
    Shared Sub Main()
        Dim ocrTesseract As New IronTesseract()
        ' Set the language for OCR
        ocrTesseract.Language = OcrLanguage.English
        ' Load the receipt image
        Using ocrInput As New OcrInput("ocr.png")
            ' Optimize the input image for OCR
            ocrInput.DeNoise(True)
            ocrInput.Contrast()
            ocrInput.EnhanceResolution()
            ocrInput.ToGrayScale()
            Dim ocrResult As OcrResult = ocrTesseract.Read(ocrInput)
            ' Find the "Total:" label (case-insensitive) and take the rest of that line
            Dim ocrText As String = ocrResult.Text
            Dim totalAmount As String = ""
            Dim index As Integer = ocrText.IndexOf("Total:", StringComparison.OrdinalIgnoreCase)
            If index >= 0 Then
                Dim rest As String = ocrText.Substring(index + "Total:".Length)
                Dim lineEnd As Integer = rest.IndexOf(ControlChars.Lf)
                totalAmount = If(lineEnd >= 0, rest.Substring(0, lineEnd), rest).Trim()
            End If
            Console.WriteLine("Total Amount: " & totalAmount)
        End Using
    End Sub
End Class
```
The OcrInput class provides several image filters, such as DeNoise, Contrast, EnhanceResolution, and ToGrayScale, that can be applied before reading to improve OCR accuracy.
Input: [receipt image]
Output:
- Total 16.5
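The lookup above matches only one exact label. The sample receipt shown earlier, for example, prints its total as TOTAL AMOUNT $49.80, with no colon. As a sketch of a more tolerant approach, the hypothetical `FindAmount` helper below (not an IronOCR API) uses a regular expression to find a label at the start of a line, ignoring case and allowing an optional colon, extra words, and a `$` sign before the number. The same call can look up other labelled values, such as a tax line. It is a heuristic: it returns the first number after the label, so a line like "TAX 8% $1.20" would yield 8, and the pattern should be tuned to the receipts you actually process.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronOCR: pulls a labelled amount
// (for example a total or tax line) out of OCR text.
public static class ReceiptFields
{
    // Returns the first number that follows `label` at the start of a line, or null.
    // Case is ignored, and an optional colon, extra words, and "$" may sit
    // between the label and the number.
    public static decimal? FindAmount(string text, string label)
    {
        string pattern = @"^[ \t]*" + Regex.Escape(label)
            + @"(?!\w)[^\r\n\d$]*\$?[ \t]*(?<amount>\d+(?:\.\d+)?)";
        Match match = Regex.Match(text, pattern,
            RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
        if (!match.Success)
        {
            return null;
        }
        return decimal.Parse(match.Groups["amount"].Value, CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        // In a real run this string would be ocrResult.Text.
        string text = "LOGO SHOP\nTOTAL AMOUNT $49.80\nCASH $50.00";

        decimal? total = ReceiptFields.FindAmount(text, "Total");
        // prints "Total Amount: 49.80"
        Console.WriteLine(total.HasValue
            ? "Total Amount: " + total.Value.ToString(CultureInfo.InvariantCulture)
            : "Total not found");
    }
}
```

Anchoring the label at the start of a line keeps "SUBTOTAL" from being mistaken for "Total"; with the earlier program, the call would be `ReceiptFields.FindAmount(ocrResult.Text, "Total")`.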
7. Read Barcodes on Receipts
IronOCR can read barcodes on receipts as well as text. To enable barcode reading, set the ReadBarCodes option of the IronTesseract Configuration to true; the decoded barcodes are then returned in the Barcodes collection of the OcrResult, each with a Value property.
Here's an example of how to read barcodes:
```csharp
using IronOcr;
using System;

class Program
{
    static void Main()
    {
        var ocrTesseract = new IronTesseract();
        // Detect and decode barcodes in addition to recognizing text
        ocrTesseract.Configuration.ReadBarCodes = true;
        // Load the receipt image with a barcode
        using (var ocrInput = new OcrInput("b.png"))
        {
            OcrResult ocrResult = ocrTesseract.Read(ocrInput);
            // Output the barcode values to the console
            foreach (var barcode in ocrResult.Barcodes)
            {
                Console.WriteLine(barcode.Value);
            }
        }
    }
}
```
The equivalent VB.NET code:
```vb
Imports IronOcr
Imports System

Friend Class Program
    Shared Sub Main()
        Dim ocrTesseract = New IronTesseract()
        ocrTesseract.Configuration.ReadBarCodes = True
        ' Load the receipt image with a barcode
        Using ocrInput As New OcrInput("b.png")
            Dim ocrResult As OcrResult = ocrTesseract.Read(ocrInput)
            ' Output the barcode values to the console
            For Each barcode In ocrResult.Barcodes
                Console.WriteLine(barcode.Value)
            Next barcode
        End Using
    End Sub
End Class
```
8. Conclusion
This article explained how to install IronOCR in a C# project and use it to extract text, specific values, and barcodes from receipts, with example code for each step.
For more details, read the tutorial on reading text from images.
IronOCR is part of the Iron Suite, which includes five .NET libraries for working with documents and images. The entire Iron Suite costs the same as two IronOCR licenses.
Try IronOCR in your production apps with a free trial.
Frequently Asked Questions
How can I use IronOCR to perform OCR on a receipt image in C#?
Load the receipt image into an OcrInput object and pass it to the Read method of IronTesseract. The returned OcrResult contains the recognized text, such as itemized lists and total amounts.
What are the advantages of using IronOCR over Tesseract for invoice processing?
IronOCR offers enhanced accuracy, supports over 125 languages, and includes features like automatic text orientation detection and deep learning capabilities. It's also easier to integrate with C# projects using the NuGet Package Manager.
How do I integrate IronOCR into a Visual Studio project?
To integrate IronOCR into a Visual Studio project, use the NuGet Package Manager: go to Tools > NuGet Package Manager > Package Manager Console and run Install-Package IronOcr to add the library to your project.
Can IronOCR handle multiple languages in receipt OCR?
Yes, IronOCR can handle multiple languages, supporting over 125 global languages, which makes it ideal for processing receipts with multilingual text.
How does IronOCR improve text recognition accuracy in receipts?
IronOCR improves text recognition accuracy through features like deep learning, automatic text orientation detection, and the ability to optimize images using the OcrInput class for better OCR results.
Is it possible to extract itemized lists from receipts using IronOCR?
Yes, IronOCR can be used to extract itemized lists from receipts by processing the text data and identifying line items through pattern matching after performing OCR.
How does IronOCR handle barcode reading on receipts?
IronOCR reads barcodes when the ReadBarCodes option of IronTesseract.Configuration is set to true. The decoded values are then available in the Barcodes collection of the OcrResult.
What file formats can IronOCR process for receipt OCR?
IronOCR can process a variety of file formats for receipt OCR, including PNG, JPG, TIFF, and PDF, making it versatile for different input types.
What steps are involved in setting up IronOCR for invoice processing in C#?
Setting up IronOCR for invoice processing involves installing the library via NuGet, loading the receipt image into an OcrInput, and calling the Read method to extract text data. You can also apply the library's image filters to improve accuracy and then search the text for specific data such as totals.