Receipt Scanning API (Developer Tutorial)

A receipt scanning API automates the extraction of data from receipts. Using OCR technology, it reads receipt images or scans of paper receipts and extracts fields such as vendor names, purchase dates, itemized lists, prices, taxes, and total amounts. Integrating a receipt OCR API into an application lets a business automate data entry, reduce manual entry errors, and improve productivity. Support for multiple languages, currencies, and receipt formats makes such an API versatile, and the parsed receipt data can be used to analyze spending patterns and support data-driven decisions.

In this article, we will show how to scan a receipt and extract the important information from it. For this purpose, we will use IronOCR, a C# OCR (Optical Character Recognition) library.

IronOCR

IronOCR is an OCR library and API developed by Iron Software for extracting text from sources such as scanned documents, images, and PDFs. It combines OCR algorithms with computer vision and machine learning models to remain accurate and reliable even in challenging scenarios. The library supports multiple languages and font styles, making it suitable for global applications. By incorporating IronOCR into their applications, developers can automate data entry, text analysis, and other tasks.

With IronOCR, developers can extract text from documents, photographs, screenshots, and even live camera feeds, and return the result as a JSON response. IronOCR analyzes the image data, recognizes individual characters, and converts them into machine-readable text. The extracted text can then be used for data entry, information retrieval, text analysis, and the automation of manual tasks.

Prerequisites

Before you can start working with IronOCR, there are a few prerequisites that need to be in place. These prerequisites include:

  1. Ensure that you have a suitable development environment set up on your computer. This typically involves having an Integrated Development Environment (IDE) such as Visual Studio installed.
  2. It's important to have a basic understanding of the C# programming language. This will enable you to comprehend and modify the code examples provided in the article effectively.
  3. You'll need to have the IronOCR library installed in your project. This can be accomplished by using the NuGet Package Manager within Visual Studio or through the command line interface.

By ensuring that these prerequisites are met, you'll be ready to dive into the process of working with IronOCR.

Creating a New Visual Studio Project

To get started with IronOCR, we need to create a new Visual Studio project.

Open Visual Studio, open the File menu, hover over New, and click Project.

Receipt Scanning API (Developer Tutorial): Figure 1 - New Project Image

In the new window, select Console Application and click on Next.

Receipt Scanning API (Developer Tutorial): Figure 2 - Console Application

A new window will appear. Enter a name and location for your new project, then click Next.

Receipt Scanning API (Developer Tutorial): Figure 3 - Project Configuration

Finally, select the Target Framework and click Create.

Receipt Scanning API (Developer Tutorial): Figure 4 - Target Framework

Now that your new Visual Studio project is created, let's install IronOCR.

Installing IronOCR

There are several ways to download and install the IronOCR library. We'll go through two of them:

  1. Using the Visual Studio NuGet Package Manager
  2. Using the Visual Studio Command Line

Using the Visual Studio NuGet Package Manager

IronOCR may be included in a C# project by utilizing the Visual Studio NuGet Package Manager.

Open the NuGet Package Manager graphical user interface by selecting Tools > NuGet Package Manager > Manage NuGet Packages for Solution...

Receipt Scanning API (Developer Tutorial): Figure 5 - NuGet Package Manager

After this, a new window will appear. Search for IronOCR and install the package in the project.

Receipt Scanning API (Developer Tutorial): Figure 6 - IronOCR

Additional language packs for IronOCR can also be installed using the same method described above.

Using the Visual Studio Command Line

  1. In Visual Studio, go to Tools > NuGet Package Manager > Package Manager Console
  2. Enter the following line in the package manager console tab:

    Install-Package IronOcr

    Receipt Scanning API (Developer Tutorial): Figure 7 - Package Manager Console

The package will now download/install in the current project and be ready to use.

Data extraction using receipt OCR API

Extracting data from receipt images and saving it in a structured form is a common need for developers. With IronOCR, you can do this in just a few lines of code, pulling out line items, prices, tax amounts, total amounts, and more from different document types.

    using IronOcr;
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text.RegularExpressions;

    var Ocr = new IronTesseract();
    using (var Input = new OcrInput(@"r2.png"))
    {
        var Result = Ocr.Read(Input);

        // Matches one line item of the form:
        //   [CODE] Description 1.00 Units 100.00 Tax15% $115.00
        // Group 2 = description, group 3 = quantity, group 4 = unit price.
        var DescriptionPattern = @"\[([A-Z0-9_]+)]\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";
        var Descriptions = new List<string>();
        var Quantities = new List<string>();
        var UnitPrices = new List<decimal>();
        var Taxes = new List<decimal>();
        var Amounts = new List<decimal>();

        // Regex.Match takes a single string, so test each OCR line in turn.
        foreach (var Line in Result.Text.Split('\n'))
        {
            var DescriptionMatch = Regex.Match(Line, DescriptionPattern);
            if (!DescriptionMatch.Success)
            {
                continue; // Not a line item (header, total, blank line, ...)
            }
            Descriptions.Add(DescriptionMatch.Groups[2].Value.Trim());
            Quantities.Add(DescriptionMatch.Groups[3].Value);
            UnitPrices.Add(decimal.Parse(DescriptionMatch.Groups[4].Value, CultureInfo.InvariantCulture));
        }

        // Print every line item that was found.
        for (var i = 0; i < Descriptions.Count; i++)
        {
            Console.WriteLine("Description: " + Descriptions[i]);
            Console.WriteLine("Quantity: " + Quantities[i] + " Units");
            Console.WriteLine("Unit Price: $" + UnitPrices[i]);
            Taxes.Add(UnitPrices[i] * 0.15m); // Calculate taxes (15%)
            Console.WriteLine("Taxes: $" + Taxes[i]);
            Amounts.Add(UnitPrices[i] + Taxes[i]); // Unit price plus tax, for a single unit
            Console.WriteLine("Amount: $" + Amounts[i]);
            Console.WriteLine("-----------------------");
        }
    }

As you can see below, IronOCR extracts the required text from the receipt.

Receipt Scanning API (Developer Tutorial): Figure 8 - Output
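
To see what the line-item pattern captures, it can help to run it against a plain string with no OCR step involved. The following is a minimal sketch under stated assumptions: the sample line is made up to fit the pattern, and `ParseLineItem` is a hypothetical helper name, not part of IronOCR. Real OCR output will vary with the receipt layout, so the pattern usually needs adjusting per receipt format.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Same line-item pattern as in the tutorial code.
const string LineItemPattern =
    @"\[([A-Z0-9_]+)]\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";

// Hypothetical helper: returns the parsed fields, or null if the line is not a line item.
(string Code, string Description, decimal Quantity, decimal UnitPrice, decimal Total)? ParseLineItem(string line)
{
    var m = Regex.Match(line, LineItemPattern);
    if (!m.Success)
    {
        return null;
    }

    // Parse with the invariant culture so "100.00" reads the same on every machine.
    decimal Number(int group) => decimal.Parse(m.Groups[group].Value, CultureInfo.InvariantCulture);

    return (m.Groups[1].Value, m.Groups[2].Value.Trim(), Number(3), Number(4), Number(5));
}

// Made-up sample line, shaped to fit the pattern (an assumption, not real OCR output).
var sampleLine = "[FURN_0001] Office Chair 1.00 Units 100.00 Tax15% $115.00";

var item = ParseLineItem(sampleLine);
if (item.HasValue)
{
    var p = item.Value;
    Console.WriteLine($"{p.Code}: {p.Description}, {p.Quantity} x {p.UnitPrice}, total {p.Total}");
}
else
{
    Console.WriteLine("Line did not match the line-item pattern.");
}
```

Returning null for non-matching lines lets the caller skip headers, totals, and blank lines without extra checks.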

Extract whole receipt

If you want to extract the whole receipt, you can do so with just a few lines of code.


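    using IronOcr;
    using System;

    var Ocr = new IronTesseract();
    // OcrInput is disposable, so wrap it in a using block
    using (var Input = new OcrInput(@"r3.png"))
    {
        // Read the whole image and print all recognized text
        var Result = Ocr.Read(Input);
        Console.WriteLine(Result.Text);
    }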


Receipt Scanning API (Developer Tutorial): Figure 9 - Scan receipt API output
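
Once the full text is available, individual values can be pulled out of it with regular expressions. The sketch below illustrates the idea on a plain string and makes several assumptions: the receipt text is invented, `ExtractPrices` is a hypothetical helper rather than an IronOCR function, and treating the largest dollar amount as the total is only a heuristic that will not hold for every receipt.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: finds every dollar amount such as "$3.50" or "$12" in the text.
decimal[] ExtractPrices(string text) =>
    Regex.Matches(text, @"\$(\d+(?:\.\d{2})?)")
         .Cast<Match>()
         .Select(m => decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture))
         .ToArray();

// Invented receipt text; in practice this would be the OCR result's Text property.
var receiptText = "Corner Cafe\nCoffee $3.50\nBagel $2.25\nTax $0.46\nTotal $6.21\n";

var prices = ExtractPrices(receiptText);

// Heuristic (an assumption): the largest amount on a receipt is usually the total.
var likelyTotal = prices.Length > 0 ? prices.Max() : (decimal?)null;

Console.WriteLine("Amounts found: " + prices.Length);
Console.WriteLine(likelyTotal.HasValue
    ? "Likely total: $" + likelyTotal.Value.ToString("0.00", CultureInfo.InvariantCulture)
    : "No amounts found.");
```

For production use, matching a labeled line such as "Total" is generally more reliable than taking the maximum, since receipts can list cash tendered or subtotals that exceed the amount due.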

A receipt scanning API such as IronOCR automates the extraction of data from receipts. Using OCR technology, businesses can extract vendor names, purchase dates, itemized lists, prices, taxes, and total amounts from receipt images or scans. With support for multiple languages, currencies, and receipt formats, they can streamline receipt management, save time, gain insights into spending patterns, and make data-driven decisions. As an OCR library and API, IronOCR gives developers the tools to extract text from various sources accurately and efficiently. Once the prerequisites are met and IronOCR is integrated into an application, receipt data processing can be automated as part of the existing workflow.

For more information on IronOCR, visit this link. To learn how to use computer vision to find text, visit here. For a tutorial on receipt OCR, visit the following link.