Receipt Scanning API (Developer Tutorial)

A receipt scanning API extracts key data from receipts using advanced OCR technology. It streamlines the data entry process by eliminating manual errors and enhancing productivity. The API, versatile and accurate, supports multiple languages, currencies, and formats. By automating receipt parsing, businesses can gain insights into spending patterns and make data-driven decisions. This article will demonstrate how to use the C# OCR library, IronOCR, to extract important information from a receipt.

IronOCR

IronOCR is an OCR library and API developed by Iron Software for extracting text from scanned documents, images, and PDFs. It combines OCR algorithms, computer vision, and machine learning models to keep accuracy high even on difficult input, and it supports multiple languages and font styles, which makes it suitable for global applications. By integrating IronOCR into their applications, developers can automate data entry, text analysis, and similar tasks.

With IronOCR, developers can read text from a variety of sources, including documents, photographs, screenshots, and even live camera feeds. IronOCR analyzes the image data, recognizes individual characters, and converts them into machine-readable text. The extracted text can then be used for data entry, information retrieval, text analysis, and the automation of manual tasks.

Prerequisites

Before working with IronOCR, make sure the following prerequisites are in place:

  1. A suitable development environment, typically an Integrated Development Environment (IDE) such as Visual Studio.
  2. A basic understanding of the C# programming language, so that you can follow and modify the code examples in this article.
  3. The IronOCR library installed in your project, either through the NuGet Package Manager in Visual Studio or from the command line.

With these prerequisites met, you are ready to start working with IronOCR.

Creating a New Visual Studio Project

To get started with IronOCR, the first step is to create a new Visual Studio project.

Open Visual Studio, go to File, hover over New, and click Project.

Figure 1: New Project

In the new window, select Console Application and click Next.

Figure 2: Console Application

A new window will appear. Enter a name and location for your project, then click Next.

Figure 3: Project Configuration

Finally, select the Target Framework and click Create.

Figure 4: Target Framework

Now that your Visual Studio project is created, let's install IronOCR.

Installing IronOCR

There are several ways to download and install the IronOCR library. The two simplest are:

  1. Using the Visual Studio NuGet Package Manager
  2. Using the Visual Studio Command Line

Using the Visual Studio NuGet Package Manager

IronOCR can be added to a C# project through the Visual Studio NuGet Package Manager.

Open the NuGet Package Manager by selecting Tools > NuGet Package Manager > Manage NuGet Packages for Solution.

Figure 5: NuGet Package Manager

In the window that appears, search for IronOCR and install the package in the project.

Figure 6: IronOCR

Additional language packs for IronOCR can also be installed using the same method described above.

Using the Visual Studio Command Line

  1. In Visual Studio, go to Tools > NuGet Package Manager > Package Manager Console
  2. Enter the following line in the Package Manager Console tab:

    Install-Package IronOcr

    Figure 7: Package Manager Console

The package will now be downloaded and installed in the current project, ready to use.

Data extraction using receipt OCR API

Extracting data from receipt images and saving it in structured form is a common need for developers. With IronOCR, this takes only a few lines of code. The example below reads a receipt image and uses a regular expression to pull the details of each line item from the OCR text. The same approach can extract line items, prices, tax amounts, totals, and other fields from different document types.

using IronOcr;
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

class ReceiptScanner
{
    static void Main()
    {
        var ocr = new IronTesseract();
        // Load the image of the receipt
        using (var input = new OcrInput(@"r2.png"))
        {
            // Perform OCR on the input image
            var result = ocr.Read(input);

            // Pattern for one line item. Capture groups:
            // 1 = description, 2 = quantity, 3 = unit price, 4 = printed line amount
            var descriptionPattern = @"\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";

            // Tax rate printed on this sample receipt ("Tax15%")
            const decimal taxRate = 0.15m;

            // Variables to store extracted data
            var descriptions = new List<string>();
            var quantities = new List<decimal>();
            var unitPrices = new List<decimal>();
            var taxes = new List<decimal>();
            var amounts = new List<decimal>();

            var lines = result.Text.Split('\n');
            foreach (var line in lines)
            {
                // Match each line against the line-item pattern
                var descriptionMatch = Regex.Match(line, descriptionPattern);
                if (descriptionMatch.Success)
                {
                    // Parse with the invariant culture so "." is always the decimal separator
                    var quantity = decimal.Parse(descriptionMatch.Groups[2].Value, CultureInfo.InvariantCulture);
                    var unitPrice = decimal.Parse(descriptionMatch.Groups[3].Value, CultureInfo.InvariantCulture);

                    descriptions.Add(descriptionMatch.Groups[1].Value.Trim());
                    quantities.Add(quantity);
                    unitPrices.Add(unitPrice);

                    // Calculate tax and total amount for each line item
                    var tax = quantity * unitPrice * taxRate;
                    taxes.Add(tax);
                    amounts.Add(quantity * unitPrice + tax);
                }
            }

            // Output the extracted data
            for (int i = 0; i < descriptions.Count; i++)
            {
                Console.WriteLine($"Description: {descriptions[i]}");
                Console.WriteLine($"Quantity: {quantities[i]:0.00} Units");
                Console.WriteLine($"Unit Price: ${unitPrices[i]:0.00}");
                Console.WriteLine($"Taxes: ${taxes[i]:0.00}");
                Console.WriteLine($"Amount: ${amounts[i]:0.00}");
                Console.WriteLine("-----------------------");
            }
        }
    }
}
Imports Microsoft.VisualBasic
Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Friend Class ReceiptScanner
	Shared Sub Main()
		Dim ocr = New IronTesseract()
		' Load the image of the receipt
		Using input = New OcrInput("r2.png")
			' Perform OCR on the input image
			Dim result = ocr.Read(input)

			' Pattern for one line item. Capture groups:
			' 1 = description, 2 = quantity, 3 = unit price, 4 = printed line amount
			Dim descriptionPattern = "\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)"

			' Tax rate printed on this sample receipt ("Tax15%")
			Const taxRate As Decimal = 0.15D

			' Variables to store extracted data
			Dim descriptions = New List(Of String)()
			Dim quantities = New List(Of Decimal)()
			Dim unitPrices = New List(Of Decimal)()
			Dim taxes = New List(Of Decimal)()
			Dim amounts = New List(Of Decimal)()

			Dim lines = result.Text.Split(ControlChars.Lf)
			For Each line In lines
				' Match each line against the line-item pattern
				Dim descriptionMatch = Regex.Match(line, descriptionPattern)
				If descriptionMatch.Success Then
					' Parse with the invariant culture so "." is always the decimal separator
					Dim quantity = Decimal.Parse(descriptionMatch.Groups(2).Value, CultureInfo.InvariantCulture)
					Dim unitPrice = Decimal.Parse(descriptionMatch.Groups(3).Value, CultureInfo.InvariantCulture)

					descriptions.Add(descriptionMatch.Groups(1).Value.Trim())
					quantities.Add(quantity)
					unitPrices.Add(unitPrice)

					' Calculate tax and total amount for each line item
					Dim tax = quantity * unitPrice * taxRate
					taxes.Add(tax)
					amounts.Add(quantity * unitPrice + tax)
				End If
			Next line

			' Output the extracted data
			For i As Integer = 0 To descriptions.Count - 1
				Console.WriteLine($"Description: {descriptions(i)}")
				Console.WriteLine($"Quantity: {quantities(i):0.00} Units")
				Console.WriteLine($"Unit Price: ${unitPrices(i):0.00}")
				Console.WriteLine($"Taxes: ${taxes(i):0.00}")
				Console.WriteLine($"Amount: ${amounts(i):0.00}")
				Console.WriteLine("-----------------------")
			Next i
		End Using
	End Sub
End Class

As you can see below, IronOCR can easily extract the required text from the receipt.

Figure 8: Output
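
To see what the line-item pattern captures without running OCR, the following minimal sketch applies the same regular expression to a single hard-coded line. The `LineItemDemo` class, the `ParseLine` method, and the sample line are hypothetical and chosen only to match the shape the pattern expects; real OCR output will vary with the receipt layout.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LineItemDemo
{
    // Same pattern as in the tutorial code above.
    // Groups: 1 = description, 2 = quantity, 3 = unit price, 4 = printed amount.
    const string Pattern =
        @"\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";

    // Returns the parsed fields, or null when the line is not a line item.
    public static (string Description, decimal Quantity, decimal UnitPrice, decimal Amount)?
        ParseLine(string line)
    {
        var m = Regex.Match(line, Pattern);
        if (!m.Success) return null;

        // Invariant culture: "." is the decimal separator regardless of OS locale.
        var inv = CultureInfo.InvariantCulture;
        return (m.Groups[1].Value.Trim(),
                decimal.Parse(m.Groups[2].Value, inv),
                decimal.Parse(m.Groups[3].Value, inv),
                decimal.Parse(m.Groups[4].Value, inv));
    }

    static void Main()
    {
        // Hypothetical OCR line, not taken from a real receipt.
        var item = ParseLine("FURN01 Office Chair 2.00 Units 50.00 Tax15% $115.00");
        if (item.HasValue)
        {
            var i = item.Value;
            Console.WriteLine($"{i.Description} | {i.Quantity} x {i.UnitPrice} | printed amount {i.Amount}");
        }
    }
}
```

Note that the leading `\w+` consumes the first token on the line (here the product code), so the description group starts at the second word. Lines that do not contain the `Units` and `Tax15%` markers simply do not match and are skipped.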

Extract the whole receipt

To extract the full text of a receipt, run OCR on the image and print the result. This takes only a few lines of code.

using IronOcr;
using System;

class WholeReceiptExtractor
{
    static void Main()
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput(@"r3.png"))
        {
            // Perform OCR on the entire receipt and print text output to console
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
Imports IronOcr
Imports System

Friend Class WholeReceiptExtractor
	Shared Sub Main()
		Dim ocr = New IronTesseract()
		Using input = New OcrInput("r3.png")
			' Perform OCR on the entire receipt and print text output to console
			Dim result = ocr.Read(input)
			Console.WriteLine(result.Text)
		End Using
	End Sub
End Class

Figure 9: Scan receipt API output
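
Once the full text is available, individual fields can be picked out of it. As a rough sketch, the code below scans OCR text for a line labelled "Total" and reads the dollar amount on it. The `TotalFinder` class, the `FindTotal` method, and the sample text are hypothetical, and the assumption that the total sits on a line containing the word "Total" will not hold for every receipt layout. The simple price pattern also ignores thousands separators.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TotalFinder
{
    // A dollar sign, digits, and optional two-digit cents.
    static readonly Regex Price = new Regex(@"\$\d+(\.\d{2})?");

    // Whole word "Total" only, so "Subtotal" is not treated as the total.
    static readonly Regex TotalLabel = new Regex(@"\bTotal\b", RegexOptions.IgnoreCase);

    // Returns the last price found on a line labelled "Total", or null if there is none.
    public static decimal? FindTotal(string ocrText)
    {
        decimal? total = null;
        foreach (var line in ocrText.Split('\n'))
        {
            if (!TotalLabel.IsMatch(line)) continue;

            var matches = Price.Matches(line);
            if (matches.Count == 0) continue;

            // Take the right-most price on the line and drop the "$" before parsing.
            var last = matches[matches.Count - 1].Value.TrimStart('$');
            total = decimal.Parse(last, CultureInfo.InvariantCulture);
        }
        return total;
    }

    static void Main()
    {
        // Hypothetical text standing in for result.Text.
        var text = "Subtotal $100.00\nTax 15% $15.00\nTotal $115.00\n";
        Console.WriteLine(FindTotal(text));
    }
}
```

In a real application, `FindTotal` would be called with `result.Text` from the OCR step. Returning `null` rather than throwing lets the caller decide how to handle receipts where no total line was recognized.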

A receipt scanning API such as IronOCR automates the extraction of data from receipts. Using OCR, businesses can extract important information from receipt images or scans, including vendor names, purchase dates, itemized lists, prices, taxes, and total amounts. With support for multiple languages, currencies, receipt formats, and barcodes, businesses can streamline receipt management, save time, gain insight into spending patterns, and make data-driven decisions. As an OCR library and API, IronOCR gives developers the tools to extract text from various sources accurately and efficiently. With the prerequisites in place and IronOCR integrated into their applications, developers can automate receipt data processing and improve their workflows.

For more information on IronOCR, visit the licensing page. To learn how to use computer vision to find text, visit the computer vision how-to page. For more tutorials on receipt OCR, see the OCR C# tutorial.

Frequently Asked Questions

How can I automate receipt data extraction using OCR in C#?

You can automate receipt data extraction in C# with IronOCR, which lets you extract key details such as line items, prices, taxes, and total amounts from receipt images with high accuracy.

What are the prerequisites for setting up a receipt scanning project in C#?

To set up a receipt scanning project in C#, you need Visual Studio, basic knowledge of C# programming, and the IronOCR library installed in your project.

How do I install the OCR library using the NuGet Package Manager in Visual Studio?

Open Visual Studio and go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution, search for IronOCR, and install it in your project.

Can I install the OCR library using the Visual Studio command line?

Yes, you can install IronOCR by opening the Package Manager Console in Visual Studio and running the command: Install-Package IronOcr.

How do I extract text from a whole receipt using OCR?

To extract text from a whole receipt, use IronOCR to perform OCR on the full receipt image and then print the extracted text from your C# code.

What benefits does a receipt scanning API provide?

A receipt scanning API such as IronOCR automates data extraction, minimizes manual errors, improves productivity, and provides insight into spending patterns for better business decisions.

Does the OCR library support multiple languages and currencies?

Yes, IronOCR supports multiple languages, currencies, and receipt formats, which makes it well suited to global applications.

How accurate is the OCR library at extracting text from images?

IronOCR achieves high accuracy by using advanced OCR algorithms, computer vision, and machine learning models, even in challenging scenarios.

What types of data can be extracted from receipts using OCR?

IronOCR can extract data such as line items, prices, tax amounts, total amounts, and other receipt details.

How can automating receipt parsing improve business processes?

Automating receipt parsing with IronOCR improves business processes by reducing manual entry, enabling accurate data collection, and supporting data-driven decision making.

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing the degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Engineering ...