Invoice OCR API (Developer Tutorial)

An invoice OCR API uses machine learning and computer vision to convert invoice data into a format suitable for automated processing. It addresses the delays, costs, and errors of manual data entry by extracting details such as vendor information, invoice numbers, and prices from both digital and scanned invoices.

This article uses IronOCR as the invoice OCR API.

1. IronOCR

IronOCR, developed by Iron Software, is an OCR library offering a range of tools for developers. It uses machine learning and computer vision to extract text from scanned documents, images, and PDFs, enabling automated processing. Its APIs integrate into various languages and platforms, reducing manual data entry errors and improving efficiency. Extracted data can be analyzed and integrated into existing systems, aiding decision-making and productivity. Features like image preprocessing, barcode recognition, and file parsing increase its versatility. IronOCR empowers developers to incorporate text recognition into their applications.

2. Prerequisites

Before working with IronOCR, make sure the following prerequisites are in place:

  1. Ensure that you have a suitable development environment set up on your computer. This typically involves having an Integrated Development Environment (IDE) such as Visual Studio installed.
  2. It's important to have a basic understanding of the C# programming language. This will enable you to comprehend and modify the code examples provided in the article effectively.
  3. You'll need to have the IronOCR library installed in your project. This can be accomplished by using the NuGet Package Manager within Visual Studio or through the command line interface.

By ensuring that these prerequisites are met, you'll be ready to dive into the process of working with IronOCR.

3. Creating a New Visual Studio Project

To get started with IronOCR, the first step is to create a new Visual Studio project.

Open Visual Studio, go to File, hover over New, and click Project.

Figure 1 - New Project

In the new window, select Console Application and click on Next.

Figure 2 - Console Application

In the next window, enter the name and location of your new project, then click on Next.

Figure 3 - Project Configuration

Finally, select the target framework and click on Create.

Figure 4 - Target Framework

Now your new Visual Studio project is created. Let's install IronOCR.

4. Installing IronOCR

There are several ways to download and install the IronOCR library. The two simplest are:

  1. Using the Visual Studio NuGet Package Manager
  2. Using the Visual Studio Command-Line

4.1. Using the Visual Studio NuGet Package Manager

IronOCR can be added to a C# project using the Visual Studio NuGet Package Manager.

Open the NuGet Package Manager graphical user interface by selecting Tools > NuGet Package Manager > Manage NuGet Packages for Solution.

Figure 5 - NuGet Package Manager

After this, a new window will appear. Search for IronOCR and install the package in the project.

Figure 6 - Select the IronOCR package in NuGet Package Manager UI

Additional language packs for IronOCR can also be installed using the same method described above.

4.2. Using the Visual Studio Command-Line

  1. In Visual Studio, go to Tools > NuGet Package Manager > Package Manager Console
  2. Enter the following line in the Package Manager Console tab to install IronOCR:

    Install-Package IronOcr

Figure 7 - Package Manager Console

The package is then downloaded and installed into the current project, ready to use.

5. Extracting Data from Invoices Using IronOCR

With IronOCR, you can extract data from invoices in a few lines of code and pass the extracted data on to further processes such as data entry, replacing manual data entry and similar manual tasks.

Here is an example invoice to extract text from.

Figure 8 - The sample invoice

Now, let's write the code to extract all the data from this invoice.

using IronOcr;
using System;

// Initialize a new instance of the IronTesseract class
var ocr = new IronTesseract();

// Use the OcrInput object to load the image file
using (var input = new OcrInput(@"r2.png"))
{
    // Read the image using the Read method, which performs OCR
    var result = ocr.Read(input);

    // Output the extracted text to the console
    Console.WriteLine(result.Text);
}

The equivalent code in VB.NET:

Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Initialize a new instance of the IronTesseract class
        Dim ocr = New IronTesseract()

        ' Use the OcrInput object to load the image file
        Using input = New OcrInput("r2.png")
            ' Read the image using the Read method, which performs OCR
            Dim result = ocr.Read(input)

            ' Output the extracted text to the console
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

The code above loads the invoice image into an OcrInput object and extracts its text with the Read method of the IronTesseract class. The recognized text is available in result.Text.

Figure 9 - Invoice Parser
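
Because the extracted text is a plain string, it usually needs some cleanup before it is used for data entry. The following is a minimal sketch of one way to do that, using only the .NET standard library. The ocrText value is a hypothetical stand-in for result.Text and is not the output of the sample invoice.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for result.Text; real OCR output will differ.
string ocrText = "ACME Supplies Ltd.\r\n\r\n  Invoice No:   INV/2023/00001 \nTotal Due:  1,250.00\n";

// Split on any newline style, trim each line, collapse runs of
// whitespace into a single space, and drop lines that end up empty.
string[] lines = ocrText
    .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None)
    .Select(line => Regex.Replace(line.Trim(), @"\s+", " "))
    .Where(line => line.Length > 0)
    .ToArray();

// Print the cleaned lines, one per row.
foreach (var line in lines)
{
    Console.WriteLine(line);
}
```

Normalizing the lines first keeps later pattern matching simpler, since labels and values are then separated by exactly one space.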

5.1. Extracting Specific Data from Invoices

You can also extract specific data from invoices, such as the customer invoice number. The code below searches the extracted text with a regular expression that matches numbers of the form INV/, four digits, a slash, and five digits.

using IronOcr;
using System;
using System.Text.RegularExpressions;

// Initialize a new instance of the IronTesseract class
var ocr = new IronTesseract();

// Use the OcrInput object to load the image file
using (var input = new OcrInput(@"r2.png"))
{
    // Perform OCR on the image
    var result = ocr.Read(input);

    // Define a regular expression pattern for the invoice number
    var linePattern = @"INV\/\d{4}\/\d{5}";

    // Match the pattern in the extracted text
    var lineMatch = Regex.Match(result.Text, linePattern);

    // Check if the pattern matches any part of the text
    if (lineMatch.Success)
    {
        // If a match is found, print the invoice number
        var lineValue = lineMatch.Value;
        Console.WriteLine("Customer Invoice number: " + lineValue);
    }
}

The equivalent code in VB.NET:

Imports IronOcr
Imports System
Imports System.Text.RegularExpressions

Module Program
    Sub Main()
        ' Initialize a new instance of the IronTesseract class
        Dim ocr = New IronTesseract()

        ' Use the OcrInput object to load the image file
        Using input = New OcrInput("r2.png")
            ' Perform OCR on the image
            Dim result = ocr.Read(input)

            ' Define a regular expression pattern for the invoice number
            Dim linePattern = "INV\/\d{4}\/\d{5}"

            ' Match the pattern in the extracted text
            Dim lineMatch = Regex.Match(result.Text, linePattern)

            ' Check if the pattern matches any part of the text
            If lineMatch.Success Then
                ' If a match is found, print the invoice number
                Dim lineValue = lineMatch.Value
                Console.WriteLine("Customer Invoice number: " & lineValue)
            End If
        End Using
    End Sub
End Module

Figure 10 - Invoice Scanning
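
The same regular-expression approach can be extended to several fields at once. The sketch below is illustrative only: the text value, the "Date:" and "Total Due:" labels, the date format, and the FindField helper are all assumptions made for this example and are not taken from the sample invoice or from the IronOCR API. It uses only the .NET standard library and parses the matched strings into typed values.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical OCR output standing in for result.Text.
string text = "Invoice No: INV/2023/00001\nDate: 2023-04-17\nTotal Due: 1,250.00\n";

// Hypothetical helper: returns the first capture group of the first
// match, or an empty string when the pattern is not found.
static string FindField(string source, string pattern)
{
    var match = Regex.Match(source, pattern, RegexOptions.IgnoreCase);
    return match.Success ? match.Groups[1].Value : "";
}

// Each pattern captures only the value part of the field.
string invoiceNumber = FindField(text, @"(INV/\d{4}/\d{5})");
string dateText = FindField(text, @"Date:\s*(\d{4}-\d{2}-\d{2})");
string totalText = FindField(text, @"Total\s+Due:\s*([\d,]+\.\d{2})");

// Convert the raw strings to typed values; the flags record whether
// parsing succeeded so that bad OCR reads can be routed to manual review.
bool hasDate = DateTime.TryParseExact(dateText, "yyyy-MM-dd",
    CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime invoiceDate);
bool hasTotal = decimal.TryParse(totalText, NumberStyles.Number,
    CultureInfo.InvariantCulture, out decimal total);

Console.WriteLine("Invoice number: " + invoiceNumber);
if (hasDate)
{
    Console.WriteLine("Invoice date: " + invoiceDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
}
if (hasTotal)
{
    Console.WriteLine("Total due: " + total.ToString(CultureInfo.InvariantCulture));
}
```

Parsing with the invariant culture keeps the result independent of the machine's regional settings. Real invoices vary in layout, so the patterns would need to be adapted to each vendor's format.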

6. Conclusion

IronOCR's invoice OCR API uses machine learning and computer vision to convert invoice text and numbers into a machine-readable format, which simplifies data extraction for analysis, integration, and process improvement. It provides a way to automate invoice processing, including data entry from scanned invoices, improving accuracy and streamlining workflows such as accounts payable.

IronOCR offers high accuracy using the best results from Tesseract, without any additional settings. It supports multi-frame TIFF files, PDF files, and all popular image formats. It can also read barcode values from images.

For more information on IronOCR, visit the IronOCR homepage. For more on invoice OCR, see the detailed invoice OCR tutorial. To learn how to use computer vision to find text such as invoice fields, see the computer vision how-to.

Frequently Asked Questions

How can I automate invoice data processing using OCR?

You can use IronOCR to automate invoice data processing by leveraging its machine learning algorithms. IronOCR extracts details such as vendor information, invoice numbers, and prices from digital and scanned invoices, reducing manual entry errors and increasing efficiency.

What steps are involved in setting up an invoice OCR API?

To set up an invoice OCR API using IronOCR, start by downloading and installing the library through the Visual Studio NuGet Package Manager. Then create a new C# project, integrate IronOCR, and use its methods to load and read image files for text extraction.

Can IronOCR extract specific data such as invoice numbers?

Yes, IronOCR can extract specific data such as invoice numbers. Regular expressions are used to match patterns in the extracted text, which lets you pull specific information out of invoices.

What are some features of IronOCR that benefit invoice processing?

IronOCR includes features such as image preprocessing, barcode recognition, and file parsing. These improve its ability to extract and process text accurately from various invoice formats, improving data capture and workflow efficiency.

How does image preprocessing improve OCR results?

Image preprocessing in IronOCR improves OCR results by optimizing image quality before text extraction. This includes operations such as contrast adjustment and noise reduction, which can lead to more accurate data extraction from invoices.

Is it possible to use IronOCR for both digital and scanned invoices?

Yes, IronOCR can process both digital and scanned invoices. It uses machine learning and computer vision techniques to extract text accurately from various formats and image qualities.

How does IronOCR handle multiple page formats and file types?

IronOCR supports multiple page formats and popular image and PDF file types. It can extract text efficiently from complex documents, making it versatile for various invoice processing applications.

Where can developers find tutorials for using IronOCR?

Developers can find tutorials and additional resources on the IronOCR website. The site offers a variety of learning materials, including how-to guides and blog articles on applying IronOCR in different scenarios.

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing the degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Engineering ...