OCR Invoice Processing in C# (Developer Tutorial)

Invoice data processing refers to receiving, managing, and validating invoices from suppliers or vendors and ensuring that payments are made correctly and on time. It involves a series of steps designed to keep business transactions accurate, compliant, and efficient while reducing reliance on paper invoices. Automating the process can significantly reduce manual data entry errors. IronOCR is an Optical Character Recognition (OCR) library that extracts text and data from invoice images and PDFs, which makes it a practical tool for automating invoice processing in C# applications.

How to process invoice data using OCR software like IronOCR

  1. Create a Visual Studio project.
  2. Install the IronOCR C# library.
  3. Prepare a sample input invoice image.
  4. Use IronTesseract to extract data from the invoice image.
  5. Read only a region of an image.

Optical Character Recognition (OCR)

Optical Character Recognition is a technology that recognizes text in scanned documents, PDFs, and images and converts it into editable, searchable data. An OCR engine processes an image of text and extracts the characters, making them machine-readable. OCR-based invoice software builds on this to support financial management tools and invoice automation.

Key Points about OCR

  • Functionality: OCR software scans images of text (e.g., photos or scanned documents) and converts the characters into digital text that can be edited, searched, and stored.
  • Applications: OCR is widely used in various industries for tasks like digitizing printed documents, invoice processing, form data extraction, automatic number plate recognition (ANPR), accounts payable workflow, and scanning books.
  • Technology: OCR uses algorithms to identify patterns of light and dark to interpret characters. Modern OCR systems also employ machine learning and artificial intelligence to improve accuracy.
  • Benefits: OCR improves productivity by automating data entry, reducing errors, and allowing for easier data search and retrieval. It also supports document archiving and helps businesses manage paperless workflows.

OCR technology has evolved significantly and is now accurate enough to extract data from many different invoice formats. This reduces manual data entry and the amount of invoice processing that has to be done by hand.

IronOCR

IronOCR is an Optical Character Recognition (OCR) library for .NET (C#) that lets developers extract text from images, PDFs, and other document formats. Its API is designed to make it straightforward to add OCR to invoice software, accounts payable workflows, and accounting systems.

Key Features of IronOCR

  • Text Extraction: It can extract text from various image formats (PNG, JPG, TIFF, etc.) and from PDFs, including multipage PDFs.
  • Accuracy: IronOCR combines image preprocessing with machine learning techniques to recognize text accurately, even in noisy or low-quality images.
  • Language Support: The library supports multiple languages, including English, Spanish, French, and others, which helps in recognizing text in different languages.
  • Ease of Use: IronOCR offers a simple API that allows developers to quickly integrate OCR functionality into their applications without requiring deep technical knowledge of OCR techniques.
  • Barcode and QR Code Recognition: In addition to standard text recognition, IronOCR can also detect and extract barcodes and QR codes from images.
  • PDF Support: It can read and extract text from scanned PDFs, making it useful for processing invoices, receipts, and other business documents.
  • Customization: The library allows customization of OCR settings for specific needs, such as adjusting the accuracy or handling different image resolutions.
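
The language, barcode, and customization features listed above correspond to a few settings on the OCR engine object. The sketch below is a minimal illustration, assuming the `Language` and `Configuration.ReadBarCodes` properties and the `OcrResult.Barcodes` collection behave as described in IronOCR's documentation. The file name is a placeholder, and the member names should be checked against the package version you install.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: English is available by default; other languages are
// expected to need an additional language package.
ocr.Language = OcrLanguage.English;

// Ask the engine to look for barcodes and QR codes as well as text.
ocr.Configuration.ReadBarCodes = true;

using (var input = new OcrInput())
{
    input.LoadImage("invoice-with-barcode.png"); // hypothetical file name

    OcrResult result = ocr.Read(input);

    // Recognized text for the whole image.
    Console.WriteLine(result.Text);

    // Any barcodes found alongside the text.
    foreach (var barcode in result.Barcodes)
    {
        Console.WriteLine($"Barcode value: {barcode.Value}");
    }
}
```

Barcode reading is switched on explicitly here so that the example does not depend on the library's default setting.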

Prerequisites

Before you start, ensure you have the following:

  • Visual Studio installed on your machine.
  • A basic understanding of C# programming.
  • The IronOCR NuGet package (installed in Step 2).

Step 1: Create a Visual Studio project

Open Visual Studio and click on Create a new project.

OCR Invoice Processing in C# (Developer Tutorial): Figure 1 - New Project

Select Console App in the options.

OCR Invoice Processing in C# (Developer Tutorial): Figure 2 - Console App

Provide the project name and location.

OCR Invoice Processing in C# (Developer Tutorial): Figure 3 - Project Configuration

Select the target .NET version.

OCR Invoice Processing in C# (Developer Tutorial): Figure 4 - Target Framework

Step 2: Install the IronOCR C# library

In Visual Studio, open your project and go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution. Click the Browse tab and search for IronOCR. Select IronOCR and click Install.

OCR Invoice Processing in C# (Developer Tutorial): Figure 5 - IronOCR

Alternatively, install the package from the command line:

dotnet add package IronOcr --version 2024.12.2

Step 3: Sample input invoice image

The sample input is a digital invoice image that includes an invoice number.

OCR Invoice Processing in C# (Developer Tutorial): Figure 6 - Sample Input

Step 4: Use IronTesseract to extract data from the invoice image

Use the following code to extract the text from the invoice image.

using IronOcr;

// Set the license key
License.LicenseKey = "Your License";
string filePath = "sample1.jpg"; // Path to the invoice image

// Create an instance of IronTesseract
var ocr = new IronTesseract();

// Load the image for OCR
using (var ocrInput = new OcrInput())
{
    ocrInput.LoadImage(filePath);

    // Optionally apply filters if needed 
    ocrInput.Deskew();
    // ocrInput.DeNoise();

    // Perform OCR to extract text
    var ocrResult = ocr.Read(ocrInput);

    // Output the extracted text
    Console.WriteLine("Extracted Text:");
    Console.WriteLine(ocrResult.Text);

    // Next steps would involve processing the extracted text
}

The same example in VB.NET:

Imports IronOcr

' Set the license key
License.LicenseKey = "Your License"
Dim filePath As String = "sample1.jpg" ' Path to the invoice image

' Create an instance of IronTesseract
Dim ocr = New IronTesseract()

' Load the image for OCR
Using ocrInput As New OcrInput()
	ocrInput.LoadImage(filePath)

	' Optionally apply filters if needed 
	ocrInput.Deskew()
	' ocrInput.DeNoise()

	' Perform OCR to extract text
	Dim ocrResult = ocr.Read(ocrInput)

	' Output the extracted text
	Console.WriteLine("Extracted Text:")
	Console.WriteLine(ocrResult.Text)

	' Next steps would involve processing the extracted text
End Using

Code Explanation

The provided code demonstrates how to use the IronOCR library in C# to extract text from an image (e.g., an invoice) using OCR (Optical Character Recognition). Here's an explanation of each part of the code:

  1. License Key Setup:

    • The code begins by setting the license key for IronOCR. This key is required to use the full functionality of the library. If you have a valid license, replace "Your License" with your actual license key.
  2. Specifying the Input File:

    • The filePath variable holds the location of the image that contains the invoice (in this case, "sample1.jpg"). This is the file that will be processed for text extraction.
  3. Creating an OCR Instance:

    • An instance of IronTesseract is created. IronTesseract is the class responsible for performing the OCR operation on the input data.
  4. Loading the Image:

    • The code creates an OcrInput object, which loads the image specified by filePath using the LoadImage method.
  5. Applying Image Filters:

    • The code optionally applies filters like Deskew() to correct skewed images and improve OCR accuracy.
  6. Performing OCR:

    • The ocr.Read() method extracts text from the loaded image, returning an OcrResult containing the extracted text.
  7. Displaying the Extracted Text:
    • The extracted text is printed to the console. This text is what IronOCR has recognized from the image and can be used for further processing.

Output

OCR Invoice Processing in C# (Developer Tutorial): Figure 7 - OCR Output with Invoice Number
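
Once the text has been extracted, the next step is usually to pull individual fields out of it. The sketch below uses regular expressions from the .NET base class library to find an invoice number and a date in the OCR text. The `InvoiceFields` class, the label wording ("Invoice Number", "Invoice No.", "Invoice #", "Date"), and the `yyyy-MM-dd` date format are assumptions made for illustration; adjust the patterns to match your own invoice layout.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronOCR.
public static class InvoiceFields
{
    // Matches e.g. "Invoice Number: INV-1042", "Invoice No. 77", "Invoice #1042".
    // The lookahead stops "No" from matching the start of a word such as "Notes".
    private static readonly Regex NumberPattern = new Regex(
        @"Invoice\s*(?:Number|No\.?|#)(?![A-Za-z])\s*[:#]?\s*([A-Z0-9][A-Z0-9\-]*)",
        RegexOptions.IgnoreCase);

    // Matches e.g. "Date: 2024-12-01" (ISO-style dates only in this sketch).
    private static readonly Regex DatePattern = new Regex(
        @"Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})",
        RegexOptions.IgnoreCase);

    public static bool TryFindInvoiceNumber(string ocrText, out string number)
    {
        Match m = NumberPattern.Match(ocrText);
        number = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }

    public static bool TryFindInvoiceDate(string ocrText, out DateTime date)
    {
        date = default;
        Match m = DatePattern.Match(ocrText);
        return m.Success && DateTime.TryParseExact(
            m.Groups[1].Value, "yyyy-MM-dd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }
}
```

To use it with the Step 4 code, call `InvoiceFields.TryFindInvoiceNumber(ocrResult.Text, out string number)` after `ocr.Read`. The `Try...` pattern makes a missing field an ordinary `false` return rather than an exception, which suits OCR output where a label may be misread.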

Step 5: Read only a region of an image

To improve efficiency, you can restrict OCR to the region of the image that contains the data you need.

using IronOcr;
using IronSoftware.Drawing;

// Set the license key
License.LicenseKey = "Your Key";
string filePath = "sample1.jpg"; // Path to the invoice image

// Create an instance of IronTesseract
var ocr = new IronTesseract();

// Load the image for OCR
using (var ocrInput = new OcrInput())
{
    // Define the region of interest in pixels, measured from the top-left corner
    var contentArea = new Rectangle(x: 0, y: 0, width: 1000, height: 250);
    ocrInput.LoadImage(filePath, contentArea);

    // Optionally apply filters if needed 
    ocrInput.Deskew();
    // ocrInput.DeNoise();

    // Perform OCR to extract text
    var ocrResult = ocr.Read(ocrInput);

    // Output the extracted text
    Console.WriteLine("Extracted Text:");
    Console.WriteLine(ocrResult.Text);
}

The same example in VB.NET:

Imports IronOcr
Imports IronSoftware.Drawing

' Set the license key
License.LicenseKey = "Your Key"
Dim filePath As String = "sample1.jpg" ' Path to the invoice image

' Create an instance of IronTesseract
Dim ocr = New IronTesseract()

' Load the image for OCR
Using ocrInput As New OcrInput()
	' Define the region of interest
	Dim ContentArea = New Rectangle(x:= 0, y:= 0, width:= 1000, height:= 250)
	ocrInput.LoadImage(filePath, ContentArea)

	' Optionally apply filters if needed 
	ocrInput.Deskew()
	' ocrInput.DeNoise()

	' Perform OCR to extract text
	Dim ocrResult = ocr.Read(ocrInput)

	' Output the extracted text
	Console.WriteLine("Extracted Text:")
	Console.WriteLine(ocrResult.Text)
End Using

Code Explanation

This code extracts text from a specific region of an image using IronOCR, with options for image filters that enhance accuracy. Here's a breakdown of each part:

  1. License Setup:

    • Sets the license key for IronOCR, which is necessary for using the library's OCR features. Replace "Your Key" with your valid license key.
  2. Defining the Image File Path:

    • Specifies the file path to the invoice image to be processed, which contains the content for text extraction.
  3. Creating an OCR Instance:

    • An instance of IronTesseract is created to perform the OCR operations.
  4. Defining the Area to Process:

    • Specifies a 1000 × 250 pixel rectangle anchored at the top-left corner (0, 0) of the image, so that OCR runs only on the relevant section. Processing a smaller area improves efficiency.
  5. Loading the Image:

    • Loads the specified content area of the image from the file. This confines OCR processing to a specific part of the image.
  6. Applying Filters:

    • Applies Deskew() to straighten the image. DeNoise() is included as a commented-out option that can be enabled to clean up noisy scans.
  7. Extracting the Text:

    • Reads the text from the defined region and stores it in an OcrResult.
  8. Output the Extracted Text:
    • Outputs the OCR-processed text to the console for further use.

Output

OCR Invoice Processing in C# (Developer Tutorial): Figure 8 - Extracted Output
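
The region in Step 5 is given in pixels, so the same coordinates will not fit invoices scanned at a different resolution. One way around this is to describe a region as fractions of the page and convert it to pixels once the image size is known. The `PixelRegion` type and `RegionHelper` class below are hypothetical helpers written for this sketch, not part of IronOCR, and the image dimensions are assumed to come from wherever your application reads image metadata.

```csharp
using System;

// Hypothetical value type holding a region in pixels.
public readonly record struct PixelRegion(int X, int Y, int Width, int Height);

public static class RegionHelper
{
    // Converts a region expressed as fractions of the page (0.0 to 1.0)
    // into pixel coordinates for an image of the given size.
    public static PixelRegion FromFractions(
        int imageWidth, int imageHeight,
        double left, double top, double width, double height)
    {
        if (imageWidth <= 0) throw new ArgumentOutOfRangeException(nameof(imageWidth));
        if (imageHeight <= 0) throw new ArgumentOutOfRangeException(nameof(imageHeight));
        if (left < 0 || left >= 1) throw new ArgumentOutOfRangeException(nameof(left));
        if (top < 0 || top >= 1) throw new ArgumentOutOfRangeException(nameof(top));
        if (width <= 0 || width > 1) throw new ArgumentOutOfRangeException(nameof(width));
        if (height <= 0 || height > 1) throw new ArgumentOutOfRangeException(nameof(height));

        // Keep the origin inside the image.
        int x = Math.Min((int)Math.Round(left * imageWidth), imageWidth - 1);
        int y = Math.Min((int)Math.Round(top * imageHeight), imageHeight - 1);

        // Clamp the size so the region never extends past the image edge.
        int w = Math.Max(1, Math.Min((int)Math.Round(width * imageWidth), imageWidth - x));
        int h = Math.Max(1, Math.Min((int)Math.Round(height * imageHeight), imageHeight - y));

        return new PixelRegion(x, y, w, h);
    }
}
```

For a 2000 × 2800 pixel scan, `RegionHelper.FromFractions(2000, 2800, 0.0, 0.0, 1.0, 0.25)` yields the top quarter of the page as `(0, 0, 2000, 700)`. Those four values can then be passed to the `Rectangle` constructor used in Step 5.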

License (Trial Available)

IronOCR requires a license key to extract data from invoices. Get a developer trial key from the licensing page and set it at the start of your program:

using IronOcr; 
License.LicenseKey = "Your Key";

In VB.NET:

Imports IronOcr
License.LicenseKey = "Your Key"
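
Hard-coding the key is fine for a quick test, but in a shared repository it is usually safer to read it from the environment. The sketch below shows one way to do that using only the `License.LicenseKey` property from the snippet above. The variable name `IRONOCR_LICENSE_KEY` is an assumption chosen for this example, not a name the library looks for.

```csharp
using System;
using IronOcr;

// IRONOCR_LICENSE_KEY is a hypothetical variable name chosen for this sketch.
var key = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY");

if (string.IsNullOrWhiteSpace(key))
{
    // Stop early with a clear message instead of failing later during OCR.
    Console.Error.WriteLine("No license key found in IRONOCR_LICENSE_KEY.");
    return;
}

License.LicenseKey = key;
Console.WriteLine("License key loaded from the environment.");
```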

Conclusion

This article provided a basic example of how to get started with IronOCR for invoice processing. You can further customize and expand this code to fit your specific requirements.

IronOCR provides an efficient and easy-to-integrate solution for extracting text from images and PDFs, making it ideal for invoice processing. By using IronOCR in combination with C# string manipulation or regular expressions, you can quickly process and extract important data from invoices.
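
As an illustration of the string and regular-expression processing mentioned above, the sketch below reads labelled amounts from OCR text and checks that they add up. A mismatch is a useful signal that a digit was misread. The `InvoiceTotals` class and the labels "Subtotal", "Tax", and "Total" are assumptions for this example; real invoices vary, so treat the pattern as a starting point.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronOCR.
public static class InvoiceTotals
{
    // Finds a line such as "Total: $1,234.56" and returns the amount.
    // The label must start the line, so "Total" does not match "Subtotal".
    public static bool TryFindAmount(string ocrText, string label, out decimal amount)
    {
        var pattern = new Regex(
            @"^\s*" + Regex.Escape(label) + @"\s*:?\s*[$€£]?\s*([0-9][0-9,]*\.[0-9]{2})\s*$",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);

        Match m = pattern.Match(ocrText);
        if (m.Success)
        {
            return decimal.TryParse(
                m.Groups[1].Value,
                NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
                CultureInfo.InvariantCulture,
                out amount);
        }

        amount = 0m;
        return false;
    }

    // True only when all three amounts are present and subtotal + tax equals total.
    public static bool TotalsAreConsistent(string ocrText)
    {
        return TryFindAmount(ocrText, "Subtotal", out decimal subtotal)
            && TryFindAmount(ocrText, "Tax", out decimal tax)
            && TryFindAmount(ocrText, "Total", out decimal total)
            && subtotal + tax == total;
    }
}
```

The amounts are parsed as `decimal` rather than `double` so that the equality check is exact for currency values. An invoice that fails the check can be routed to manual review instead of being posted automatically.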

With more advanced configuration, such as language selection or multi-page PDF processing, you can tune the OCR results to improve accuracy for your specific use case.
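
For multi-page PDFs, a minimal sketch might look like the following. It assumes the `OcrInput.LoadPdf` method and the `OcrResult.Pages` collection (with `PageNumber` and `Text` on each page) as described in IronOCR's documentation. The file name is a placeholder, and the member names should be verified against the version you are using.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    input.LoadPdf("invoices.pdf"); // hypothetical multi-page PDF

    OcrResult result = ocr.Read(input);

    // Print the text of each page separately so that invoices
    // spanning different pages can be processed one at a time.
    foreach (var page in result.Pages)
    {
        Console.WriteLine($"--- Page {page.PageNumber} ---");
        Console.WriteLine(page.Text);
    }
}
```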

IronOCR's API is flexible, and it can be used for a wide variety of OCR tasks beyond invoice processing, including receipt scanning, document conversion, and data entry automation.

Frequently Asked Questions

How can I automate invoice data processing in C#?

You can automate invoice data processing in C# by using IronOCR to extract text and data from digital invoice files. This reduces manual data entry errors and makes invoice handling more efficient.

What steps are involved in setting up OCR for invoice processing?

To set up OCR for invoice processing, create a Visual Studio project, install the IronOCR library, and prepare sample invoice images. You can then use IronOCR's features to extract and process the invoice data.

How do I extract data from specific regions of an invoice using OCR?

IronOCR lets you target a specific region of an image by defining a rectangular area for the OCR process. This improves efficiency and accuracy by reading only the parts of the invoice you need.

What is the role of Tesseract in IronOCR?

Tesseract is part of IronOCR and plays a central role in extracting text from images. It converts images of text into machine-readable data, which is essential for automating invoice processing in C# applications.

Can OCR software recognize text in multiple languages?

Yes. IronOCR supports multiple languages, including English, Spanish, and French, which makes it versatile for handling invoices from around the world.

What are the benefits of using IronOCR for invoice processing?

Using IronOCR for invoice processing offers high-accuracy text extraction, multi-language support, barcode recognition, and PDF processing, all of which streamline accounts payable workflows.

How can I customize OCR settings for specific invoice processing needs?

IronOCR provides a simple API that lets developers customize OCR settings. This flexibility supports solutions tailored to specific invoice processing needs, such as handling different invoice formats or languages.

Why is OCR important in digital invoice management?

OCR is critical in digital invoice management because it automates the extraction of invoice data, reducing manual workload, minimizing errors, and ensuring efficient, accurate processing of financial transactions.

Is there a trial version available for testing IronOCR's capabilities?

Yes. IronOCR offers a developer trial key, available from its licensing page, that lets you test the software's full functionality before committing to a purchase.

How does IronOCR improve document conversion and data entry automation?

IronOCR improves document conversion and data entry automation by providing high-accuracy text extraction from a variety of formats, allowing smooth integration into C# applications for automated data processing.

Kannapat Udonpant
Software Engineer
Before becoming a software engineer, Kannapat completed a PhD in environmental resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction. In 2022, he used his ...