OCR Invoice Processing in C# Using IronOCR (Developer Tutorial)

Kannapat Udonpant
Updated: June 22, 2025

Invoice data processing refers to receiving, managing, and validating invoices from suppliers or vendors and ensuring that payments are made correctly and on time. It involves steps designed to ensure accuracy, compliance, and efficiency in handling business transactions, and automating it reduces reliance on paper invoices. Automated invoice processing can significantly reduce manual data entry errors and improve efficiency. IronOCR is a powerful Optical Character Recognition (OCR) software library that can extract data or text from invoices stored as digital files, making it an excellent tool for automating invoice OCR processing in C# applications.

How to process invoice data using OCR software like IronOCR

1. Create a Visual Studio project.
2. Install the IronOCR C# library.
3. Prepare a sample input invoice image.
4. Use Tesseract to extract data from the invoice image.
5. Read only a region of an image.

Optical Character Recognition (OCR)

Optical Character Recognition is a technology that recognizes text in documents, PDFs, and images and converts it into editable, searchable data. OCR processes images of text and extracts the characters, making them machine-readable. Advanced OCR invoice software supports financial management tools and invoice automation.

Key Points about OCR

Functionality: OCR software scans images of text (e.g., photos or scanned documents) and converts the characters into digital text that can be edited, searched, and stored.
Applications: OCR is widely used across industries for tasks such as digitizing printed documents, invoice processing, form data extraction, automatic number plate recognition (ANPR), accounts payable workflows, and scanning books.
Technology: OCR uses algorithms that identify patterns of light and dark to interpret characters. Modern OCR systems also employ machine learning and artificial intelligence to improve accuracy.
Benefits: OCR improves productivity by automating data entry, reducing errors, and making data easier to search and retrieve. It also supports document archiving and helps businesses manage paperless workflows.

OCR technology has evolved significantly. It is now accurate enough to extract data from many different invoice formats, which reduces or eliminates manual data entry and manual invoice processing and can improve data security.

IronOCR

IronOCR is a powerful Optical Character Recognition (OCR) library for .NET (C#) that allows developers to extract text from images, PDFs, and other document formats, build OCR invoice software, and implement accounts payable workflows. It provides an easy-to-use API for integrating OCR capabilities into an accounts payable or accounting system.

Key Features of IronOCR

Text Extraction: It can extract text from various image formats (PNG, JPG, TIFF, etc.) and from PDFs, including multipage PDFs.
Accuracy: IronOCR uses advanced algorithms and machine learning techniques to provide high accuracy in text recognition, even for noisy or low-quality images, which supports accounts payable processes such as capturing early payment discounts.
Language Support: The library supports multiple languages, including English, Spanish, French, and others, so it can recognize text on invoices written in different languages.
Ease of Use: IronOCR offers a simple API that allows developers to quickly integrate OCR functionality into their applications without requiring deep technical knowledge of OCR techniques.
Barcode and QR Code Recognition: In addition to standard text recognition, IronOCR can also detect and extract barcodes and QR codes from images.
PDF Support: It can read and extract text from scanned PDFs, making it useful for processing invoices, receipts, and other business documents.
Customization: The library allows customization of OCR settings for specific needs, such as adjusting the accuracy or handling different image resolutions.

Prerequisites

Before you start, ensure you have the following:

Visual Studio installed on your machine.
A basic understanding of C# programming.
The IronOCR NuGet package installed in your project (see Step 2).

Step 1: Create a Visual Studio project

1. Open Visual Studio and click Create a new project.
2. Select Console App from the options.
3. Provide a project name and path.
4. Select the .NET version.

Step 2: Install the IronOCR C# library

1. In your Visual Studio project, go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution.
2. Click the Browse tab and search for IronOCR.
3. Select IronOCR and click Install.

Alternatively, install the package from the console with the following command:

    dotnet add package IronOcr --version 2024.12.2

Step 3: Sample input invoice image

The sample input is a digital invoice image that includes an invoice number.

Step 4: Utilize Tesseract and extract data from the invoice image

Now use the code below to extract the text from the invoice image.
C#:

    using System;
    using IronOcr;

    // Set the license key
    License.LicenseKey = "Your License";

    string filePath = "sample1.jpg"; // Path to the invoice image

    // Create an instance of IronTesseract
    var ocr = new IronTesseract();

    // Load the image for OCR
    using (var ocrInput = new OcrInput())
    {
        ocrInput.LoadImage(filePath);

        // Optionally apply filters if needed
        ocrInput.Deskew();
        // ocrInput.DeNoise();

        // Perform OCR to extract text
        var ocrResult = ocr.Read(ocrInput);

        // Output the extracted text
        Console.WriteLine("Extracted Text:");
        Console.WriteLine(ocrResult.Text);

        // Next steps would involve processing the extracted text
    }

VB.NET:

    Imports System
    Imports IronOcr

    Module Program
        Sub Main()
            ' Set the license key
            License.LicenseKey = "Your License"

            Dim filePath As String = "sample1.jpg" ' Path to the invoice image

            ' Create an instance of IronTesseract
            Dim ocr = New IronTesseract()

            ' Load the image for OCR
            Using ocrInput As New OcrInput()
                ocrInput.LoadImage(filePath)

                ' Optionally apply filters if needed
                ocrInput.Deskew()
                ' ocrInput.DeNoise()

                ' Perform OCR to extract text
                Dim ocrResult = ocr.Read(ocrInput)

                ' Output the extracted text
                Console.WriteLine("Extracted Text:")
                Console.WriteLine(ocrResult.Text)

                ' Next steps would involve processing the extracted text
            End Using
        End Sub
    End Module

Code Explanation

The code above demonstrates how to use the IronOCR library to extract text from an image (here, an invoice) using OCR (Optical Character Recognition). Here is an explanation of each part:

License Key Setup: The code begins by setting the license key for IronOCR. This key is required to use the full functionality of the library. If you have a valid license, replace "Your License" with your actual license key.
Specifying the Input File: The filePath variable holds the location of the image that contains the invoice (in this case, "sample1.jpg"). This is the file that will be processed for text extraction.
Creating an OCR Instance: An instance of IronTesseract is created. IronTesseract is the class responsible for performing the OCR operation on the input data.
Loading the Image: The code creates an OcrInput object and loads the image specified by filePath using the LoadImage method.
Applying Image Filters: The code optionally applies filters such as Deskew() to correct skewed images and improve OCR accuracy. The DeNoise() call is left commented out and can be enabled for noisy scans.
Performing OCR: The ocr.Read() method extracts text from the loaded image and returns an OcrResult containing the extracted text.
Displaying the Extracted Text: The extracted text is printed to the console. This text is what IronOCR has recognized from the image and can be used for further processing.

Output

[Image: console output showing the text extracted from the invoice]

Step 5: Read only a region of an image

To improve efficiency, you can process only part of the image instead of the whole page.
C#:

    using System;
    using IronOcr;
    using IronSoftware.Drawing;

    // Set the license key
    License.LicenseKey = "Your Key";

    string filePath = "sample1.jpg"; // Path to the invoice image

    // Create an instance of IronTesseract
    var ocr = new IronTesseract();

    // Load the image for OCR
    using (var ocrInput = new OcrInput())
    {
        // Define the region of interest (in pixels, measured from the top-left corner)
        var contentArea = new Rectangle(x: 0, y: 0, width: 1000, height: 250);
        ocrInput.LoadImage(filePath, contentArea);

        // Optionally apply filters if needed
        ocrInput.Deskew();
        // ocrInput.DeNoise();

        // Perform OCR to extract text
        var ocrResult = ocr.Read(ocrInput);

        // Output the extracted text
        Console.WriteLine("Extracted Text:");
        Console.WriteLine(ocrResult.Text);
    }

VB.NET:

    Imports System
    Imports IronOcr
    Imports IronSoftware.Drawing

    Module Program
        Sub Main()
            ' Set the license key
            License.LicenseKey = "Your Key"

            Dim filePath As String = "sample1.jpg" ' Path to the invoice image

            ' Create an instance of IronTesseract
            Dim ocr = New IronTesseract()

            ' Load the image for OCR
            Using ocrInput As New OcrInput()
                ' Define the region of interest (in pixels, measured from the top-left corner)
                Dim contentArea = New Rectangle(x:=0, y:=0, width:=1000, height:=250)
                ocrInput.LoadImage(filePath, contentArea)

                ' Optionally apply filters if needed
                ocrInput.Deskew()
                ' ocrInput.DeNoise()

                ' Perform OCR to extract text
                Dim ocrResult = ocr.Read(ocrInput)

                ' Output the extracted text
                Console.WriteLine("Extracted Text:")
                Console.WriteLine(ocrResult.Text)
            End Using
        End Sub
    End Module

Code Explanation

This code extracts text from a specific region of an image using IronOCR, with optional image filters that can improve accuracy. Here is a breakdown of each part:

License Setup: Sets the license key for IronOCR, which is necessary for using the library's OCR features. Replace "Your Key" with your valid license key.
Defining the Image File Path: Specifies the path to the invoice image that contains the text to extract.
Creating an OCR Instance: An instance of IronTesseract is created to perform the OCR operations.
Defining the Area to Process: Specifies a rectangle within the image, 1000 pixels wide and 250 pixels high and starting at the top-left corner, so that OCR runs only on the relevant section.
Loading the Image: Loads only the specified content area of the image from the file. This confines OCR processing to that part of the image.
Applying Filters: Applies Deskew() to correct image alignment. DeNoise() is left commented out and can be enabled to clean up noisy images.
Extracting the Text: Reads the text from the defined region and stores it in an OcrResult.
Outputting the Extracted Text: Writes the recognized text to the console for further use.

Output

[Image: console output showing the text extracted from the selected region]

License (Trial Available)

IronOCR requires a license key to extract data from invoices. Get your developer trial key from the licensing page.

C#:

    using IronOcr;

    License.LicenseKey = "Your Key";

VB.NET:

    Imports IronOcr

    ' Place this assignment at application startup, before any OCR call
    License.LicenseKey = "Your Key"

Conclusion

This article provided a basic example of how to get started with IronOCR for invoice processing. You can further customize and expand this code to fit your specific requirements. IronOCR provides an efficient and easy-to-integrate solution for extracting text from images and PDFs, making it ideal for invoice processing.
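Once the text has been extracted, individual fields still have to be located in it. The following is a minimal sketch of that post-processing step using regular expressions from the .NET standard library. The InvoiceFieldParser class, its method names, and the label formats it looks for ("Invoice No:", "Total:") are assumptions made for illustration; they are not part of IronOCR, and real invoices will likely need patterns adapted to their layout.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of IronOCR): pulls two common fields out of OCR text.
public static class InvoiceFieldParser
{
    // Assumed label formats: "Invoice No: X", "Invoice Number: X", "Invoice # X".
    private static readonly Regex InvoiceNumberPattern = new Regex(
        @"invoice\s*(?:no\b\.?|number\b|#)\s*[:#]?\s*([A-Z0-9][A-Z0-9-]*)",
        RegexOptions.IgnoreCase);

    // Assumed total format: "Total: $1,234.50" (currency symbol optional).
    // The leading \b keeps "Subtotal" from matching.
    private static readonly Regex TotalPattern = new Regex(
        @"\btotal\b\s*[:=]?\s*[$€£]?\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
        RegexOptions.IgnoreCase);

    // Returns true and sets invoiceNumber when a labeled invoice number is found.
    public static bool TryFindInvoiceNumber(string ocrText, out string invoiceNumber)
    {
        Match match = InvoiceNumberPattern.Match(ocrText);
        invoiceNumber = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }

    // Returns true and sets total when a labeled total amount is found and parses.
    public static bool TryFindTotal(string ocrText, out decimal total)
    {
        total = 0m;
        Match match = TotalPattern.Match(ocrText);
        return match.Success && decimal.TryParse(
            match.Groups[1].Value,
            NumberStyles.Number,
            CultureInfo.InvariantCulture,
            out total);
    }
}
```

In the examples above, the string to pass in would be ocrResult.Text, for instance InvoiceFieldParser.TryFindInvoiceNumber(ocrResult.Text, out string number). Returning a bool with an out parameter keeps the "field not found" case explicit, which matters because OCR output often contains misread characters that cause a pattern to miss.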
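By using IronOCR in combination with C# string manipulation or regular expressions, you can quickly process and extract important data from invoices. This is a basic example of invoice processing; with more advanced configuration (such as language recognition and multi-page PDF processing), you can fine-tune the OCR results to improve accuracy for your specific use case. IronOCR's API is flexible and can be used for a wide variety of OCR tasks beyond invoice processing, including receipt scanning, document conversion, and data entry automation.

Frequently Asked Questions

How can I automate invoice data processing in C#?
You can automate invoice data processing by using IronOCR to extract text and data from digital invoice files. This reduces manual data entry errors and makes invoice handling more efficient.

What steps are needed to set up OCR for invoice processing?
To set up OCR for invoice processing, first create a Visual Studio project, install the IronOCR library, and prepare a sample invoice image. You can then use IronOCR's features to extract and process the invoice data.

How do I use OCR to extract data from a specific region of an invoice?
IronOCR lets you define a specific region of an image by setting a rectangular area on which to focus the OCR process. This improves efficiency and accuracy by targeting only the necessary part of the invoice.

What role does Tesseract play in IronOCR?
Tesseract is part of IronOCR and plays a crucial role in extracting text from images. It helps convert images of text into machine-readable data, which is essential for automating invoice processing in C# applications.

Can OCR software recognize text in multiple languages?
Yes. IronOCR supports multiple languages and can recognize and process text in languages such as English, Spanish, and French, which makes it more versatile for handling invoices from around the world.

What are the benefits of using IronOCR for invoice processing?
Using IronOCR for invoice processing provides benefits such as high-accuracy text extraction, multi-language support, barcode recognition, and PDF processing, all of which streamline accounts payable workflows.

How can I customize OCR settings for specific invoice processing needs?
IronOCR provides a simple API that lets developers customize OCR settings. This flexibility allows tailored solutions for specific invoice processing needs, such as handling different invoice formats or languages.

Why is OCR important in digital invoice management?
OCR is essential in digital invoice management because it automates the extraction of data from invoices, which reduces manual work and errors and helps ensure that financial transactions are processed efficiently and accurately.

Is a trial version available to test IronOCR's features?
Yes. IronOCR offers a developer trial key, available from its licensing page, that lets you test the software's full functionality before purchasing.

How does IronOCR improve document conversion and data entry automation?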
IronOCR improves document conversion and data entry automation by providing high-accuracy text extraction from a variety of formats, and it integrates smoothly into automated data processing in C# applications.

Kannapat Udonpant
Software Engineer

Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he can learn directly from the developer who writes most of the IronPDF code. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.