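Receipt Scanning API: Extract Data from Receipts with C# and IronOCR

A receipt scanning API uses OCR to extract data from receipts automatically, which reduces manual-entry errors and speeds up processing. This guide shows how to use IronOCR in C# to extract vendor names, dates, line items, prices, and totals from receipt images, using the library's built-in image preprocessing and its support for multiple formats.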

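Why choose IronOCR for receipt scanning?

IronOCR is a flexible OCR library that extracts text from scanned documents, images, and PDFs. It combines OCR algorithms, computer vision, and machine learning models to maintain accuracy in challenging scenarios. The library supports many languages and font styles, which makes it suitable for international applications. Integrating IronOCR into your application lets you automate data entry and text analysis.

How does IronOCR extract text from receipt images?

IronOCR can retrieve text from documents, photos, screenshots, and live camera video streams, and return it as a JSON response. It analyzes the image data, recognizes characters, and converts them into machine-readable text. The library is built on Tesseract 5 with proprietary improvements for higher accuracy.

Why is IronOCR a good fit for receipt processing?

IronOCR handles low-quality scans, varied receipt formats, and images in different orientations. Its built-in image preprocessing filters improve image quality before recognition, which helps with crumpled or faded receipts.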

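What do I need to use IronOCR?

Before using IronOCR, make sure the following prerequisites are met:

Which development environments are supported?

  1. Development environment: Install a suitable IDE such as Visual Studio. IronOCR supports Windows, Linux, macOS, Azure, and AWS.

What programming skills are required?

  2. C# knowledge: A basic understanding of C# helps you adapt the code examples. IronOCR provides simple examples and API documentation.

Which software dependencies are required?

  3. IronOCR installation: Install it through the NuGet Package Manager. Platform-specific dependencies may be required.

Is a license key required?

  4. License key (optional): A free trial is available; production use requires a license.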
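
For production use, the key is usually better kept out of source code. The following is a minimal sketch that reads it from an environment variable. The helper `LicenseKeyLoader` and the variable name `IRONOCR_LICENSE_KEY` are hypothetical; the commented assignment assumes IronOCR's `License.LicenseKey` property.

```csharp
using System;

public static class LicenseKeyLoader
{
    // Returns the trimmed key from the named environment variable,
    // or null when the variable is unset or blank (for example, during a trial).
    public static string FromEnvironment(string variableName)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        return string.IsNullOrWhiteSpace(value) ? null : value.Trim();
    }
}

// Usage at application startup (assumes the IronOcr package is referenced):
//   var key = LicenseKeyLoader.FromEnvironment("IRONOCR_LICENSE_KEY");
//   if (key != null) IronOcr.License.LicenseKey = key;
```

Returning null rather than throwing lets the same code run with or without a key configured.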

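How do I create a new Visual Studio project for receipt scanning?

How do I create a new project in Visual Studio?

Open Visual Studio, go to File, hover over New, and click Project.

Image: New Project. The Visual Studio File menu expanded with New > Project highlighted.

Which project template should I choose?

Select Console Application and click Next. This template is a good way to learn IronOCR before applying it to a web application.

Image: Console Application. The Visual Studio "Create a new project" dialog with the Console Application template selected and Windows, Linux, and macOS listed as platforms.

What should I name my receipt scanner project?

Enter the project name and location, then click Next. Choose a descriptive name such as "ReceiptScannerAPI".

Image: Project Configuration. The Visual Studio project configuration screen for a C# console application named "IronOCR", showing the solution settings.

Which .NET version should I choose?

For the best compatibility, select .NET 5.0 or later, then click Create.

Image: Target Framework. The Visual Studio "Additional information" dialog with .NET 5.0 selected as the target framework.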

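How do I install IronOCR in my project?

There are two simple installation methods:

How do I use the NuGet Package Manager?

Go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution.

Image: NuGet Package Manager. The Visual Studio NuGet settings dialog showing the package source configuration, with the C# project structure in Solution Explorer.

Search for IronOCR and install the package. For non-English receipts, install the language-specific packages.

Image: IronOCR. The NuGet Package Manager showing the installed IronOCR packages, including the main library and language-specific OCR packages for Arabic, Hebrew, and Spanish.

How do I install from the command line?

  1. Go to Tools > NuGet Package Manager > Package Manager Console.
  2. Enter the following command: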

    Install-Package IronOcr

    Image: Package Manager Console. The console window running the command Install-Package IronOcr.

How do I extract receipt data quickly with IronOCR?

Receipt data can be extracted with only a few lines of code:

  1. Install IronOCR with the NuGet Package Manager:

    PM > Install-Package IronOcr

  2. Copy and run this code:

    using IronOcr;
    using System;
    
    var ocr = new IronTesseract();
    
    // Configure for receipt scanning
    ocr.Configuration.ReadBarCodes = true;
    ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% ";
    
    using (var input = new OcrInput(@"receipt.jpg"))
    {
        // Apply automatic image enhancement
        input.DeNoise();
        input.Deskew();
        input.EnhanceResolution(225);
    
        // Extract text from receipt
        var result = ocr.Read(input);
    
        // Display extracted text and confidence
        Console.WriteLine($"Extracted Text:\n{result.Text}");
        Console.WriteLine($"\nConfidence: {result.Confidence}%");
    }
  3. Deploy to your live environment for testing.


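How do I extract structured data from receipt images?

IronOCR can extract line items, prices, taxes, and totals from various document types. The library supports PDFs, multi-page TIFFs, and various image formats.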

using IronOcr;
using System;
using System.Collections.Generic;
using System.Linq; // Required for the Sum() calls below
using System.Text.RegularExpressions;

class ReceiptScanner
{
    static void Main()
    {
        var ocr = new IronTesseract();

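        // Configure OCR for receipt reading. The whitelist includes '/', '-' and ':'
        // so that dates such as 03/15/2024 can match datePattern below
        ocr.Configuration.WhiteListCharacters = "0123456789.$,/-:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% ";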
        ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\\";
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;

        // Load the image of the receipt
        using (var input = new OcrInput(@"r2.png"))
        {
            // Apply image enhancement filters
            input.Deskew(); // Fix image rotation
            input.EnhanceResolution(225); // Optimal DPI for receipts
            input.DeNoise(); // Remove background noise
            input.Sharpen(); // Improve text clarity

            // Perform OCR on the input image
            var result = ocr.Read(input);

            // Regular expression patterns to extract relevant details from the OCR result
            var descriptionPattern = @"\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";
            var pricePattern = @"\$\d+(\.\d{2})?";
            var datePattern = @"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}";

            // Variables to store extracted data
            var descriptions = new List<string>();
            var unitPrices = new List<decimal>();
            var taxes = new List<decimal>();
            var amounts = new List<decimal>();

            var lines = result.Text.Split('\n');
            foreach (var line in lines)
            {
                // Match each line against the description pattern
                var descriptionMatch = Regex.Match(line, descriptionPattern);
                if (descriptionMatch.Success)
                {
                    descriptions.Add(descriptionMatch.Groups[1].Value.Trim());
                    // Group 2 is the quantity (the number before "Units"); group 3 is the unit price
                    unitPrices.Add(decimal.Parse(descriptionMatch.Groups[3].Value));

                    // Calculate tax and total amount for each item
                    var tax = unitPrices[unitPrices.Count - 1] * 0.15m;
                    taxes.Add(tax);
                    amounts.Add(unitPrices[unitPrices.Count - 1] + tax);
                }

                // Extract date if found
                var dateMatch = Regex.Match(line, datePattern);
                if (dateMatch.Success)
                {
                    Console.WriteLine($"Receipt Date: {dateMatch.Value}");
                }
            }

            // Output the extracted data
            for (int i = 0; i < descriptions.Count; i++)
            {
                Console.WriteLine($"Description: {descriptions[i]}");
                Console.WriteLine($"Quantity: 1.00 Units");
                Console.WriteLine($"Unit Price: ${unitPrices[i]:0.00}");
                Console.WriteLine($"Taxes: ${taxes[i]:0.00}");
                Console.WriteLine($"Amount: ${amounts[i]:0.00}");
                Console.WriteLine("-----------------------");
            }

            // Calculate and display totals
            var subtotal = unitPrices.Sum();
            var totalTax = taxes.Sum();
            var grandTotal = amounts.Sum();

            Console.WriteLine($"\nSubtotal: ${subtotal:0.00}");
            Console.WriteLine($"Total Tax: ${totalTax:0.00}");
            Console.WriteLine($"Grand Total: ${grandTotal:0.00}");
        }
    }
}

VB.NET version:

Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Linq

Class ReceiptScanner
    Shared Sub Main()
        Dim ocr = New IronTesseract()

        ' Configure OCR for receipt reading. The whitelist includes '/', '-' and ':'
        ' so that dates such as 03/15/2024 can match datePattern below
        ocr.Configuration.WhiteListCharacters = "0123456789.$,/-:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% "
        ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\"
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5

        ' Load the image of the receipt
        Using input = New OcrInput("r2.png")
            ' Apply image enhancement filters
            input.Deskew() ' Fix image rotation
            input.EnhanceResolution(225) ' Optimal DPI for receipts
            input.DeNoise() ' Remove background noise
            input.Sharpen() ' Improve text clarity

            ' Perform OCR on the input image
            Dim result = ocr.Read(input)

            ' Regular expression patterns to extract relevant details from the OCR result
            Dim descriptionPattern = "\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)"
            Dim pricePattern = "\$\d+(\.\d{2})?"
            Dim datePattern = "\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"

            ' Variables to store extracted data
            Dim descriptions = New List(Of String)()
            Dim unitPrices = New List(Of Decimal)()
            Dim taxes = New List(Of Decimal)()
            Dim amounts = New List(Of Decimal)()

            Dim lines = result.Text.Split(ControlChars.Lf)
            For Each line In lines
                ' Match each line against the description pattern
                Dim descriptionMatch = Regex.Match(line, descriptionPattern)
                If descriptionMatch.Success Then
                    descriptions.Add(descriptionMatch.Groups(1).Value.Trim())
                    ' Group 2 is the quantity (the number before "Units"); group 3 is the unit price
                    unitPrices.Add(Decimal.Parse(descriptionMatch.Groups(3).Value))

                    ' Calculate tax and total amount for each item
                    Dim tax = unitPrices(unitPrices.Count - 1) * 0.15D
                    taxes.Add(tax)
                    amounts.Add(unitPrices(unitPrices.Count - 1) + tax)
                End If

                ' Extract date if found
                Dim dateMatch = Regex.Match(line, datePattern)
                If dateMatch.Success Then
                    Console.WriteLine($"Receipt Date: {dateMatch.Value}")
                End If
            Next

            ' Output the extracted data
            For i As Integer = 0 To descriptions.Count - 1
                Console.WriteLine($"Description: {descriptions(i)}")
                Console.WriteLine("Quantity: 1.00 Units")
                Console.WriteLine($"Unit Price: ${unitPrices(i):0.00}")
                Console.WriteLine($"Taxes: ${taxes(i):0.00}")
                Console.WriteLine($"Amount: ${amounts(i):0.00}")
                Console.WriteLine("-----------------------")
            Next

            ' Calculate and display totals
            Dim subtotal = unitPrices.Sum()
            Dim totalTax = taxes.Sum()
            Dim grandTotal = amounts.Sum()

            Console.WriteLine(vbCrLf & $"Subtotal: ${subtotal:0.00}")
            Console.WriteLine($"Total Tax: ${totalTax:0.00}")
            Console.WriteLine($"Grand Total: ${grandTotal:0.00}")
        End Using
    End Sub
End Class
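
The example above prints the matched date as raw text. A minimal sketch of turning that match into a typed `DateTime` follows. The helper `ReceiptDateParser` is hypothetical, and the month-first format list is an assumption; receipts from day-first regions would need a different list.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ReceiptDateParser
{
    // Same date regex as datePattern in the example above.
    private static readonly Regex DatePattern =
        new Regex(@"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}");

    // Assumed month-first formats; replace these for day-first locales.
    private static readonly string[] Formats =
    {
        "M/d/yyyy", "M-d-yyyy", "M/d/yy", "M-d-yy"
    };

    // Returns the first date found in the line, or null if nothing parses.
    public static DateTime? TryExtract(string line)
    {
        var match = DatePattern.Match(line ?? string.Empty);
        if (!match.Success) return null;

        return DateTime.TryParseExact(match.Value, Formats,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out var date)
            ? date
            : (DateTime?)null;
    }
}
```

Parsing with explicit formats and the invariant culture keeps the result independent of the machine's regional settings, at the cost of rejecting dates written in any other order.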

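Which techniques improve receipt scanning accuracy?

Key techniques for accurate receipt scanning:

- Character whitelisting: restrict recognition to the expected characters.
- Image preprocessing: apply deskewing, resolution enhancement, and denoising.
- Pattern matching: use regular expressions to extract structured data.
- Confidence scoring: validate results against the recognition confidence.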
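
Confidence scoring can be turned into a simple routing rule. The sketch below is one possible way to do that, not an IronOCR feature: the `ConfidenceGate` type and both thresholds are illustrative assumptions, and the confidence value is assumed to be a percentage such as the `result.Confidence` printed in the examples.

```csharp
public enum ReviewDecision { Accept, ManualReview, Reject }

public static class ConfidenceGate
{
    // Thresholds are illustrative assumptions; tune them against your own receipts.
    public static ReviewDecision Decide(double confidencePercent, bool totalFound,
        double acceptAt = 85.0, double rejectBelow = 50.0)
    {
        // Very low confidence: the scan is probably unusable, so ask for a rescan.
        if (confidencePercent < rejectBelow) return ReviewDecision.Reject;

        // High confidence and a total was parsed: accept automatically.
        if (confidencePercent >= acceptAt && totalFound) return ReviewDecision.Accept;

        // Anything in between goes to a person.
        return ReviewDecision.ManualReview;
    }
}
```

For example, `ConfidenceGate.Decide(92.5, totalFound: true)` returns `ReviewDecision.Accept`, while the same confidence without a parsed total is sent to manual review.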

Image: Output. The Visual Studio debug console showing the extracted invoice data, including item descriptions, quantities, prices, taxes, and totals.
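
Once the fields are parsed, a receipt scanning API would typically return them as JSON. The following is a minimal sketch using `System.Text.Json`; the `ReceiptDocument` and `ReceiptItem` types and their property names are hypothetical and only mirror the values printed by the example above.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class ReceiptItem
{
    public string Description { get; set; }
    public decimal UnitPrice { get; set; }
    public decimal Tax { get; set; }
    public decimal Amount { get; set; }
}

public class ReceiptDocument
{
    public string Date { get; set; }
    public List<ReceiptItem> Items { get; set; } = new List<ReceiptItem>();
    public decimal GrandTotal { get; set; }
}

public static class ReceiptJson
{
    // Serializes the receipt with camelCase property names and indentation.
    public static string Serialize(ReceiptDocument receipt)
    {
        var options = new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            WriteIndented = true
        };
        return JsonSerializer.Serialize(receipt, options);
    }
}
```

Keeping the amounts as `decimal` avoids binary floating-point rounding in the serialized values.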

How do I extract the entire receipt content?

To extract the full receipt content while preserving its formatting:

using IronOcr;
using System;
using System.Linq;

class WholeReceiptExtractor
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Configure for receipt scanning
        ocr.Configuration.ReadBarCodes = true; // Enable barcode detection
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5; // Use latest engine
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm; // Best accuracy

        using (var input = new OcrInput(@"r3.png"))
        {
            // Apply automatic image correction
            input.WithTitle("Receipt Scan");

            // Use computer vision to find text regions
            var textRegions = input.FindTextRegions();
            Console.WriteLine($"Found {textRegions.Count()} text regions");

            // Apply optimal filters for receipt processing
            input.ApplyOcrInputFilters();

            // Perform OCR on the entire receipt
            var result = ocr.Read(input);

            // Display extracted text
            Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===");
            Console.WriteLine(result.Text);

            // Get detailed results
            Console.WriteLine($"\n=== OCR STATISTICS ===");
            Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%");
            Console.WriteLine($"Pages Processed: {result.Pages.Length}");
            Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}");
            Console.WriteLine($"Lines Detected: {result.Lines.Length}");
            Console.WriteLine($"Words Recognized: {result.Words.Length}");

            // Extract any barcodes found
            if (result.Barcodes.Any())
            {
                Console.WriteLine("\n=== BARCODES DETECTED ===");
                foreach(var barcode in result.Barcodes)
                {
                    Console.WriteLine($"Type: {barcode.Type}");
                    Console.WriteLine($"Value: {barcode.Value}");
                    Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}");
                }
            }

            // Save as searchable PDF
            result.SaveAsSearchablePdf("receipt_searchable.pdf");
            Console.WriteLine("\nSearchable PDF saved as: receipt_searchable.pdf");

            // Export as hOCR for preservation
            result.SaveAsHocrFile("receipt_hocr.html");
            Console.WriteLine("hOCR file saved as: receipt_hocr.html");
        }
    }
}

VB.NET version:

Imports IronOcr
Imports System
Imports System.Linq

Class WholeReceiptExtractor
    Shared Sub Main()
        Dim ocr = New IronTesseract()

        ' Configure for receipt scanning
        ocr.Configuration.ReadBarCodes = True ' Enable barcode detection
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5 ' Use latest engine
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm ' Best accuracy

        Using input = New OcrInput("r3.png")
            ' Apply automatic image correction
            input.WithTitle("Receipt Scan")

            ' Use computer vision to find text regions
            Dim textRegions = input.FindTextRegions()
            Console.WriteLine($"Found {textRegions.Count()} text regions")

            ' Apply optimal filters for receipt processing
            input.ApplyOcrInputFilters()

            ' Perform OCR on the entire receipt
            Dim result = ocr.Read(input)

            ' Display extracted text
            Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===")
            Console.WriteLine(result.Text)

            ' Get detailed results
            Console.WriteLine(vbCrLf & "=== OCR STATISTICS ===")
            Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%")
            Console.WriteLine($"Pages Processed: {result.Pages.Length}")
            Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}")
            Console.WriteLine($"Lines Detected: {result.Lines.Length}")
            Console.WriteLine($"Words Recognized: {result.Words.Length}")

            ' Extract any barcodes found
            If result.Barcodes.Any() Then
                Console.WriteLine(vbCrLf & "=== BARCODES DETECTED ===")
                For Each barcode In result.Barcodes
                    Console.WriteLine($"Type: {barcode.Type}")
                    Console.WriteLine($"Value: {barcode.Value}")
                    Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}")
                Next
            End If

            ' Save as searchable PDF
            result.SaveAsSearchablePdf("receipt_searchable.pdf")
            Console.WriteLine(vbCrLf & "Searchable PDF saved as: receipt_searchable.pdf")

            ' Export as hOCR for preservation
            result.SaveAsHocrFile("receipt_hocr.html")
            Console.WriteLine("hOCR file saved as: receipt_hocr.html")
        End Using
    End Sub
End Class

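Image: Receipt scanning API output. The Visual Studio debug console showing the extracted invoice data, including item descriptions, quantities, prices, taxes, and totals.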
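
The full text can also be used to guess the vendor name, which none of the examples above extract. The sketch below relies on a heuristic that is an assumption rather than an IronOCR feature: the vendor is taken to be the first line near the top that is mostly letters. `VendorNameGuesser` is a hypothetical helper.

```csharp
using System;
using System.Linq;

public static class VendorNameGuesser
{
    // Scans the first few non-empty lines and returns the first one that has
    // at least three letters and more letters than digits, or null if none do.
    public static string Guess(string ocrText, int maxLinesToScan = 5)
    {
        if (string.IsNullOrWhiteSpace(ocrText)) return null;

        return ocrText
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Take(maxLinesToScan)
            .FirstOrDefault(line =>
                line.Count(char.IsLetter) >= 3 &&
                line.Count(char.IsLetter) > line.Count(char.IsDigit));
    }
}
```

In practice this would be called as `VendorNameGuesser.Guess(result.Text)`. Receipts that start with a logo, an address, or a phone number will defeat the heuristic, so the result is best treated as a suggestion.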

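Which advanced features improve receipt scanning?

IronOCR offers several advanced features that can improve receipt scanning accuracy:

Which languages does IronOCR support?

  1. Multi-language support: process receipts in more than 125 languages, or receipts that mix several languages in one document.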
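
Receipts in other languages often use different decimal and grouping separators as well, so the amounts need locale-aware parsing after OCR. The following is a minimal sketch using only the .NET base library; `LocalizedAmountParser` is a hypothetical helper, and the separators and currency symbol are supplied by the caller rather than detected.

```csharp
using System.Globalization;

public static class LocalizedAmountParser
{
    // Parses an amount such as "1.234,56 €" with explicit separators and
    // currency symbol. Returns null when the text is not a valid amount.
    public static decimal? TryParse(string text, string decimalSeparator,
        string groupSeparator, string currencySymbol)
    {
        // Clone the invariant format so the result does not depend on the
        // culture data installed on the machine.
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = decimalSeparator;
        format.NumberGroupSeparator = groupSeparator;
        format.CurrencyDecimalSeparator = decimalSeparator;
        format.CurrencyGroupSeparator = groupSeparator;
        format.CurrencySymbol = currencySymbol;

        return decimal.TryParse(text, NumberStyles.Currency, format, out var value)
            ? value
            : (decimal?)null;
    }
}
```

Both the number and the currency separators are set because the parser can fall back to the number separators when no currency symbol has been read yet.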

Can IronOCR read barcodes on receipts?

  2. Barcode reading: detect and read barcodes and QR codes automatically.
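
A barcode value read from a receipt can be sanity-checked before it is stored. The sketch below validates the EAN-13 check digit, assuming the value is a 13-digit EAN-13 string; other barcode types use different rules, and the `Ean13` helper is hypothetical.

```csharp
public static class Ean13
{
    // Returns true when the code has 13 digits and a valid check digit.
    public static bool IsValid(string code)
    {
        if (code == null || code.Length != 13) return false;

        int sum = 0;
        for (int i = 0; i < 13; i++)
        {
            if (code[i] < '0' || code[i] > '9') return false;
            int digit = code[i] - '0';

            // Counting from the left, odd positions weigh 1 and even positions weigh 3.
            sum += (i % 2 == 0) ? digit : digit * 3;
        }

        // The check digit makes the weighted sum a multiple of 10.
        return sum % 10 == 0;
    }
}
```

With the barcode loop from the earlier example, this could be called as `Ean13.IsValid(barcode.Value)`.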

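How does computer vision help with receipts?

  3. Computer vision: use advanced text detection to locate text regions before OCR.

Can I train custom models for unique receipt formats?

  4. Custom training: train custom fonts for specialized receipt formats.

How do I improve performance for batch processing?

  5. Performance optimization: use multithreading and async processing for batch operations.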

// Example: Async receipt processing for high-volume scenarios
using IronOcr;
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.IO;

class BulkReceiptProcessor
{
    static async Task Main()
    {
        var ocr = new IronTesseract();

        // Configure for optimal performance
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
        ocr.Configuration.UseMultiThreading = true;
        ocr.Configuration.ProcessorCount = Environment.ProcessorCount;

        // Process multiple receipts asynchronously
        var receiptFiles = Directory.GetFiles(@"C:\Receipts\", "*.jpg");
        var tasks = new List<Task<OcrResult>>();

        foreach (var file in receiptFiles)
        {
            tasks.Add(ProcessReceiptAsync(ocr, file));
        }

        // Wait for all receipts to be processed
        var results = await Task.WhenAll(tasks);

        // Aggregate results
        decimal totalAmount = 0;
        foreach (var result in results)
        {
            // Extract total from each receipt
            var match = System.Text.RegularExpressions.Regex.Match(
                result.Text, @"Total:?\s*\$?(\d+\.\d{2})");

            if (match.Success && decimal.TryParse(match.Groups[1].Value, out var amount))
            {
                totalAmount += amount;
            }
        }

        Console.WriteLine($"Processed {results.Length} receipts");
        Console.WriteLine($"Combined total: ${totalAmount:F2}");
    }

    static async Task<OcrResult> ProcessReceiptAsync(IronTesseract ocr, string filePath)
    {
        using (var input = new OcrInput(filePath))
        {
            // Apply preprocessing
            input.DeNoise();
            input.Deskew();
            input.EnhanceResolution(200);

            // Process asynchronously
            return await ocr.ReadAsync(input);
        }
    }
}

VB.NET version:

Imports IronOcr
Imports System
Imports System.Threading.Tasks
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module BulkReceiptProcessor

    Sub Main()
        MainAsync().GetAwaiter().GetResult()
    End Sub

    Private Async Function MainAsync() As Task
        Dim ocr As New IronTesseract()

        ' Configure for optimal performance
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
        ocr.Configuration.UseMultiThreading = True
        ocr.Configuration.ProcessorCount = Environment.ProcessorCount

        ' Process multiple receipts asynchronously
        Dim receiptFiles = Directory.GetFiles("C:\Receipts\", "*.jpg")
        Dim tasks As New List(Of Task(Of OcrResult))()

        For Each file In receiptFiles
            tasks.Add(ProcessReceiptAsync(ocr, file))
        Next

        ' Wait for all receipts to be processed
        Dim results = Await Task.WhenAll(tasks)

        ' Aggregate results
        Dim totalAmount As Decimal = 0
        For Each result In results
            ' Extract total from each receipt
            Dim match = Regex.Match(result.Text, "Total:?\s*\$?(\d+\.\d{2})")

            ' Parse into a separate variable so the running total is not overwritten
            Dim amount As Decimal
            If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, amount) Then
                totalAmount += amount
            End If
        Next

        Console.WriteLine($"Processed {results.Length} receipts")
        Console.WriteLine($"Combined total: ${totalAmount:F2}")
    End Function

    Private Async Function ProcessReceiptAsync(ocr As IronTesseract, filePath As String) As Task(Of OcrResult)
        Using input As New OcrInput(filePath)
            ' Apply preprocessing
            input.DeNoise()
            input.Deskew()
            input.EnhanceResolution(200)

            ' Process asynchronously
            Return Await ocr.ReadAsync(input)
        End Using
    End Function

End Module
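
The example above starts one task per file at once, which can exhaust memory on large folders. The following is a minimal sketch of capping the number of receipts in flight with `SemaphoreSlim`. `ThrottledRunner` is a hypothetical helper, and the OCR call is assumed to be passed in as the `work` delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Runs `work` over every item with at most `maxConcurrency` calls in flight.
    // Results are returned in the same order as the input items.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
    {
        if (maxConcurrency < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();       // Wait for a free slot
                try { return await work(item); }
                finally { gate.Release(); }   // Free the slot even on failure
            }).ToList();

            return await Task.WhenAll(tasks);
        }
    }
}
```

With the methods from the example above, a call might look like `await ThrottledRunner.RunAsync(receiptFiles, 4, file => ProcessReceiptAsync(ocr, file))`, where the limit of 4 is an arbitrary starting point.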

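How do I handle common receipt scanning challenges?

Receipt scanning poses some particular challenges that IronOCR can help with:

How do I handle poor-quality receipt images?

- Poor image quality: use the Filter Wizard to find the best preprocessing settings automatically.

What if the receipt is skewed or rotated?

- Skewed or rotated receipts: automatic page rotation detection ensures the correct orientation.

How do I handle faded or low-contrast receipts?

- Faded or low-contrast text: apply color correction and enhancement filters.

Can IronOCR read crumpled or damaged receipts?

- Crumpled or damaged receipts: advanced preprocessing techniques can recover text from hard-to-read images.

How do I manage different receipt formats and layouts?

Receipt formats vary widely between retailers. IronOCR offers flexible options: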

using IronOcr;
using System;
using System.Collections.Generic;
using System.Linq;

class ReceiptLayoutHandler
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Configure for different receipt layouts
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;

        // Start with an empty input so the receipt is added only once, cropped
        using (var input = new OcrInput())
        {
            // Apply region-specific processing
            var cropRegion = new CropRectangle(x: 0, y: 100, width: 400, height: 800);
            input.AddImage(@"complex_receipt.jpg", cropRegion);

            // Process with confidence tracking
            var result = ocr.Read(input);

            // Parse using confidence scores
            var highConfidenceLines = result.Lines
                .Where(line => line.Confidence > 85)
                .Select(line => line.Text)
                .ToList();

            // Extract data with fallback strategies
            var total = ExtractTotal(highConfidenceLines) 
                        ?? ExtractTotalAlternative(result.Text);

            Console.WriteLine($"Receipt Total: {total}");
        }
    }

    static decimal? ExtractTotal(List<string> lines)
    {
        // Primary extraction method
        foreach (var line in lines)
        {
            if (line.Contains("TOTAL") && 
                System.Text.RegularExpressions.Regex.IsMatch(line, @"\d+\.\d{2}"))
            {
                var match = System.Text.RegularExpressions.Regex.Match(line, @"(\d+\.\d{2})");
                if (decimal.TryParse(match.Value, out var total))
                    return total;
            }
        }
        return null;
    }

    static decimal? ExtractTotalAlternative(string fullText)
    {
        // Fallback extraction method
        var pattern = @"(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})";
        var match = System.Text.RegularExpressions.Regex.Match(fullText, pattern);

        if (match.Success && decimal.TryParse(match.Groups[1].Value, out var total))
            return total;

        return null;
    }
}

VB.NET version:

Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Linq

Class ReceiptLayoutHandler
    Shared Sub Main()
        Dim ocr = New IronTesseract()

        ' Configure for different receipt layouts
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm

        ' Start with an empty input so the receipt is added only once, cropped
        Using input = New OcrInput()
            ' Apply region-specific processing
            Dim cropRegion = New CropRectangle(x:=0, y:=100, width:=400, height:=800)
            input.AddImage("complex_receipt.jpg", cropRegion)

            ' Process with confidence tracking
            Dim result = ocr.Read(input)

            ' Parse using confidence scores
            Dim highConfidenceLines = result.Lines _
                .Where(Function(line) line.Confidence > 85) _
                .Select(Function(line) line.Text) _
                .ToList()

            ' Extract data with fallback strategies
            ' If(a, b) is the VB.NET null-coalescing form; OrElse only works on Booleans
            Dim total = If(ExtractTotal(highConfidenceLines),
                           ExtractTotalAlternative(result.Text))

            Console.WriteLine($"Receipt Total: {total}")
        End Using
    End Sub

    Shared Function ExtractTotal(lines As List(Of String)) As Decimal?
        ' Primary extraction method
        For Each line In lines
            If line.Contains("TOTAL") AndAlso _
                System.Text.RegularExpressions.Regex.IsMatch(line, "\d+\.\d{2}") Then
                Dim match = System.Text.RegularExpressions.Regex.Match(line, "(\d+\.\d{2})")
                Dim total As Decimal
                If Decimal.TryParse(match.Value, total) Then
                    Return total
                End If
            End If
        Next
        Return Nothing
    End Function

    Shared Function ExtractTotalAlternative(fullText As String) As Decimal?
        ' Fallback extraction method
        Dim pattern = "(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})"
        Dim match = System.Text.RegularExpressions.Regex.Match(fullText, pattern)

        Dim total As Decimal
        If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, total) Then
            Return total
        End If

        Return Nothing
    End Function
End Class
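
When a few retailers account for most receipts, the fallback idea above can be extended with per-vendor patterns. The sketch below is one way to structure that; the vendor names, their labels, and the `LayoutProfiles` type are all hypothetical, and the generic fallback is a word-bounded variant of the pattern in `ExtractTotalAlternative`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LayoutProfiles
{
    // Hypothetical vendors mapped to the label each prints before its total.
    private static readonly Dictionary<string, Regex> TotalPatterns =
        new Dictionary<string, Regex>(StringComparer.OrdinalIgnoreCase)
        {
            ["ACME MARKET"] = new Regex(@"BALANCE DUE\s*\$?(\d+\.\d{2})", RegexOptions.IgnoreCase),
            ["CORNER CAFE"] = new Regex(@"AMOUNT PAID\s*\$?(\d+\.\d{2})", RegexOptions.IgnoreCase),
        };

    // Generic fallback; \b keeps "SUBTOTAL" from matching as "TOTAL".
    private static readonly Regex Fallback =
        new Regex(@"\b(?:Grand Total|Total|TOTAL|Amount Due)\b.*?(\d+\.\d{2})");

    public static decimal? ExtractTotal(string ocrText)
    {
        if (string.IsNullOrEmpty(ocrText)) return null;

        // Try the pattern of any vendor whose name appears in the text.
        foreach (var profile in TotalPatterns)
        {
            if (ocrText.IndexOf(profile.Key, StringComparison.OrdinalIgnoreCase) < 0) continue;
            var match = profile.Value.Match(ocrText);
            if (match.Success) return Parse(match.Groups[1].Value);
        }

        var fallback = Fallback.Match(ocrText);
        return fallback.Success ? Parse(fallback.Groups[1].Value) : (decimal?)null;
    }

    private static decimal? Parse(string text) =>
        decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out var value)
            ? value
            : (decimal?)null;
}
```

Adding a retailer then means adding one dictionary entry, and unknown retailers still go through the generic pattern.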

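What are the key takeaways about receipt scanning APIs?

Receipt scanning APIs such as IronOCR provide a dependable way to automate data extraction from receipts. With OCR, businesses can automatically extract vendor names, purchase dates, line items, prices, taxes, and totals. With support for multiple languages, currencies, and barcodes, they can simplify receipt management, save time, and make data-driven decisions.

IronOCR gives developers tools for accurate and efficient text extraction, which enables task automation. The library's feature set includes support for various document types, as well as recent improvements such as a 98% memory reduction.

Once the prerequisites are met and IronOCR is integrated, you can start automating receipt processing. The library's documentation, examples, and troubleshooting guides support the implementation.

For more information, visit the licensing page or browse the C# Tesseract OCR tutorial.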

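Frequently Asked Questions

How can I automate receipt data extraction with OCR in C#?

You can automate receipt data extraction in C# with IronOCR, which extracts key details such as line items, prices, taxes, and total amounts from receipt images with high accuracy.

What are the prerequisites for a receipt scanning project in C#?

You need Visual Studio, basic C# programming knowledge, and the IronOCR library installed in your project.

How do I install the OCR library with the NuGet Package Manager in Visual Studio?

Open Visual Studio, go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution, search for IronOCR, and install it in your project.

Can I install the OCR library from the Visual Studio command line?

Yes. Open the Package Manager Console in Visual Studio and run: Install-Package IronOcr

How do I extract text from an entire receipt with OCR?

Use IronOCR to run OCR on the whole receipt image, then output the extracted text from your C# code.

What are the benefits of a receipt scanning API?

A receipt scanning API such as IronOCR automates data extraction, minimizes manual errors, improves productivity, and provides insight into spending patterns for better business decisions.

Does the OCR library support multiple languages and currencies?

Yes. IronOCR supports multiple languages, currencies, and receipt formats, which suits it to international applications.

How accurate is the OCR library at extracting text from images?

IronOCR uses OCR algorithms, computer vision, and machine learning models to maintain high accuracy, even in challenging conditions.

What kinds of data can OCR extract from receipts?

IronOCR can extract line items, prices, tax amounts, totals, and other receipt details.

How does automated receipt parsing improve business processes?

Automated receipt parsing with IronOCR reduces manual entry, enables accurate data collection, and supports data-driven decisions.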

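Kannapat Udonpant
Software Engineer
Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, he also became a member of the Vehicle Robotics Laboratory, part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join the engineering team at Iron Software, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. Besides peer learning, he enjoys the social side of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.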