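Receipt Scanning API: Extracting Data from Receipts with C# and IronOCR
A receipt scanning API uses OCR to extract data from receipts automatically, which reduces manual entry errors and speeds up processing. This guide shows how to use IronOCR in C# to extract vendor names, dates, line items, prices, and totals from receipt images, using the library's built-in image preprocessing and its support for multiple formats.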
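Why choose IronOCR for receipt scanning?
IronOCR is a flexible OCR library that extracts text from scanned documents, images, and PDFs. It combines OCR algorithms, computer vision, and machine learning models, and is designed to stay accurate even in challenging scenarios. The library supports multiple languages and font styles, which makes it suitable for international applications. Integrating IronOCR into your application lets you automate data entry and text analysis.
How does IronOCR extract text from receipt images?
IronOCR can retrieve text from documents, photos, screenshots, and live camera video streams, and returns it as a JSON response. It analyzes the image data, recognizes characters, and converts them into machine-readable text. The library is built on Tesseract 5, combined with proprietary improvements aimed at higher accuracy.
Why is IronOCR a good fit for receipt processing?
IronOCR handles low-quality scans, varied receipt formats, and images in different orientations. Its built-in image preprocessing filters improve image quality before recognition, which helps with crumpled or faded receipts.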
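What do I need to use IronOCR?
Before using IronOCR, make sure the following prerequisites are met:
Which development environments are supported?
1. Development environment: install a suitable IDE, such as Visual Studio. IronOCR supports Windows, Linux, macOS, Azure, and AWS.
Which programming skills are required?
2. Programming skills: basic knowledge of C# programming.
Which software dependencies are required?
3. IronOCR installation: install the library through the NuGet Package Manager. Platform-specific dependencies may be required.
Is a license key required?
4. License key (optional): a free trial is available; production use requires a license.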
How do I create a new Visual Studio project for receipt scanning?
How do I create a new project in Visual Studio?
Open Visual Studio, go to "File", hover over "New", and click "Project".
Figure: the Visual Studio File menu expanded, with New > Project highlighted.
Which project template should I choose?
Select "Console App", then click "Next". This template is well suited to learning IronOCR before applying it in a web application.
Figure: the "Create a new project" dialog with the Console App template selected.
What should I name my receipt scanner project?
Enter a project name and location, then click "Next". Choose a descriptive name such as "ReceiptScannerAPI".
Figure: the project configuration screen for a C# console application (named "IronOCR" in the screenshot).
Which .NET version should I choose?
For the best compatibility, select .NET 5.0 or later, then click "Create".
Figure: the "Additional information" dialog with .NET 5.0 selected as the target framework.
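How do I install IronOCR in my project?
There are two simple installation methods:
How do I use the NuGet Package Manager?
Go to "Tools" > "NuGet Package Manager" > "Manage NuGet Packages for Solution".
Figure: the NuGet Package Manager in Visual Studio.
Search for IronOCR and install the package. For non-English receipts, also install the relevant language-specific packages.
Figure: the NuGet Package Manager showing the installed IronOCR packages, including the main library and the language packs for Arabic, Hebrew, and Spanish.
How do I install from the command line?
- Go to "Tools" > "NuGet Package Manager" > "Package Manager Console".
- Enter the following command:
Install-Package IronOcr
Figure: the Package Manager Console running the Install-Package command.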
How do I extract receipt data quickly with IronOCR?
A few lines of code are enough to extract receipt data:
- Install IronOCR with the NuGet Package Manager (https://www.nuget.org/packages/IronOcr):
PM> Install-Package IronOcr
- Copy and run this code:
using IronOcr;
using System;

var ocr = new IronTesseract();

// Configure for receipt scanning
ocr.Configuration.ReadBarCodes = true;
ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% ";

using (var input = new OcrInput(@"receipt.jpg"))
{
    // Apply automatic image enhancement
    input.DeNoise();
    input.Deskew();
    input.EnhanceResolution(225);

    // Extract text from receipt
    var result = ocr.Read(input);

    // Display extracted text and confidence
    Console.WriteLine($"Extracted Text:\n{result.Text}");
    Console.WriteLine($"\nConfidence: {result.Confidence}%");
}
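How do I extract structured data from receipt images?
IronOCR can extract line items, prices, taxes, and totals from a range of document types. The library supports PDFs, multi-page TIFFs, and common image formats.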
using IronOcr;
using System;
using System.Collections.Generic;
using System.Linq; // Required for the Sum() calls below
using System.Text.RegularExpressions;

class ReceiptScanner
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Configure OCR for optimal receipt reading
        ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% ";
        ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\\";
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;

        // Load the image of the receipt
        using (var input = new OcrInput(@"r2.png"))
        {
            // Apply image enhancement filters
            input.Deskew();               // Fix image rotation
            input.EnhanceResolution(225); // Optimal DPI for receipts
            input.DeNoise();              // Remove background noise
            input.Sharpen();              // Improve text clarity

            // Perform OCR on the input image
            var result = ocr.Read(input);

            // Regular expression patterns to extract relevant details from the OCR result
            var descriptionPattern = @"\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";
            var pricePattern = @"\$\d+(\.\d{2})?";
            var datePattern = @"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}";

            // Variables to store extracted data
            var descriptions = new List<string>();
            var unitPrices = new List<decimal>();
            var taxes = new List<decimal>();
            var amounts = new List<decimal>();

            var lines = result.Text.Split('\n');
            foreach (var line in lines)
            {
                // Match each line against the description pattern
                var descriptionMatch = Regex.Match(line, descriptionPattern);
                if (descriptionMatch.Success)
                {
                    descriptions.Add(descriptionMatch.Groups[1].Value.Trim());
                    // Group 2 is the quantity; group 3 is the unit price
                    unitPrices.Add(decimal.Parse(descriptionMatch.Groups[3].Value));

                    // Calculate tax and total amount for each item
                    var tax = unitPrices[unitPrices.Count - 1] * 0.15m;
                    taxes.Add(tax);
                    amounts.Add(unitPrices[unitPrices.Count - 1] + tax);
                }

                // Extract date if found
                var dateMatch = Regex.Match(line, datePattern);
                if (dateMatch.Success)
                {
                    Console.WriteLine($"Receipt Date: {dateMatch.Value}");
                }
            }

            // Output the extracted data
            for (int i = 0; i < descriptions.Count; i++)
            {
                Console.WriteLine($"Description: {descriptions[i]}");
                Console.WriteLine($"Quantity: 1.00 Units");
                Console.WriteLine($"Unit Price: ${unitPrices[i]:0.00}");
                Console.WriteLine($"Taxes: ${taxes[i]:0.00}");
                Console.WriteLine($"Amount: ${amounts[i]:0.00}");
                Console.WriteLine("-----------------------");
            }

            // Calculate and display totals
            var subtotal = unitPrices.Sum();
            var totalTax = taxes.Sum();
            var grandTotal = amounts.Sum();
            Console.WriteLine($"\nSubtotal: ${subtotal:0.00}");
            Console.WriteLine($"Total Tax: ${totalTax:0.00}");
            Console.WriteLine($"Grand Total: ${grandTotal:0.00}");
        }
    }
}
Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Linq

Class ReceiptScanner
    Shared Sub Main()
        Dim ocr = New IronTesseract()

        ' Configure OCR for optimal receipt reading
        ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% "
        ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\"
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5

        ' Load the image of the receipt
        Using input = New OcrInput("r2.png")
            ' Apply image enhancement filters
            input.Deskew() ' Fix image rotation
            input.EnhanceResolution(225) ' Optimal DPI for receipts
            input.DeNoise() ' Remove background noise
            input.Sharpen() ' Improve text clarity

            ' Perform OCR on the input image
            Dim result = ocr.Read(input)

            ' Regular expression patterns to extract relevant details from the OCR result
            Dim descriptionPattern = "\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)"
            Dim pricePattern = "\$\d+(\.\d{2})?"
            Dim datePattern = "\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"

            ' Variables to store extracted data
            Dim descriptions = New List(Of String)()
            Dim unitPrices = New List(Of Decimal)()
            Dim taxes = New List(Of Decimal)()
            Dim amounts = New List(Of Decimal)()

            Dim lines = result.Text.Split(ControlChars.Lf)
            For Each line In lines
                ' Match each line against the description pattern
                Dim descriptionMatch = Regex.Match(line, descriptionPattern)
                If descriptionMatch.Success Then
                    descriptions.Add(descriptionMatch.Groups(1).Value.Trim())
                    ' Group 2 is the quantity; group 3 is the unit price
                    unitPrices.Add(Decimal.Parse(descriptionMatch.Groups(3).Value))

                    ' Calculate tax and total amount for each item
                    Dim tax = unitPrices(unitPrices.Count - 1) * 0.15D
                    taxes.Add(tax)
                    amounts.Add(unitPrices(unitPrices.Count - 1) + tax)
                End If

                ' Extract date if found
                Dim dateMatch = Regex.Match(line, datePattern)
                If dateMatch.Success Then
                    Console.WriteLine($"Receipt Date: {dateMatch.Value}")
                End If
            Next

            ' Output the extracted data
            For i As Integer = 0 To descriptions.Count - 1
                Console.WriteLine($"Description: {descriptions(i)}")
                Console.WriteLine("Quantity: 1.00 Units")
                Console.WriteLine($"Unit Price: ${unitPrices(i):0.00}")
                Console.WriteLine($"Taxes: ${taxes(i):0.00}")
                Console.WriteLine($"Amount: ${amounts(i):0.00}")
                Console.WriteLine("-----------------------")
            Next

            ' Calculate and display totals
            Dim subtotal = unitPrices.Sum()
            Dim totalTax = taxes.Sum()
            Dim grandTotal = amounts.Sum()
            Console.WriteLine(vbCrLf & $"Subtotal: ${subtotal:0.00}")
            Console.WriteLine($"Total Tax: ${totalTax:0.00}")
            Console.WriteLine($"Grand Total: ${grandTotal:0.00}")
        End Using
    End Sub
End Class
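The price and date patterns from the listing above can be exercised without an OCR engine, which makes them easier to tune. The following is a minimal sketch under stated assumptions: the class name `ReceiptPatterns`, the sample text, and the month-first date formats are illustrative choices and are not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ReceiptPatterns
{
    // Same price and date patterns as in the listing above.
    private const string PricePattern = @"\$\d+(\.\d{2})?";
    private const string DatePattern = @"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}";

    // Returns every dollar amount found in the text, in order of appearance.
    public static List<decimal> FindPrices(string text)
    {
        var prices = new List<decimal>();
        foreach (Match m in Regex.Matches(text, PricePattern))
        {
            // Drop the leading '$' before parsing.
            prices.Add(decimal.Parse(m.Value.Substring(1), CultureInfo.InvariantCulture));
        }
        return prices;
    }

    // Returns the first date found, or null when none parses.
    // Assumption: dates are written month first (for example 03/15/2024).
    public static DateTime? FindDate(string text)
    {
        var m = Regex.Match(text, DatePattern);
        if (!m.Success) return null;

        string[] formats = { "M/d/yyyy", "M-d-yyyy", "M/d/yy", "M-d-yy" };
        return DateTime.TryParseExact(m.Value, formats, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out var date) ? date : (DateTime?)null;
    }

    public static void Main()
    {
        // Hypothetical OCR output used only for illustration.
        var text = "Date: 03/15/2024\nCoffee $4.50\nBagel $3\nTotal $7.50";
        Console.WriteLine(string.Join(" | ", FindPrices(text)));
        Console.WriteLine(FindDate(text)?.ToString("yyyy-MM-dd"));
    }
}
```

Parsing the date into a `DateTime` rather than printing the raw match makes it possible to reject strings that only look like dates, such as a three-digit year.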
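Which techniques improve receipt scanning accuracy?
Key techniques for accurate receipt scanning:
- Character whitelisting: restrict recognition to the expected characters.
- Image preprocessing: apply deskewing, resolution enhancement, and noise removal.
- Pattern matching: use regular expressions to extract structured data.
- Confidence scoring: validate results against the recognition confidence.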
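Confidence scoring can be sketched independently of the OCR engine. In the example below, `RecognizedLine` is a hypothetical stand-in for a recognized line and its confidence (0 to 100), and the threshold of 85 is an arbitrary illustration; neither is taken from the IronOCR API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one recognized line and its confidence (0-100).
public sealed class RecognizedLine
{
    public string Text { get; }
    public double Confidence { get; }

    public RecognizedLine(string text, double confidence)
    {
        Text = text;
        Confidence = confidence;
    }
}

public static class ConfidenceGate
{
    // Splits lines into those accepted automatically and those needing manual review.
    public static (List<RecognizedLine> Accepted, List<RecognizedLine> Review) Split(
        IEnumerable<RecognizedLine> lines, double threshold)
    {
        var accepted = new List<RecognizedLine>();
        var review = new List<RecognizedLine>();
        foreach (var line in lines)
        {
            (line.Confidence >= threshold ? accepted : review).Add(line);
        }
        return (accepted, review);
    }

    public static void Main()
    {
        var lines = new[]
        {
            new RecognizedLine("TOTAL 12.40", 96.2),
            new RecognizedLine("T0TAL l2.4O", 61.5), // low-confidence misread
        };

        var (accepted, review) = Split(lines, threshold: 85);
        Console.WriteLine($"Accepted: {accepted.Count}, needs review: {review.Count}");
        // prints "Accepted: 1, needs review: 1"
    }
}
```

Routing low-confidence lines to a review queue, rather than discarding them, keeps a human in the loop for the cases where the recognizer is least reliable.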
How do I extract the complete receipt content?
To extract the full content of a receipt while preserving its formatting:
using IronOcr;
using System;
using System.Linq;

class WholeReceiptExtractor
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Configure for receipt scanning
        ocr.Configuration.ReadBarCodes = true; // Enable barcode detection
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5; // Use latest engine
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm; // Best accuracy

        using (var input = new OcrInput(@"r3.png"))
        {
            // Set a title for the input document
            input.WithTitle("Receipt Scan");

            // Use computer vision to find text regions
            var textRegions = input.FindTextRegions();
            Console.WriteLine($"Found {textRegions.Count()} text regions");

            // Apply optimal filters for receipt processing
            input.ApplyOcrInputFilters();

            // Perform OCR on the entire receipt
            var result = ocr.Read(input);

            // Display extracted text
            Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===");
            Console.WriteLine(result.Text);

            // Get detailed results
            Console.WriteLine($"\n=== OCR STATISTICS ===");
            Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%");
            Console.WriteLine($"Pages Processed: {result.Pages.Length}");
            Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}");
            Console.WriteLine($"Lines Detected: {result.Lines.Length}");
            Console.WriteLine($"Words Recognized: {result.Words.Length}");

            // Extract any barcodes found
            if (result.Barcodes.Any())
            {
                Console.WriteLine("\n=== BARCODES DETECTED ===");
                foreach (var barcode in result.Barcodes)
                {
                    Console.WriteLine($"Type: {barcode.Type}");
                    Console.WriteLine($"Value: {barcode.Value}");
                    Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}");
                }
            }

            // Save as searchable PDF
            result.SaveAsSearchablePdf("receipt_searchable.pdf");
            Console.WriteLine("\nSearchable PDF saved as: receipt_searchable.pdf");

            // Export as hOCR for preservation
            result.SaveAsHocrFile("receipt_hocr.html");
            Console.WriteLine("hOCR file saved as: receipt_hocr.html");
        }
    }
}
Imports IronOcr
Imports System
Imports System.Linq

Class WholeReceiptExtractor
    Shared Sub Main()
        Dim ocr = New IronTesseract()

        ' Configure for receipt scanning
        ocr.Configuration.ReadBarCodes = True ' Enable barcode detection
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5 ' Use latest engine
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm ' Best accuracy

        Using input = New OcrInput("r3.png")
            ' Set a title for the input document
            input.WithTitle("Receipt Scan")

            ' Use computer vision to find text regions
            Dim textRegions = input.FindTextRegions()
            Console.WriteLine($"Found {textRegions.Count()} text regions")

            ' Apply optimal filters for receipt processing
            input.ApplyOcrInputFilters()

            ' Perform OCR on the entire receipt
            Dim result = ocr.Read(input)

            ' Display extracted text
            Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===")
            Console.WriteLine(result.Text)

            ' Get detailed results
            Console.WriteLine(vbCrLf & "=== OCR STATISTICS ===")
            Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%")
            Console.WriteLine($"Pages Processed: {result.Pages.Length}")
            Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}")
            Console.WriteLine($"Lines Detected: {result.Lines.Length}")
            Console.WriteLine($"Words Recognized: {result.Words.Length}")

            ' Extract any barcodes found
            If result.Barcodes.Any() Then
                Console.WriteLine(vbCrLf & "=== BARCODES DETECTED ===")
                For Each barcode In result.Barcodes
                    Console.WriteLine($"Type: {barcode.Type}")
                    Console.WriteLine($"Value: {barcode.Value}")
                    Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}")
                Next
            End If

            ' Save as searchable PDF
            result.SaveAsSearchablePdf("receipt_searchable.pdf")
            Console.WriteLine(vbCrLf & "Searchable PDF saved as: receipt_searchable.pdf")

            ' Export as hOCR for preservation
            result.SaveAsHocrFile("receipt_hocr.html")
            Console.WriteLine("hOCR file saved as: receipt_hocr.html")
        End Using
    End Sub
End Class
Figure: the Visual Studio debug console showing the extracted invoice data, including item descriptions, quantities, prices, tax amounts, and totals.
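Once the full text is available, header fields such as the vendor name still have to be located in it. The sketch below shows one possible approach and rests on two loudly stated assumptions: that the vendor name is the first non-blank line, and that other fields appear as "Label: value" lines. The class `ReceiptText` and the sample text are hypothetical and not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReceiptText
{
    // Splits raw OCR text into trimmed, non-blank lines.
    public static List<string> CleanLines(string ocrText)
    {
        return ocrText
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList();
    }

    // Heuristic (an assumption): the vendor name is the first non-blank line.
    public static string FindVendor(string ocrText)
    {
        return CleanLines(ocrText).FirstOrDefault();
    }

    // Collects "Label: value" lines into a case-insensitive dictionary.
    public static Dictionary<string, string> FindLabelledFields(string ocrText)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in CleanLines(ocrText))
        {
            int colon = line.IndexOf(':');
            // Skip lines with no label or no value around the first colon.
            if (colon <= 0 || colon == line.Length - 1) continue;
            fields[line.Substring(0, colon).Trim()] = line.Substring(colon + 1).Trim();
        }
        return fields;
    }

    public static void Main()
    {
        // Hypothetical OCR output used only for illustration.
        var text = "ACME MARKET\n\nDate: 03/15/2024\nCashier: Ann\nTotal: 7.50\n";
        Console.WriteLine(FindVendor(text));
        foreach (var field in FindLabelledFields(text))
        {
            Console.WriteLine($"{field.Key} = {field.Value}");
        }
    }
}
```

Real receipts often start with a logo or a slogan, so the first-line heuristic is best treated as a default that can be overridden per retailer.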
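Which advanced features improve receipt scanning?
IronOCR provides several advanced features that can improve receipt scanning accuracy:
Which languages does IronOCR support?
1. Multi-language support: process receipts in more than 125 languages, or receipts that combine several languages in one document.
Can IronOCR read barcodes on receipts?
2. Barcode reading: detect and read barcodes and QR codes automatically.
How does computer vision help with receipt processing?
3. Computer vision: use advanced text detection to locate text regions before OCR runs.
Can I train a custom model for unusual receipt formats?
4. Custom training: train custom fonts for specialized receipt formats.
How do I improve performance for bulk processing?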
// Example: Async receipt processing for high-volume scenarios
using IronOcr;
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.IO;

class BulkReceiptProcessor
{
    static async Task Main()
    {
        var ocr = new IronTesseract();

        // Configure for optimal performance
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
        ocr.Configuration.UseMultiThreading = true;
        ocr.Configuration.ProcessorCount = Environment.ProcessorCount;

        // Process multiple receipts asynchronously
        var receiptFiles = Directory.GetFiles(@"C:\Receipts\", "*.jpg");
        var tasks = new List<Task<OcrResult>>();
        foreach (var file in receiptFiles)
        {
            tasks.Add(ProcessReceiptAsync(ocr, file));
        }

        // Wait for all receipts to be processed
        var results = await Task.WhenAll(tasks);

        // Aggregate results
        decimal totalAmount = 0;
        foreach (var result in results)
        {
            // Extract total from each receipt
            var match = System.Text.RegularExpressions.Regex.Match(
                result.Text, @"Total:?\s*\$?(\d+\.\d{2})");
            if (match.Success && decimal.TryParse(match.Groups[1].Value, out var amount))
            {
                totalAmount += amount;
            }
        }

        Console.WriteLine($"Processed {results.Length} receipts");
        Console.WriteLine($"Combined total: ${totalAmount:F2}");
    }

    static async Task<OcrResult> ProcessReceiptAsync(IronTesseract ocr, string filePath)
    {
        using (var input = new OcrInput(filePath))
        {
            // Apply preprocessing
            input.DeNoise();
            input.Deskew();
            input.EnhanceResolution(200);

            // Process asynchronously
            return await ocr.ReadAsync(input);
        }
    }
}
Imports IronOcr
Imports System
Imports System.Threading.Tasks
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module BulkReceiptProcessor
    Sub Main()
        MainAsync().GetAwaiter().GetResult()
    End Sub

    Private Async Function MainAsync() As Task
        Dim ocr As New IronTesseract()

        ' Configure for optimal performance
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
        ocr.Configuration.UseMultiThreading = True
        ocr.Configuration.ProcessorCount = Environment.ProcessorCount

        ' Process multiple receipts asynchronously
        Dim receiptFiles = Directory.GetFiles("C:\Receipts\", "*.jpg")
        Dim tasks As New List(Of Task(Of OcrResult))()
        For Each file In receiptFiles
            tasks.Add(ProcessReceiptAsync(ocr, file))
        Next

        ' Wait for all receipts to be processed
        Dim results = Await Task.WhenAll(tasks)

        ' Aggregate results
        Dim totalAmount As Decimal = 0
        For Each result In results
            ' Extract total from each receipt
            Dim match = Regex.Match(result.Text, "Total:?\s*\$?(\d+\.\d{2})")
            ' Parse into a separate variable so the running total is not overwritten
            Dim amount As Decimal
            If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, amount) Then
                totalAmount += amount
            End If
        Next

        Console.WriteLine($"Processed {results.Length} receipts")
        Console.WriteLine($"Combined total: ${totalAmount:F2}")
    End Function

    Private Async Function ProcessReceiptAsync(ocr As IronTesseract, filePath As String) As Task(Of OcrResult)
        Using input As New OcrInput(filePath)
            ' Apply preprocessing
            input.DeNoise()
            input.Deskew()
            input.EnhanceResolution(200)

            ' Process asynchronously
            Return Await ocr.ReadAsync(input)
        End Using
    End Function
End Module
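Starting one task per file launches every receipt at once, which may exhaust memory on a large folder. A possible refinement is to cap the number of receipts in flight. The sketch below uses `SemaphoreSlim` from the .NET base library; `ThrottledBatch` and the `recognizeAsync` delegate are hypothetical names, with the delegate standing in for whatever call performs the OCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledBatch
{
    // Runs recognizeAsync over every file with at most maxParallel calls in flight.
    // recognizeAsync is a hypothetical stand-in for the call that performs the OCR.
    public static async Task<string[]> RunAsync(
        IEnumerable<string> files,
        Func<string, Task<string>> recognizeAsync,
        int maxParallel)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = files.Select(async file =>
            {
                await gate.WaitAsync(); // wait for a free slot
                try
                {
                    return await recognizeAsync(file);
                }
                finally
                {
                    gate.Release(); // free the slot even if recognition fails
                }
            }).ToList(); // ToList starts every task; the semaphore limits concurrency

            // Results come back in the same order as the input files.
            return await Task.WhenAll(tasks);
        }
    }

    public static async Task Main()
    {
        // Fake recognizer that only simulates work; replace with a real OCR call.
        Func<string, Task<string>> fake = async path =>
        {
            await Task.Delay(50);
            return $"text of {path}";
        };

        var results = await RunAsync(new[] { "a.jpg", "b.jpg", "c.jpg" }, fake, maxParallel: 2);
        Console.WriteLine($"Processed {results.Length} receipts");
    }
}
```

A reasonable starting value for `maxParallel` is the processor count, adjusted after measuring memory use on representative receipts.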
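How do I handle common receipt scanning challenges?
Receipt scanning poses some distinctive challenges that IronOCR can help with:
How do I handle poor-quality receipt images?
- Poor image quality: use the Filter Wizard to find the best preprocessing settings automatically.
What if the receipt is skewed or rotated?
- Skewed or rotated receipts: automatic page rotation detection ensures the correct orientation.
How do I handle faded or low-contrast receipts?
Can IronOCR read crumpled or damaged receipts?
- Crumpled or damaged receipts: advanced preprocessing can recover text from hard-to-read images.
How do I manage different receipt formats and layouts?
Receipt formats vary widely between retailers. IronOCR offers flexible options: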
using IronOcr;
using System;
using System.Collections.Generic;
using System.Linq;

class ReceiptLayoutHandler
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Configure for different receipt layouts
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;

        using (var input = new OcrInput(@"complex_receipt.jpg"))
        {
            // Apply region-specific processing
            var cropRegion = new CropRectangle(x: 0, y: 100, width: 400, height: 800);
            input.AddImage(@"complex_receipt.jpg", cropRegion);

            // Process with confidence tracking
            var result = ocr.Read(input);

            // Parse using confidence scores
            var highConfidenceLines = result.Lines
                .Where(line => line.Confidence > 85)
                .Select(line => line.Text)
                .ToList();

            // Extract data with fallback strategies
            var total = ExtractTotal(highConfidenceLines)
                ?? ExtractTotalAlternative(result.Text);
            Console.WriteLine($"Receipt Total: {total}");
        }
    }

    static decimal? ExtractTotal(List<string> lines)
    {
        // Primary extraction method
        foreach (var line in lines)
        {
            if (line.Contains("TOTAL") &&
                System.Text.RegularExpressions.Regex.IsMatch(line, @"\d+\.\d{2}"))
            {
                var match = System.Text.RegularExpressions.Regex.Match(line, @"(\d+\.\d{2})");
                if (decimal.TryParse(match.Value, out var total))
                    return total;
            }
        }
        return null;
    }

    static decimal? ExtractTotalAlternative(string fullText)
    {
        // Fallback extraction method
        var pattern = @"(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})";
        var match = System.Text.RegularExpressions.Regex.Match(fullText, pattern);
        if (match.Success && decimal.TryParse(match.Groups[1].Value, out var total))
            return total;
        return null;
    }
}
Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Linq

Class ReceiptLayoutHandler
    Shared Sub Main()
        Dim ocr = New IronTesseract()

        ' Configure for different receipt layouts
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
        ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm

        Using input = New OcrInput("complex_receipt.jpg")
            ' Apply region-specific processing
            Dim cropRegion = New CropRectangle(x:=0, y:=100, width:=400, height:=800)
            input.AddImage("complex_receipt.jpg", cropRegion)

            ' Process with confidence tracking
            Dim result = ocr.Read(input)

            ' Parse using confidence scores
            Dim highConfidenceLines = result.Lines _
                .Where(Function(line) line.Confidence > 85) _
                .Select(Function(line) line.Text) _
                .ToList()

            ' Extract data with fallback strategies
            ' (the binary If operator returns the second value when the first is Nothing)
            Dim total = If(ExtractTotal(highConfidenceLines), ExtractTotalAlternative(result.Text))
            Console.WriteLine($"Receipt Total: {total}")
        End Using
    End Sub

    Shared Function ExtractTotal(lines As List(Of String)) As Decimal?
        ' Primary extraction method
        For Each line In lines
            If line.Contains("TOTAL") AndAlso _
                System.Text.RegularExpressions.Regex.IsMatch(line, "\d+\.\d{2}") Then
                Dim match = System.Text.RegularExpressions.Regex.Match(line, "(\d+\.\d{2})")
                Dim total As Decimal
                If Decimal.TryParse(match.Value, total) Then
                    Return total
                End If
            End If
        Next
        Return Nothing
    End Function

    Shared Function ExtractTotalAlternative(fullText As String) As Decimal?
        ' Fallback extraction method
        Dim pattern = "(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})"
        Dim match = System.Text.RegularExpressions.Regex.Match(fullText, pattern)
        Dim total As Decimal
        If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, total) Then
            Return total
        End If
        Return Nothing
    End Function
End Class
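One layout difference worth handling is the thousands separator. The amount pattern `\d+\.\d{2}` used in the listing above reads "1,500.00" as "500.00", because the comma ends the digit run. The sketch below shows a more tolerant parser. `AmountParser` and the sample strings are hypothetical, and the code assumes "." as the decimal separator and "," as the thousands separator.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class AmountParser
{
    // Comma-grouped digits or plain digits, then exactly two decimals: 1,500.00 or 7.50
    private static readonly Regex Amount =
        new Regex(@"\d{1,3}(?:,\d{3})+\.\d{2}|\d+\.\d{2}");

    // Returns the last amount on the first line that contains the word "total"
    // (case-insensitive). "Subtotal" is skipped because of the word boundary.
    public static decimal? FindTotal(string ocrText)
    {
        foreach (var line in ocrText.Split('\n'))
        {
            if (!Regex.IsMatch(line, @"\btotal\b", RegexOptions.IgnoreCase)) continue;

            var matches = Amount.Matches(line);
            if (matches.Count == 0) continue;

            var raw = matches[matches.Count - 1].Value;
            if (decimal.TryParse(raw,
                    NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
                    CultureInfo.InvariantCulture, out var value))
            {
                return value;
            }
        }
        return null;
    }

    public static void Main()
    {
        // Hypothetical OCR output used only for illustration.
        var text = "Subtotal 1,304.35\nGrand Total: $1,500.00";
        Console.WriteLine(FindTotal(text)); // 1500.00 (separator depends on the current culture)
    }
}
```

Parsing with `CultureInfo.InvariantCulture` keeps the result independent of the machine's regional settings; receipts that use "," as the decimal separator would need a different culture and pattern.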
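What are the key takeaways about receipt scanning APIs?
Receipt scanning APIs such as IronOCR offer a reliable way to extract data from receipts automatically. With OCR, businesses can capture vendor names, purchase dates, line items, prices, taxes, and totals without manual entry. Support for multiple languages, currencies, and barcodes helps businesses simplify receipt management, save time, and make data-driven decisions.
IronOCR gives developers accurate and efficient text extraction tools for automating these tasks. Its feature set includes support for a range of document types, along with recent improvements such as a 98% reduction in memory use.
Once the prerequisites are met and IronOCR is integrated, you can start benefiting from automated receipt processing. The library's documentation, examples, and troubleshooting guides support a smooth implementation.
For more information, visit the licensing page or browse the C# Tesseract OCR tutorial.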
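Frequently Asked Questions
How can I automate receipt data extraction with OCR in C#?
You can use IronOCR in C# to automate receipt data extraction, capturing key details such as line items, prices, taxes, and totals from receipt images.
What are the prerequisites for setting up a receipt scanning project in C#?
You need Visual Studio, basic C# programming knowledge, and the IronOCR library installed in your project.
How do I install the OCR library with the NuGet Package Manager in Visual Studio?
Open Visual Studio, go to Tools > NuGet Package Manager > Manage NuGet Packages for Solution, search for IronOCR, and install it in your project.
Can I install the OCR library from the Visual Studio command line?
Yes. Open the Package Manager Console in Visual Studio and run the command: Install-Package IronOcr
How do I extract text from an entire receipt with OCR?
Run IronOCR on the full receipt image, then output the extracted text from your C# code.
What are the benefits of a receipt scanning API?
A receipt scanning API such as IronOCR automates data extraction, reduces manual errors, improves productivity, and provides insight into spending patterns for better business decisions.
Does the OCR library support multiple languages and currencies?
Yes. IronOCR supports multiple languages, currencies, and receipt formats, which makes it well suited to international applications.
How accurate is the OCR library at extracting text from images?
IronOCR uses OCR algorithms, computer vision, and machine learning models to maintain high accuracy, even in challenging conditions.
What types of data can be extracted from receipts with OCR?
IronOCR can extract line items, prices, tax amounts, totals, and other receipt details.
How does automated receipt parsing improve business processes?
Automating receipt parsing with IronOCR reduces manual entry, enables accurate data collection, and supports data-driven decisions.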

