收据扫描 API:使用 C# 和IronOCR从收据中提取数据
收据扫描 API 使用 OCR 技术自动从收据中提取数据,从而显著减少人工输入错误并加快处理速度。 本指南展示了如何在 C# 中使用IronOCR从收据图像中准确提取供应商名称、日期、商品、价格和总计,并内置图像预处理功能,支持多种格式。
为什么选择IronOCR进行收据扫描?
IronOCR是一个灵活的OCR 库,可从扫描文档、图像和 PDF 中可靠地提取文本。 IronOCR凭借先进的算法、计算机视觉和机器学习模型,即使在具有挑战性的场景下也能确保高精度。 该库支持多种语言和字体样式,使其适用于全球应用。 通过将IronOCR集成到您的应用程序中,您可以自动执行数据输入和文本分析,从而提高生产力。
IronOCR如何从收据图像中提取文本?
IronOCR可以从文档、照片、屏幕截图和实时摄像头视频流中检索文本,并以 JSON 响应的形式返回。 IronOCR利用复杂的算法和机器学习技术,分析图像数据,识别字符,并将其转换为机器可读的文本。 该库采用Tesseract 5 技术,并结合专有改进技术,以实现更高的精度。
为什么IronOCR非常适合收据处理?
IronOCR擅长处理低质量扫描、各种收据格式和不同方向的图像。 内置图像预处理滤镜可在处理前自动改善图像质量,即使是皱巴巴或褪色的收据也能确保获得最佳效果。
使用IronOCR需要哪些条件?
在使用IronOCR之前,请确保满足以下先决条件:
支持哪些开发环境?
1.开发环境:安装合适的 IDE,例如 Visual Studio。 IronOCR支持Windows 、 Linux 、 macOS 、 Azure和AWS 。
需要哪些编程技能?
需要哪些软件依赖项?
- IronOCR安装:通过NuGet包管理器安装。 可能需要特定于平台的依赖项。
是否需要许可证密钥?
4.许可证密钥(可选) :提供免费试用; 生产用途需要获得许可。
如何创建一个用于收据扫描的新 Visual Studio 项目?
如何在 Visual Studio 中创建一个新项目?
打开 Visual Studio 并转到 "文件",然后悬停在 "新建 "上,点击 "项目"。
Visual Studio IDE 中,"文件"菜单已展开,"新建 > 项目"选项已高亮显示,代码编辑器中正在显示用于加载 Excel 工作簿的 C# 代码。
安装 IronOCR
我应该选择哪个项目模板?
选择"控制台应用程序",然后单击"下一步"。 此模板非常适合在将IronOCR应用于 Web 应用程序之前进行学习。
Visual Studio 的"创建新项目"对话框显示已选择"控制台应用程序"模板,并包含 Windows、Linux 和 macOS 平台选项。 控制台应用程序
我的收据扫描仪项目应该如何命名?
请填写项目名称和地点,然后单击"下一步"。 选择一个描述性的名称,例如"ReceiptScannerAPI"。
Visual Studio 新建项目配置屏幕,用于创建名为"IronOCR"的控制台应用程序,已选择 C# 并显示解决方案设置。 项目配置
我应该选择哪个 .NET Framework 版本?
为了获得最佳兼容性,请选择.NET 5.0 或更高版本,然后单击"创建"。
Visual Studio 的"附加信息"对话框显示了控制台应用程序的配置,其中目标框架选择为 .NET 5.0,平台选项包括 Linux、macOS、Windows 和控制台。 目标框架
如何在我的项目中安装IronOCR ?
有两种简便的安装方法:
如何使用NuGet包管理器?
转到"工具" > "NuGet程序包管理器" > "管理解决方案的NuGet程序包"。
Visual Studio NuGet 包管理器设置对话框,其中包含包源配置,以及解决方案资源管理器中的 C# 项目结构。 NuGet 软件包管理器
搜索IronOCR并安装该软件包。 对于非英文收据,请安装特定语言的软件包。
Visual Studio 中的 NuGet 包管理器显示已安装的 IronOCR 包,包括主库以及阿拉伯语、希伯来语和西班牙语的特定语言 OCR 包。 IronOCR
如何使用命令行安装?
- 转到"工具" > "NuGet包管理器" > "包管理器控制台" 。
-
输入以下命令:
Install-Package IronOcr
Visual Studio 包管理器控制台窗口显示正在为名为"Create PDF"的项目执行 NuGet 命令"PM> Install-Package IronOCR"。 软件包管理器控制台
如何使用IronOCR快速提取收据数据?
只需几行代码即可提取收据数据:
-
使用 NuGet 包管理器安装 https://www.nuget.org/packages/IronOcr
PM > Install-Package IronOcr -
复制并运行这段代码。
using IronOcr; using System; var ocr = new IronTesseract(); // Configure for receipt scanning ocr.Configuration.ReadBarCodes = true; ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% "; using (var input = new OcrInput(@"receipt.jpg")) { // Apply automatic image enhancement input.DeNoise(); input.Deskew(); input.EnhanceResolution(225); // Extract text from receipt var result = ocr.Read(input); // Display extracted text and confidence Console.WriteLine($"Extracted Text:\n{result.Text}"); Console.WriteLine($"\nConfidence: {result.Confidence}%"); } -
部署到您的生产环境中进行测试
通过免费试用立即在您的项目中开始使用IronOCR
如何从收据图像中提取结构化数据?
IronOCR可以从各种文档类型中提取商品明细、价格、税费和总计。该库支持PDF 、多页 TIFF和各种图像格式。
using IronOcr;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class ReceiptScanner
{
static void Main()
{
var ocr = new IronTesseract();
// Configure OCR for optimal receipt reading
ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% ";
ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\\";
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
// Load the image of the receipt
using (var input = new OcrInput(@"r2.png"))
{
// Apply image enhancement filters
input.Deskew(); // Fix image rotation
input.EnhanceResolution(225); // Optimal DPI for receipts
input.DeNoise(); // Remove background noise
input.Sharpen(); // Improve text clarity
// Perform OCR on the input image
var result = ocr.Read(input);
// Regular expression patterns to extract relevant details from the OCR result
var descriptionPattern = @"\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";
var pricePattern = @"\$\d+(\.\d{2})?";
var datePattern = @"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}";
// Variables to store extracted data
var descriptions = new List<string>();
var unitPrices = new List<decimal>();
var taxes = new List<decimal>();
var amounts = new List<decimal>();
var lines = result.Text.Split('\n');
foreach (var line in lines)
{
// Match each line against the description pattern
var descriptionMatch = Regex.Match(line, descriptionPattern);
if (descriptionMatch.Success)
{
descriptions.Add(descriptionMatch.Groups[1].Value.Trim());
unitPrices.Add(decimal.Parse(descriptionMatch.Groups[2].Value));
// Calculate tax and total amount for each item
var tax = unitPrices[unitPrices.Count - 1] * 0.15m;
taxes.Add(tax);
amounts.Add(unitPrices[unitPrices.Count - 1] + tax);
}
// Extract date if found
var dateMatch = Regex.Match(line, datePattern);
if (dateMatch.Success)
{
Console.WriteLine($"Receipt Date: {dateMatch.Value}");
}
}
// Output the extracted data
for (int i = 0; i < descriptions.Count; i++)
{
Console.WriteLine($"Description: {descriptions[i]}");
Console.WriteLine($"Quantity: 1.00 Units");
Console.WriteLine($"Unit Price: ${unitPrices[i]:0.00}");
Console.WriteLine($"Taxes: ${taxes[i]:0.00}");
Console.WriteLine($"Amount: ${amounts[i]:0.00}");
Console.WriteLine("-----------------------");
}
// Calculate and display totals
var subtotal = unitPrices.Sum();
var totalTax = taxes.Sum();
var grandTotal = amounts.Sum();
Console.WriteLine($"\nSubtotal: ${subtotal:0.00}");
Console.WriteLine($"Total Tax: ${totalTax:0.00}");
Console.WriteLine($"Grand Total: ${grandTotal:0.00}");
}
}
}
using IronOcr;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class ReceiptScanner
{
static void Main()
{
var ocr = new IronTesseract();
// Configure OCR for optimal receipt reading
ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% ";
ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\\";
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
// Load the image of the receipt
using (var input = new OcrInput(@"r2.png"))
{
// Apply image enhancement filters
input.Deskew(); // Fix image rotation
input.EnhanceResolution(225); // Optimal DPI for receipts
input.DeNoise(); // Remove background noise
input.Sharpen(); // Improve text clarity
// Perform OCR on the input image
var result = ocr.Read(input);
// Regular expression patterns to extract relevant details from the OCR result
var descriptionPattern = @"\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)";
var pricePattern = @"\$\d+(\.\d{2})?";
var datePattern = @"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}";
// Variables to store extracted data
var descriptions = new List<string>();
var unitPrices = new List<decimal>();
var taxes = new List<decimal>();
var amounts = new List<decimal>();
var lines = result.Text.Split('\n');
foreach (var line in lines)
{
// Match each line against the description pattern
var descriptionMatch = Regex.Match(line, descriptionPattern);
if (descriptionMatch.Success)
{
descriptions.Add(descriptionMatch.Groups[1].Value.Trim());
unitPrices.Add(decimal.Parse(descriptionMatch.Groups[2].Value));
// Calculate tax and total amount for each item
var tax = unitPrices[unitPrices.Count - 1] * 0.15m;
taxes.Add(tax);
amounts.Add(unitPrices[unitPrices.Count - 1] + tax);
}
// Extract date if found
var dateMatch = Regex.Match(line, datePattern);
if (dateMatch.Success)
{
Console.WriteLine($"Receipt Date: {dateMatch.Value}");
}
}
// Output the extracted data
for (int i = 0; i < descriptions.Count; i++)
{
Console.WriteLine($"Description: {descriptions[i]}");
Console.WriteLine($"Quantity: 1.00 Units");
Console.WriteLine($"Unit Price: ${unitPrices[i]:0.00}");
Console.WriteLine($"Taxes: ${taxes[i]:0.00}");
Console.WriteLine($"Amount: ${amounts[i]:0.00}");
Console.WriteLine("-----------------------");
}
// Calculate and display totals
var subtotal = unitPrices.Sum();
var totalTax = taxes.Sum();
var grandTotal = amounts.Sum();
Console.WriteLine($"\nSubtotal: ${subtotal:0.00}");
Console.WriteLine($"Total Tax: ${totalTax:0.00}");
Console.WriteLine($"Grand Total: ${grandTotal:0.00}");
}
}
}
Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Linq
Class ReceiptScanner
Shared Sub Main()
Dim ocr = New IronTesseract()
' Configure OCR for optimal receipt reading
ocr.Configuration.WhiteListCharacters = "0123456789.$,ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz% "
ocr.Configuration.BlackListCharacters = "~`@#*_}{][|\"
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
' Load the image of the receipt
Using input = New OcrInput("r2.png")
' Apply image enhancement filters
input.Deskew() ' Fix image rotation
input.EnhanceResolution(225) ' Optimal DPI for receipts
input.DeNoise() ' Remove background noise
input.Sharpen() ' Improve text clarity
' Perform OCR on the input image
Dim result = ocr.Read(input)
' Regular expression patterns to extract relevant details from the OCR result
Dim descriptionPattern = "\w+\s+(.*?)\s+(\d+\.\d+)\s+Units\s+(\d+\.\d+)\s+Tax15%\s+\$(\d+\.\d+)"
Dim pricePattern = "\$\d+(\.\d{2})?"
Dim datePattern = "\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"
' Variables to store extracted data
Dim descriptions = New List(Of String)()
Dim unitPrices = New List(Of Decimal)()
Dim taxes = New List(Of Decimal)()
Dim amounts = New List(Of Decimal)()
Dim lines = result.Text.Split(ControlChars.Lf)
For Each line In lines
' Match each line against the description pattern
Dim descriptionMatch = Regex.Match(line, descriptionPattern)
If descriptionMatch.Success Then
descriptions.Add(descriptionMatch.Groups(1).Value.Trim())
unitPrices.Add(Decimal.Parse(descriptionMatch.Groups(2).Value))
' Calculate tax and total amount for each item
Dim tax = unitPrices(unitPrices.Count - 1) * 0.15D
taxes.Add(tax)
amounts.Add(unitPrices(unitPrices.Count - 1) + tax)
End If
' Extract date if found
Dim dateMatch = Regex.Match(line, datePattern)
If dateMatch.Success Then
Console.WriteLine($"Receipt Date: {dateMatch.Value}")
End If
Next
' Output the extracted data
For i As Integer = 0 To descriptions.Count - 1
Console.WriteLine($"Description: {descriptions(i)}")
Console.WriteLine("Quantity: 1.00 Units")
Console.WriteLine($"Unit Price: ${unitPrices(i):0.00}")
Console.WriteLine($"Taxes: ${taxes(i):0.00}")
Console.WriteLine($"Amount: ${amounts(i):0.00}")
Console.WriteLine("-----------------------")
Next
' Calculate and display totals
Dim subtotal = unitPrices.Sum()
Dim totalTax = taxes.Sum()
Dim grandTotal = amounts.Sum()
Console.WriteLine(vbCrLf & $"Subtotal: ${subtotal:0.00}")
Console.WriteLine($"Total Tax: ${totalTax:0.00}")
Console.WriteLine($"Grand Total: ${grandTotal:0.00}")
End Using
End Sub
End Class
哪些技术可以提高收据扫描准确率?
准确扫描收据的关键技巧: -字符白名单:将识别限制为预期字符 图像预处理:包括去斜、分辨率增强和去噪。 -模式匹配:使用正则表达式提取结构化数据 -置信度评分:基于识别置信度验证结果
如何提取完整的收据内容?
提取完整收据内容并保留格式:
using IronOcr;
using System;
using System.Linq;
class WholeReceiptExtractor
{
static void Main()
{
var ocr = new IronTesseract();
// Configure for receipt scanning
ocr.Configuration.ReadBarCodes = true; // Enable barcode detection
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5; // Use latest engine
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm; // Best accuracy
using (var input = new OcrInput(@"r3.png"))
{
// Apply automatic image correction
input.WithTitle("Receipt Scan");
// Use computer vision to find text regions
var textRegions = input.FindTextRegions();
Console.WriteLine($"Found {textRegions.Count()} text regions");
// Apply optimal filters for receipt processing
input.ApplyOcrInputFilters();
// Perform OCR on the entire receipt
var result = ocr.Read(input);
// Display extracted text
Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===");
Console.WriteLine(result.Text);
// Get detailed results
Console.WriteLine($"\n=== OCR STATISTICS ===");
Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%");
Console.WriteLine($"Pages Processed: {result.Pages.Length}");
Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}");
Console.WriteLine($"Lines Detected: {result.Lines.Length}");
Console.WriteLine($"Words Recognized: {result.Words.Length}");
// Extract any barcodes found
if (result.Barcodes.Any())
{
Console.WriteLine("\n=== BARCODES DETECTED ===");
foreach(var barcode in result.Barcodes)
{
Console.WriteLine($"Type: {barcode.Type}");
Console.WriteLine($"Value: {barcode.Value}");
Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}");
}
}
// Save as searchable PDF
result.SaveAsSearchablePdf("receipt_searchable.pdf");
Console.WriteLine("\nSearchable PDF saved as: receipt_searchable.pdf");
// Export as hOCR for preservation
result.SaveAsHocrFile("receipt_hocr.html");
Console.WriteLine("hOCR file saved as: receipt_hocr.html");
}
}
}
using IronOcr;
using System;
using System.Linq;
class WholeReceiptExtractor
{
static void Main()
{
var ocr = new IronTesseract();
// Configure for receipt scanning
ocr.Configuration.ReadBarCodes = true; // Enable barcode detection
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5; // Use latest engine
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm; // Best accuracy
using (var input = new OcrInput(@"r3.png"))
{
// Apply automatic image correction
input.WithTitle("Receipt Scan");
// Use computer vision to find text regions
var textRegions = input.FindTextRegions();
Console.WriteLine($"Found {textRegions.Count()} text regions");
// Apply optimal filters for receipt processing
input.ApplyOcrInputFilters();
// Perform OCR on the entire receipt
var result = ocr.Read(input);
// Display extracted text
Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===");
Console.WriteLine(result.Text);
// Get detailed results
Console.WriteLine($"\n=== OCR STATISTICS ===");
Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%");
Console.WriteLine($"Pages Processed: {result.Pages.Length}");
Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}");
Console.WriteLine($"Lines Detected: {result.Lines.Length}");
Console.WriteLine($"Words Recognized: {result.Words.Length}");
// Extract any barcodes found
if (result.Barcodes.Any())
{
Console.WriteLine("\n=== BARCODES DETECTED ===");
foreach(var barcode in result.Barcodes)
{
Console.WriteLine($"Type: {barcode.Type}");
Console.WriteLine($"Value: {barcode.Value}");
Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}");
}
}
// Save as searchable PDF
result.SaveAsSearchablePdf("receipt_searchable.pdf");
Console.WriteLine("\nSearchable PDF saved as: receipt_searchable.pdf");
// Export as hOCR for preservation
result.SaveAsHocrFile("receipt_hocr.html");
Console.WriteLine("hOCR file saved as: receipt_hocr.html");
}
}
}
Imports IronOcr
Imports System
Imports System.Linq
Class WholeReceiptExtractor
Shared Sub Main()
Dim ocr = New IronTesseract()
' Configure for receipt scanning
ocr.Configuration.ReadBarCodes = True ' Enable barcode detection
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5 ' Use latest engine
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm ' Best accuracy
Using input = New OcrInput("r3.png")
' Apply automatic image correction
input.WithTitle("Receipt Scan")
' Use computer vision to find text regions
Dim textRegions = input.FindTextRegions()
Console.WriteLine($"Found {textRegions.Count()} text regions")
' Apply optimal filters for receipt processing
input.ApplyOcrInputFilters()
' Perform OCR on the entire receipt
Dim result = ocr.Read(input)
' Display extracted text
Console.WriteLine("=== EXTRACTED RECEIPT TEXT ===")
Console.WriteLine(result.Text)
' Get detailed results
Console.WriteLine(vbCrLf & "=== OCR STATISTICS ===")
Console.WriteLine($"OCR Confidence: {result.Confidence:F2}%")
Console.WriteLine($"Pages Processed: {result.Pages.Length}")
Console.WriteLine($"Paragraphs Found: {result.Paragraphs.Length}")
Console.WriteLine($"Lines Detected: {result.Lines.Length}")
Console.WriteLine($"Words Recognized: {result.Words.Length}")
' Extract any barcodes found
If result.Barcodes.Any() Then
Console.WriteLine(vbCrLf & "=== BARCODES DETECTED ===")
For Each barcode In result.Barcodes
Console.WriteLine($"Type: {barcode.Type}")
Console.WriteLine($"Value: {barcode.Value}")
Console.WriteLine($"Location: X={barcode.X}, Y={barcode.Y}")
Next
End If
' Save as searchable PDF
result.SaveAsSearchablePdf("receipt_searchable.pdf")
Console.WriteLine(vbCrLf & "Searchable PDF saved as: receipt_searchable.pdf")
' Export as hOCR for preservation
result.SaveAsHocrFile("receipt_hocr.html")
Console.WriteLine("hOCR file saved as: receipt_hocr.html")
End Using
End Sub
End Class
Visual Studio 调试控制台显示从 PDF 中提取的发票数据,包括项目描述、数量、价格、税额和总计。 扫描接收 API 输出
哪些高级功能可以提升收据扫描体验?
IronOCR提供多项高级功能,可显著提高收据扫描准确率:
IronOCR支持哪些语言?
1.多语言支持:处理125 多种语言的收据,或在一个文档中处理多种语言的收据。
IronOCR能读取收据上的条形码吗?
2.条形码读取:自动检测和读取条形码和二维码。
计算机视觉如何帮助处理收据?
3.计算机视觉:在 OCR 之前使用高级文本检测来定位文本区域。
我可以针对特殊的收据格式训练自定义模型吗?
4.自定义培训:为特殊收据格式培训自定义字体。
如何提高批量处理的性能?
// Example: Async receipt processing for high-volume scenarios
using IronOcr;
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.IO;
class BulkReceiptProcessor
{
static async Task Main()
{
var ocr = new IronTesseract();
// Configure for optimal performance
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
ocr.Configuration.UseMultiThreading = true;
ocr.Configuration.ProcessorCount = Environment.ProcessorCount;
// Process multiple receipts asynchronously
var receiptFiles = Directory.GetFiles(@"C:\Receipts\", "*.jpg");
var tasks = new List<Task<OcrResult>>();
foreach (var file in receiptFiles)
{
tasks.Add(ProcessReceiptAsync(ocr, file));
}
// Wait for all receipts to be processed
var results = await Task.WhenAll(tasks);
// Aggregate results
decimal totalAmount = 0;
foreach (var result in results)
{
// Extract total from each receipt
var match = System.Text.RegularExpressions.Regex.Match(
result.Text, @"Total:?\s*\$?(\d+\.\d{2})");
if (match.Success && decimal.TryParse(match.Groups[1].Value, out var amount))
{
totalAmount += amount;
}
}
Console.WriteLine($"Processed {results.Length} receipts");
Console.WriteLine($"Combined total: ${totalAmount:F2}");
}
static async Task<OcrResult> ProcessReceiptAsync(IronTesseract ocr, string filePath)
{
using (var input = new OcrInput(filePath))
{
// Apply preprocessing
input.DeNoise();
input.Deskew();
input.EnhanceResolution(200);
// Process asynchronously
return await ocr.ReadAsync(input);
}
}
}
// Example: Async receipt processing for high-volume scenarios
using IronOcr;
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.IO;
class BulkReceiptProcessor
{
static async Task Main()
{
var ocr = new IronTesseract();
// Configure for optimal performance
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
ocr.Configuration.UseMultiThreading = true;
ocr.Configuration.ProcessorCount = Environment.ProcessorCount;
// Process multiple receipts asynchronously
var receiptFiles = Directory.GetFiles(@"C:\Receipts\", "*.jpg");
var tasks = new List<Task<OcrResult>>();
foreach (var file in receiptFiles)
{
tasks.Add(ProcessReceiptAsync(ocr, file));
}
// Wait for all receipts to be processed
var results = await Task.WhenAll(tasks);
// Aggregate results
decimal totalAmount = 0;
foreach (var result in results)
{
// Extract total from each receipt
var match = System.Text.RegularExpressions.Regex.Match(
result.Text, @"Total:?\s*\$?(\d+\.\d{2})");
if (match.Success && decimal.TryParse(match.Groups[1].Value, out var amount))
{
totalAmount += amount;
}
}
Console.WriteLine($"Processed {results.Length} receipts");
Console.WriteLine($"Combined total: ${totalAmount:F2}");
}
static async Task<OcrResult> ProcessReceiptAsync(IronTesseract ocr, string filePath)
{
using (var input = new OcrInput(filePath))
{
// Apply preprocessing
input.DeNoise();
input.Deskew();
input.EnhanceResolution(200);
// Process asynchronously
return await ocr.ReadAsync(input);
}
}
}
Imports IronOcr
Imports System
Imports System.Threading.Tasks
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Module BulkReceiptProcessor
Sub Main()
MainAsync().GetAwaiter().GetResult()
End Sub
Private Async Function MainAsync() As Task
Dim ocr As New IronTesseract()
' Configure for optimal performance
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
ocr.Configuration.UseMultiThreading = True
ocr.Configuration.ProcessorCount = Environment.ProcessorCount
' Process multiple receipts asynchronously
Dim receiptFiles = Directory.GetFiles("C:\Receipts\", "*.jpg")
Dim tasks As New List(Of Task(Of OcrResult))()
For Each file In receiptFiles
tasks.Add(ProcessReceiptAsync(ocr, file))
Next
' Wait for all receipts to be processed
Dim results = Await Task.WhenAll(tasks)
' Aggregate results
Dim totalAmount As Decimal = 0
For Each result In results
' Extract total from each receipt
Dim match = Regex.Match(result.Text, "Total:?\s*\$?(\d+\.\d{2})")
If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, totalAmount) Then
totalAmount += totalAmount
End If
Next
Console.WriteLine($"Processed {results.Length} receipts")
Console.WriteLine($"Combined total: ${totalAmount:F2}")
End Function
Private Async Function ProcessReceiptAsync(ocr As IronTesseract, filePath As String) As Task(Of OcrResult)
Using input As New OcrInput(filePath)
' Apply preprocessing
input.DeNoise()
input.Deskew()
input.EnhanceResolution(200)
' Process asynchronously
Return Await ocr.ReadAsync(input)
End Using
End Function
End Module
如何应对常见的收据扫描难题?
收据扫描面临着一些独特的挑战,而IronOCR可以帮助解决这些挑战:
如何处理质量差的收据图片?
-图像质量差:使用滤镜向导自动查找最佳预处理设置。
如果收据出现倾斜或旋转,该怎么办?
-倾斜或旋转的收据:自动页面旋转检测可确保正确的方向。
如何处理褪色或对比度低的收据?
IronOCR可以识别皱巴巴或破损的收据吗?
-皱巴巴或破损的收据:高级预处理可从难以辨认的图像中恢复文本。
如何管理不同的收据格式和布局?
不同零售商的收据格式差异很大。 IronOCR提供灵活的解决方案:
using IronOcr;
using System;
using System.Collections.Generic;
using System.Linq;
class ReceiptLayoutHandler
{
static void Main()
{
var ocr = new IronTesseract();
// Configure for different receipt layouts
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
using (var input = new OcrInput(@"complex_receipt.jpg"))
{
// Apply region-specific processing
var cropRegion = new CropRectangle(x: 0, y: 100, width: 400, height: 800);
input.AddImage(@"complex_receipt.jpg", cropRegion);
// Process with confidence tracking
var result = ocr.Read(input);
// Parse using confidence scores
var highConfidenceLines = result.Lines
.Where(line => line.Confidence > 85)
.Select(line => line.Text)
.ToList();
// Extract data with fallback strategies
var total = ExtractTotal(highConfidenceLines)
?? ExtractTotalAlternative(result.Text);
Console.WriteLine($"Receipt Total: {total}");
}
}
static decimal? ExtractTotal(List<string> lines)
{
// Primary extraction method
foreach (var line in lines)
{
if (line.Contains("TOTAL") &&
System.Text.RegularExpressions.Regex.IsMatch(line, @"\d+\.\d{2}"))
{
var match = System.Text.RegularExpressions.Regex.Match(line, @"(\d+\.\d{2})");
if (decimal.TryParse(match.Value, out var total))
return total;
}
}
return null;
}
static decimal? ExtractTotalAlternative(string fullText)
{
// Fallback extraction method
var pattern = @"(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})";
var match = System.Text.RegularExpressions.Regex.Match(fullText, pattern);
if (match.Success && decimal.TryParse(match.Groups[1].Value, out var total))
return total;
return null;
}
}
using IronOcr;
using System;
using System.Collections.Generic;
using System.Linq;
class ReceiptLayoutHandler
{
static void Main()
{
var ocr = new IronTesseract();
// Configure for different receipt layouts
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
using (var input = new OcrInput(@"complex_receipt.jpg"))
{
// Apply region-specific processing
var cropRegion = new CropRectangle(x: 0, y: 100, width: 400, height: 800);
input.AddImage(@"complex_receipt.jpg", cropRegion);
// Process with confidence tracking
var result = ocr.Read(input);
// Parse using confidence scores
var highConfidenceLines = result.Lines
.Where(line => line.Confidence > 85)
.Select(line => line.Text)
.ToList();
// Extract data with fallback strategies
var total = ExtractTotal(highConfidenceLines)
?? ExtractTotalAlternative(result.Text);
Console.WriteLine($"Receipt Total: {total}");
}
}
static decimal? ExtractTotal(List<string> lines)
{
// Primary extraction method
foreach (var line in lines)
{
if (line.Contains("TOTAL") &&
System.Text.RegularExpressions.Regex.IsMatch(line, @"\d+\.\d{2}"))
{
var match = System.Text.RegularExpressions.Regex.Match(line, @"(\d+\.\d{2})");
if (decimal.TryParse(match.Value, out var total))
return total;
}
}
return null;
}
static decimal? ExtractTotalAlternative(string fullText)
{
// Fallback extraction method
var pattern = @"(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})";
var match = System.Text.RegularExpressions.Regex.Match(fullText, pattern);
if (match.Success && decimal.TryParse(match.Groups[1].Value, out var total))
return total;
return null;
}
}
Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Linq
Class ReceiptLayoutHandler
Shared Sub Main()
Dim ocr = New IronTesseract()
' Configure for different receipt layouts
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm
Using input = New OcrInput("complex_receipt.jpg")
' Apply region-specific processing
Dim cropRegion = New CropRectangle(x:=0, y:=100, width:=400, height:=800)
input.AddImage("complex_receipt.jpg", cropRegion)
' Process with confidence tracking
Dim result = ocr.Read(input)
' Parse using confidence scores
Dim highConfidenceLines = result.Lines _
.Where(Function(line) line.Confidence > 85) _
.Select(Function(line) line.Text) _
.ToList()
' Extract data with fallback strategies
Dim total = ExtractTotal(highConfidenceLines) _
OrElse ExtractTotalAlternative(result.Text)
Console.WriteLine($"Receipt Total: {total}")
End Using
End Sub
Shared Function ExtractTotal(lines As List(Of String)) As Decimal?
' Primary extraction method
For Each line In lines
If line.Contains("TOTAL") AndAlso _
System.Text.RegularExpressions.Regex.IsMatch(line, "\d+\.\d{2}") Then
Dim match = System.Text.RegularExpressions.Regex.Match(line, "(\d+\.\d{2})")
Dim total As Decimal
If Decimal.TryParse(match.Value, total) Then
Return total
End If
End If
Next
Return Nothing
End Function
Shared Function ExtractTotalAlternative(fullText As String) As Decimal?
' Fallback extraction method
Dim pattern = "(?:Total|TOTAL|Grand Total|Amount Due).*?(\d+\.\d{2})"
Dim match = System.Text.RegularExpressions.Regex.Match(fullText, pattern)
Dim total As Decimal
If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, total) Then
Return total
End If
Return Nothing
End Function
End Class
关于收据扫描 API,我应该记住哪些关键要点?
IronOCR等收据扫描 API 为自动从收据中提取数据提供了可靠的解决方案。通过使用先进的 OCR 技术,企业可以自动提取供应商名称、购买日期、商品明细、价格、税费和总计。 支持多种语言、货币和条形码,企业可以简化收据管理,节省时间,并做出数据驱动的决策。
IronOCR为开发人员提供准确高效的文本提取工具,从而实现任务自动化并提高效率。 该库的完整功能集包括对各种文档类型的支持,以及最近的改进,例如内存减少 98% 。
满足先决条件并集成IronOCR后,即可享受自动收据处理带来的好处。 该库的文档、示例和故障排除指南可确保顺利实施。
有关更多信息,请访问许可页面或浏览C# Tesseract OCR 教程。
常见问题解答
如何在 C# 中使用 OCR 自动化收据数据提取?
您可以在 C# 中使用 IronOCR 自动化收据数据提取,它允许高效地从收据图像中提取项目明细、价格、税费和总金额等关键信息。
在 C# 中设置收据扫描项目的前提条件是什么?
在 C# 中设置收据扫描项目,您需要 Visual Studio、基本的 C# 编程知识以及在项目中安装 IronOCR 库。
如何使用 Visual Studio 的 NuGet 包管理器安装 OCR 库?
打开 Visual Studio,依次转到工具 > NuGet 包管理器 > 管理解决方案的 NuGet 包,搜索 IronOCR 并安装到您的项目中。
我可以使用 Visual Studio 命令行安装 OCR 库吗?
是的,您可以通过打开 Visual Studio 的包管理器控制台并运行命令:Install-Package IronOcr来安装 IronOCR。
如何使用 OCR 从整个收据提取文本?
要从整个收据提取文本,使用 IronOCR 对完整收据图像执行 OCR,然后使用 C# 代码输出提取的文本。
收据扫描 API 提供了哪些好处?
像 IronOCR 这样的收据扫描 API 自动化数据提取,减少手动错误,提高生产力,并为更好的商业决策提供消费模式的见解。
OCR 库是否支持多种语言和货币?
是的,IronOCR 支持多种语言、货币和收据格式,非常适合全球应用。
OCR 库在从图像中提取文本方面的准确性如何?
IronOCR 通过使用先进的 OCR 算法、计算机视觉和机器学习模型,确保即使在复杂场景下也能实现高准确性。
使用 OCR 可以从收据中提取哪些类型的数据?
IronOCR 可以提取项目明细、价格、税费、总金额以及其他收据详情等数据。
自动化收据解析如何改善业务流程?
通过 IronOCR 自动化收据解析,可以减少手动输入,实现准确的数据采集,并促进数据驱动的决策制定,从而改善业务流程。

