How to Create a .NET OCR SDK Using IronOCR

By Kannapat Udonpant. Published: September 29, 2025

If you've ever needed to extract text from scanned documents, PDFs, or images, you know how tricky it can be to handle different file formats, multiple languages, and low-quality scans. That's where OCR (optical character recognition) comes in: it turns scanned images and document files into editable text you can work with programmatically.

In this guide, we'll explore how to build a high-performance .NET OCR SDK using IronOCR, showing you how to perform OCR, extract structured data, and generate searchable PDFs across multiple document types. You'll learn how to process scanned PDFs, images, and other document files in a way that's fast, reliable, and easy to integrate into .NET applications on desktop, web, or mobile devices.

What Makes IronOCR the Ideal .NET OCR SDK?

Building an OCR library from scratch requires months of development, image preprocessing work, and extensive testing. IronOCR eliminates this overhead by providing a comprehensive .NET OCR SDK that supports a range of formats and integrates directly into .NET applications.
The SDK handles the heavy lifting of text recognition while offering features typically found only in enterprise solutions:

- High performance across various document formats and scanned images
- Support for 125+ languages and handwritten text recognition
- Adaptive binarization, font information, and bounding box support for zonal OCR
- Ability to process scanned PDFs, image formats, and text blocks
- Instant searchable document creation with hidden text layers

Unlike raw Tesseract implementations, IronOCR works immediately across Windows, Linux, macOS, and cloud platforms, supporting OCR APIs and AI-assisted recognition without additional configuration.

Getting Started with IronOCR

Installation takes seconds through NuGet Package Manager. Run:

    Install-Package IronOcr

For detailed installation instructions, refer to the IronOCR documentation. Once installed, extracting text from scanned documents becomes straightforward:

    using IronOcr;

    public class OcrService
    {
        // One engine instance is created up front and reused for every call.
        private readonly IronTesseract _ocr;

        public OcrService()
        {
            _ocr = new IronTesseract();
        }

        // Runs OCR on a single image file and returns the recognized text.
        public string ExtractText(string imagePath)
        {
            using var input = new OcrInput();
            input.LoadImage(imagePath);
            var result = _ocr.Read(input);
            return result.Text;
        }
    }

This code creates a reusable OCR service. LoadImage accepts common image formats, including JPEG, PNG, TIFF, and BMP; PDF documents are loaded with LoadPdf instead, as the LoadFile helper later in this guide shows.
To test the service, call it from a Main method with an example image:

    using System;

    class Program
    {
        static void Main(string[] args)
        {
            var ocrService = new OcrService();
            string imagePath = "test.png"; // Replace with your image path
            string extractedText = ocrService.ExtractText(imagePath);
            Console.WriteLine(extractedText);
        }
    }

Building Core OCR Functionality

Real-world applications need more than basic text extraction. IronOCR provides comprehensive document processing capabilities:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using IronOcr;

    // The two methods below are members of OcrService and use its _ocr field.

    // Async document processing with barcodes
    public async Task<ProcessedDocument> ProcessDocumentAsync(string filePath)
    {
        using var input = new OcrInput();
        LoadFile(input, filePath);

        // Basic cleanup before recognition
        input.DeNoise();
        input.Deskew();

        var result = await _ocr.ReadAsync(input);
        return new ProcessedDocument
        {
            Text = result.Text,
            Confidence = result.Confidence,
            Barcodes = result.Barcodes.Select(b => b.Value).ToList()
        };
    }

    // Helper to load image or PDF
    private void LoadFile(OcrInput input, string filePath)
    {
        if (filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(filePath);
        else
            input.LoadImage(filePath);
    }

    // Model for processed documents with barcodes
    public class ProcessedDocument
    {
        public string Text { get; set; } = string.Empty;
        public double Confidence { get; set; }
        public List<string> Barcodes { get; set; } = new();
    }

This implementation accepts either an image or a PDF, applies image preprocessing, and extracts barcodes and text from the same document. Barcode reading is switched on through Configuration.ReadBarCodes, which the production configuration later in this guide sets. The async pattern keeps the calling thread free while recognition runs, which helps .NET applications stay responsive under load.

Enhancing Accuracy with Built-in Features

IronOCR's preprocessing capabilities significantly improve recognition accuracy on real-world documents:

    // OCR optimized for low-quality images (member of OcrService)
    public string ProcessLowQualityDocument(string filePath)
    {
        using var input = new OcrInput();
        LoadFile(input, filePath);

        // Preprocessing for low-quality documents
        input.DeNoise();              // remove scanning artifacts
        input.Deskew();               // straighten tilted pages
        input.Scale(150);             // enlarge the page to 150%
        input.Binarize();             // reduce the page to black and white
        input.EnhanceResolution(300); // raise low-resolution input toward 300 DPI

        var result = _ocr.Read(input);
        return result.Text;
    }

Each filter targets a specific document quality issue. DeNoise() removes artifacts from scanning, Deskew() corrects tilted pages, Scale() and Binarize() enlarge the page and reduce it to black and white, and EnhanceResolution() improves low-resolution, blurry text. These filters work together to achieve accurate text extraction even from poor-quality sources.
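Because every filter adds processing time, one option is to run the filters only when a plain read comes back with low confidence. The sketch below illustrates that idea under stated assumptions: `OcrAttempt`, `AdaptiveOcr`, and the two delegates are hypothetical names introduced here, not part of IronOCR, and the threshold must use whatever confidence scale your engine reports. In a real service the delegates would wrap calls such as ExtractText and ProcessLowQualityDocument.

```csharp
using System;

// One OCR attempt: the recognized text plus the engine's confidence score.
// Hypothetical type for this sketch; it is not an IronOCR class.
public readonly record struct OcrAttempt(string Text, double Confidence);

public static class AdaptiveOcr
{
    // Tries the cheap read first and falls back to the preprocessed read
    // only when confidence is below minConfidence. Returns whichever
    // attempt scored higher.
    public static OcrAttempt ReadWithFallback(
        Func<OcrAttempt> readPlain,
        Func<OcrAttempt> readPreprocessed,
        double minConfidence)
    {
        OcrAttempt plain = readPlain();
        if (plain.Confidence >= minConfidence)
            return plain;

        OcrAttempt filtered = readPreprocessed();
        return filtered.Confidence > plain.Confidence ? filtered : plain;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in results so the sketch runs without any OCR engine.
        OcrAttempt result = AdaptiveOcr.ReadWithFallback(
            () => new OcrAttempt("lnvoice t0tal", 61.0),
            () => new OcrAttempt("Invoice total", 93.5),
            minConfidence: 80.0);

        Console.WriteLine(result.Text); // prints "Invoice total"
    }
}
```

The record struct syntax needs C# 10 or later. Whether the extra read is worth the cost depends on how many of your documents are clean enough to pass on the first attempt.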
Anecdotal reports in Stack Overflow discussions suggest that proper preprocessing can improve OCR accuracy by up to 40%, although the actual gain depends heavily on the source material.

Advanced Data Extraction SDK Capabilities

IronOCR extends beyond basic text extraction with features essential for modern .NET OCR SDK applications:

    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using IronOcr;

    // Both methods are members of OcrService and reuse _ocr and LoadFile.

    // Create a searchable PDF from an image or PDF
    public void CreateSearchablePdf(string inputPath, string outputPath)
    {
        using var input = new OcrInput();
        LoadFile(input, inputPath);
        _ocr.Read(input).SaveAsSearchablePdf(outputPath);
    }

    // Extract structured data (phone numbers, emails, amounts) from text
    public List<string> ExtractStructuredData(string filePath)
    {
        using var input = new OcrInput();
        LoadFile(input, filePath);
        var result = _ocr.Read(input);
        var text = result.Text;

        var phoneNumbers = Regex.Matches(text, @"\+?\d[\d\s\-]{7,}\d")
            .Select(m => m.Value).ToList();

        // [a-zA-Z] so that upper-case domains such as EXAMPLE.COM also match
        var emails = Regex.Matches(text, @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
            .Select(m => m.Value).ToList();

        // Dollar amounts without thousands separators, e.g. $19 or $19.99
        var amounts = Regex.Matches(text, @"\$\d+(?:\.\d{2})?")
            .Select(m => m.Value).ToList();

        // All matches are returned in one flat list: phones, then emails, then amounts
        return phoneNumbers.Concat(emails).Concat(amounts).ToList();
    }

This code shows two key OCR operations. CreateSearchablePdf converts a scanned PDF or image into a searchable document whose recognized text can be searched and selected. ExtractStructuredData runs OCR on the same kind of input and then uses regular expressions to pull phone numbers, emails, and amounts out of the recognized text, so .NET applications can handle scanned images and PDF documents with the same code path.

Production-Ready Implementation

Deploy IronOCR confidently with built-in production features:

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;
    using IronOcr;
    using Microsoft.Extensions.Logging;

    public class ProductionOcrService
    {
        private readonly IronTesseract _ocr;
        private readonly ILogger _logger;

        public ProductionOcrService(ILogger logger)
        {
            _logger = logger;
            _ocr = new IronTesseract();

            // Production configuration. Property names are kept as given in
            // this article; check them against your IronOCR version.
            _ocr.Configuration.RenderSearchablePdfsAndHocr = true;
            _ocr.Configuration.ReadBarCodes = true;
        }

        public async Task<string> ProcessBatchAsync(string[] documents)
        {
            // List<T> is not safe for concurrent Add calls, so a thread-safe
            // collection is used. Result order is not guaranteed.
            var results = new ConcurrentBag<string>();

            // Parallel processing for performance (Parallel.ForEachAsync needs .NET 6+)
            await Parallel.ForEachAsync(documents, async (doc, ct) =>
            {
                try
                {
                    var text = await ExtractTextAsync(doc);
                    results.Add(text);
                    _logger.LogInformation("Processed: {Document}", doc);
                }
                catch (Exception ex)
                {
                    // A failed document is logged and skipped; the batch continues
                    _logger.LogError(ex, "Failed: {Document}", doc);
                }
            });

            return string.Join("\n", results);
        }

        // Assumed helper, not shown in the original listing: OCR one file asynchronously.
        private async Task<string> ExtractTextAsync(string path)
        {
            using var input = new OcrInput();
            if (path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
                input.LoadPdf(path);
            else
                input.LoadImage(path);

            var result = await _ocr.ReadAsync(input);
            return result.Text;
        }
    }
This pattern demonstrates parallel processing for batch operations, structured logging for monitoring, and graceful error handling that prevents single-document failures from stopping entire batches.

Real-World Application: Invoice Processing

Here's how organizations use IronOCR as their .NET OCR SDK to automate invoice processing:

    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;
    using IronOcr;

    // Model for the extracted fields (its definition is assumed; the original listing omits it)
    public class Invoice
    {
        public string? InvoiceNumber { get; set; }
        public DateOnly? Date { get; set; }
        public decimal? TotalAmount { get; set; }
        public string RawText { get; set; } = string.Empty;
    }

    // The methods below are members of OcrService and reuse _ocr and LoadFile.

    // Extract structured invoice data
    public Invoice ExtractInvoiceData(string invoicePath)
    {
        using var input = new OcrInput();
        LoadFile(input, invoicePath);

        // Preprocessing for documents
        input.DeNoise();
        input.Deskew();

        var result = _ocr.Read(input);
        var text = result.Text;

        return new Invoice
        {
            InvoiceNumber = ExtractInvoiceNumber(text),
            Date = ExtractDate(text),
            TotalAmount = ExtractAmount(text),
            RawText = text
        };
    }

    // --- Helper methods for invoice parsing ---

    private string? ExtractInvoiceNumber(string text)
    {
        // Example: Invoice #: 12345
        var match = Regex.Match(text, @"Invoice\s*#?:?\s*(\S+)");
        return match.Success ? match.Groups[1].Value : null;
    }

    private DateOnly? ExtractDate(string text)
    {
        // Numeric dates. The invariant culture reads them as month/day/year;
        // pass a different culture if your invoices use day/month/year.
        var numericMatch = Regex.Match(text, @"\b(\d{1,2}/\d{1,2}/\d{2,4})\b");
        if (numericMatch.Success &&
            DateTime.TryParse(numericMatch.Groups[1].Value, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out var numericDate))
            return DateOnly.FromDateTime(numericDate);

        // Written-out dates, e.g. March 15, 2025
        var writtenMatch = Regex.Match(text,
            @"\b(January|February|March|April|May|June|July|August|September|October|November|December)\s+(\d{1,2}),?\s+(\d{4})\b",
            RegexOptions.IgnoreCase);
        if (writtenMatch.Success &&
            DateTime.TryParse(writtenMatch.Value, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out var writtenDate))
            return DateOnly.FromDateTime(writtenDate);

        return null;
    }

    private decimal? ExtractAmount(string text)
    {
        // Takes the first dollar amount in the text, which is not necessarily the total
        var match = Regex.Match(text, @"\$\s*(\d+(?:\.\d{2})?)");
        if (match.Success &&
            decimal.TryParse(match.Groups[1].Value, NumberStyles.Number,
                CultureInfo.InvariantCulture, out var amount))
            return amount;

        return null;
    }

Organizations use this approach to process invoices in volume, extracting key fields for automatic entry into accounting systems. DateOnly requires .NET 6 or later.
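To see how the parsing helpers behave without running OCR, they can be exercised on a plain string. The following is a minimal sketch, assuming .NET 6 or later: `InvoiceParsingDemo` and `SampleText` are hypothetical names introduced here, the sample text is invented for illustration, and the regular expressions are the ones from the invoice listing (with the date pattern narrowed to four-digit years).

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class InvoiceParsingDemo
{
    // Invented sample standing in for OCR output; real scans are noisier.
    public const string SampleText =
        "ACME Supplies\nInvoice #: INV-12345\nDate: 03/15/2025\nTotal: $1250.75";

    public static string? ExtractInvoiceNumber(string text)
    {
        var match = Regex.Match(text, @"Invoice\s*#?:?\s*(\S+)");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static DateOnly? ExtractDate(string text)
    {
        // Invariant culture: numeric dates are read as month/day/year.
        var match = Regex.Match(text, @"\b(\d{1,2}/\d{1,2}/\d{4})\b");
        if (match.Success &&
            DateTime.TryParse(match.Groups[1].Value, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out var date))
            return DateOnly.FromDateTime(date);
        return null;
    }

    public static decimal? ExtractAmount(string text)
    {
        // First dollar amount in the text; no thousands separators handled.
        var match = Regex.Match(text, @"\$\s*(\d+(?:\.\d{2})?)");
        if (match.Success &&
            decimal.TryParse(match.Groups[1].Value, NumberStyles.Number,
                CultureInfo.InvariantCulture, out var amount))
            return amount;
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractInvoiceNumber(SampleText)); // prints "INV-12345"
        Console.WriteLine(ExtractDate(SampleText)?.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        Console.WriteLine(ExtractAmount(SampleText)?.ToString(CultureInfo.InvariantCulture));
    }
}
```

Testing the patterns against saved OCR text like this makes it easier to spot fields that a real invoice layout would break, such as totals written with thousands separators or dates in day/month order.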
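Conclusion

IronOCR turns .NET applications into capable document processing solutions without the complexity of building OCR from scratch. With extensive language support, strong accuracy, and production-ready features, it is a complete .NET OCR SDK suited to enterprise applications.

IronOCR offers flexible licensing options, from a Lite license for single-developer use up to enterprise deployments. The royalty-free model means no additional costs when distributing your OCR SDK applications to customers.

Ready to build your .NET OCR SDK? Start your free trial to begin building production applications today.

Frequently Asked Questions

What is the .NET OCR SDK?
IronOCR's .NET OCR SDK is a library designed to integrate optical character recognition into C# applications, enabling developers to extract text from images, PDFs, and scanned documents.

What are the key features of IronOCR's .NET SDK?
It provides a simple API, multi-language support, cross-platform compatibility, and advanced features for handling a variety of file formats and low-quality scans.

How does IronOCR handle different languages?
The SDK supports multiple languages, so text can be extracted and recognized from documents in various languages without additional configuration.

Can IronOCR handle low-quality scans?
Yes. IronOCR is designed to handle low-quality scans effectively, using advanced algorithms to improve recognition accuracy even in challenging cases.

Is IronOCR's .NET SDK cross-platform?
Yes. It can be used on different operating systems, which makes it suitable for a variety of development environments.

Which file formats does IronOCR support?
IronOCR supports multiple file formats, including images, PDFs, and scanned documents, giving flexibility for text recognition tasks across different media.

How can developers integrate IronOCR into their projects?
Developers can add IronOCR to their C# projects easily; its intuitive API simplifies adding OCR functionality to an application.

What are the use cases for IronOCR?
IronOCR can be used in document management systems, automated data entry, content digitization, and any application that needs to extract text from images or PDFs.

About the Author

Kannapat Udonpant, Software Engineer

Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, he was also a member of the Vehicle Robotics Laboratory, part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the IronPDF code. Besides peer learning, he enjoys the social side of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.

Related Articles

- How to Integrate OCR in C# GitHub Projects Using IronOCR (published September 29, 2025). OCR C# GitHub tutorial: implement text recognition in your GitHub projects with IronOCR. Includes code examples and version control tips.
- How We Reduced Document Processing Memory by 98%: An IronOCR Engineering Breakthrough (updated September 4, 2025). IronOCR 2025.9 cuts TIFF processing memory by 98% through a streaming architecture, eliminating crashes and speeding up enterprise workflows.
- Unlocking the Power of Searchable PDFs with IronOCR: Webinar Recap (updated August 5, 2025). In this developer-focused session, Iron Software's Chipego Kalinda and Darren Steddy demonstrate how to use IronOCR to turn scanned PDFs into searchable, compliant documents, with live examples of PDF/UA accessibility, full-text search, and targeted data extraction in a few lines of C# code.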