How to Create a .NET OCR SDK with IronOCR

If you’ve ever needed to extract text from scanned documents, PDFs, or images, you know how tricky it can be to handle different file formats, multiple languages, and low-quality scans. That’s where OCR (optical character recognition) comes in, turning scanned images and document files into editable text you can work with programmatically.

In this guide, we’ll explore how to build a high-performance .NET OCR SDK using IronOCR, showing you how to perform OCR, extract structured data, and generate searchable PDFs across multiple document types. You’ll learn how to process scanned PDFs, images, and other text files in a way that’s fast, reliable, and integrates seamlessly into .NET applications on desktop, web, or mobile devices.

What Makes IronOCR the Ideal .NET OCR SDK?

Building an OCR library from scratch requires months of development, image preprocessing, and extensive testing. IronOCR eliminates this overhead by providing a comprehensive .NET OCR SDK that supports various formats and integrates seamlessly into .NET applications.

The SDK handles the heavy lifting of text recognition while offering features typically found only in enterprise solutions:

  • High performance across various document formats and scanned images
  • Support for 125+ languages and handwritten text recognition
  • Adaptive binarization, font information, and bounding box support for zonal OCR
  • Ability to process scanned PDFs, image formats, and text blocks
  • Instant searchable document creation with hidden text layers

Unlike raw Tesseract implementations, IronOCR works immediately across Windows, Linux, macOS, and cloud platforms, supporting OCR APIs, AI-assisted recognition, and seamless integration without additional configuration.

Getting Started with IronOCR

Installation takes seconds through NuGet Package Manager. Run:

Install-Package IronOcr

For detailed installation instructions, refer to the IronOCR documentation. Once installed, extracting text from scanned documents becomes straightforward:

using IronOcr;
// Thin wrapper that creates the OCR engine once and reuses it for every call
public class OcrService
{
    private readonly IronTesseract _ocr;
    public OcrService()
    {
        _ocr = new IronTesseract();
    }
    // Runs OCR on a single image file and returns the recognized text
    public string ExtractText(string imagePath)
    {
        using var input = new OcrInput();
        input.LoadImage(imagePath);
        var result = _ocr.Read(input);
        return result.Text;
    }
}

This code creates a reusable OCR service for image files such as JPEG, PNG, TIFF, and BMP. PDF documents need LoadPdf rather than LoadImage; the LoadFile helper in the next section handles both.

To test it, we'll run it through our main class with this example image:

using System;
class Program
{
    static void Main(string[] args)
    {
        var ocrService = new OcrService();
        string imagePath = "test.png"; // Replace with your image path
        // Run OCR and print the recognized text to the console
        string extractedText = ocrService.ExtractText(imagePath);
        Console.WriteLine(extractedText);
    }
}

Output

How to Create a .NET OCR SDK with IronOCR: Figure 2 - Example console output
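
The service above hands whatever path it receives straight to the engine. A small guard in front of it could reject unsupported files early. The sketch below is a hypothetical helper, not part of IronOCR: the names InputKind and OcrInputGuard are invented for illustration, and the extension list simply mirrors the formats named in this article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not an IronOCR API): classifies a path before OCR.
public enum InputKind { Unsupported, Image, Pdf }

public static class OcrInputGuard
{
    // Assumed list, taken from the formats mentioned in this article; extend as needed.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp" };

    // Decides from the file extension alone whether a path looks like an image or a PDF.
    public static InputKind Classify(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            return InputKind.Unsupported;

        string extension = Path.GetExtension(path);
        if (extension.Equals(".pdf", StringComparison.OrdinalIgnoreCase))
            return InputKind.Pdf;

        return ImageExtensions.Contains(extension) ? InputKind.Image : InputKind.Unsupported;
    }
}
```

A caller could check OcrInputGuard.Classify(path) and only build an OcrInput when the result is not InputKind.Unsupported. The check looks at the extension only, so it does not prove the file content is valid.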

Building Core OCR Functionality

Real-world applications need more than basic text extraction. IronOCR provides comprehensive document processing capabilities:

// These methods belong in a class that holds the _ocr field, such as OcrService.
// Namespaces needed: System, System.Collections.Generic, System.Linq, System.Threading.Tasks, IronOcr
// Async document processing with barcodes
public async Task<ProcessedDocument> ProcessDocumentAsync(string filePath)
{
    using var input = new OcrInput();
    LoadFile(input, filePath);
    // Clean up scanner noise and straighten the page before recognition
    input.DeNoise();
    input.Deskew();
    var result = await _ocr.ReadAsync(input);
    return new ProcessedDocument
    {
        Text = result.Text,
        Confidence = result.Confidence,
        // Barcode reading is switched on via _ocr.Configuration.ReadBarCodes (see the production service)
        Barcodes = result.Barcodes.Select(b => b.Value).ToList()
    };
}
// Helper to load image or PDF
private void LoadFile(OcrInput input, string filePath)
{
    if (filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
        input.LoadPdf(filePath);
    else
        input.LoadImage(filePath);
}
// Model for processed documents with barcodes
public class ProcessedDocument
{
    public string Text { get; set; }
    public double Confidence { get; set; }
    public List<string> Barcodes { get; set; }
}

This implementation accepts either an image or a PDF, applies image preprocessing, and returns the text, the confidence score, and any barcode values from the same document. The async pattern keeps the calling thread free while recognition runs.

Output

How to Create a .NET OCR SDK with IronOCR: Figure 3 - OCR input image vs. output text
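
The Confidence value returned alongside the text can drive a simple review workflow. The following is a minimal sketch, assuming the caller picks a threshold on the same scale as the reported confidence. OcrOutcome and ConfidenceRouter are hypothetical names used only here, with OcrOutcome standing in for ProcessedDocument so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's ProcessedDocument.
public record OcrOutcome(string FilePath, string Text, double Confidence);

public static class ConfidenceRouter
{
    // Splits results into an accepted bucket and a needs-review bucket.
    // minConfidence must use the same scale as the Confidence values supplied.
    public static (List<OcrOutcome> Accepted, List<OcrOutcome> NeedsReview) Split(
        IEnumerable<OcrOutcome> outcomes, double minConfidence)
    {
        var accepted = new List<OcrOutcome>();
        var needsReview = new List<OcrOutcome>();

        foreach (var outcome in outcomes)
        {
            // Empty text is treated as a failed read regardless of the score
            bool usable = !string.IsNullOrWhiteSpace(outcome.Text)
                          && outcome.Confidence >= minConfidence;
            (usable ? accepted : needsReview).Add(outcome);
        }

        return (accepted, needsReview);
    }
}
```

Documents in the needs-review bucket could be retried with heavier preprocessing or passed to a person, while accepted ones continue through the pipeline.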

Enhancing Accuracy with Built-in Features

IronOCR's preprocessing capabilities significantly improve recognition accuracy on real-world documents:

// OCR optimized for low-quality images
public string ProcessLowQualityDocument(string filePath)
{
    using var input = new OcrInput();
    LoadFile(input, filePath);
    // Preprocessing for low-quality documents
    input.DeNoise();              // remove scanning artifacts
    input.Deskew();               // straighten tilted pages
    input.Scale(150);             // enlarge the image to 150%
    input.Binarize();             // reduce the image to black and white
    input.EnhanceResolution(300); // raise low-resolution input toward 300 DPI
    var result = _ocr.Read(input);
    return result.Text;
}

Each filter targets a specific document-quality issue. DeNoise() removes artifacts from scanning, Deskew() corrects tilted pages, Scale() enlarges small text, Binarize() reduces the image to black and white, and EnhanceResolution() raises the resolution of low-DPI scans.

These filters work together to achieve accurate text extraction even from poor-quality sources. According to discussions on Stack Overflow, proper preprocessing can improve OCR accuracy by up to 40%.

Advanced Data Extraction SDK Capabilities

IronOCR extends beyond basic text extraction with features essential for modern .NET OCR SDK applications:

// Namespaces needed: System.Collections.Generic, System.Linq, System.Text.RegularExpressions, IronOcr
// Create a searchable PDF from an image or PDF
public void CreateSearchablePdf(string inputPath, string outputPath)
{
    using var input = new OcrInput();
    LoadFile(input, inputPath);
    _ocr.Read(input).SaveAsSearchablePdf(outputPath);
}
// Extract structured data (phone numbers, emails, amounts) from text
public List<string> ExtractStructuredData(string filePath)
{
    using var input = new OcrInput();
    LoadFile(input, filePath);
    var result = _ocr.Read(input);
    var text = result.Text;
    // Optional leading +, then at least nine digits, spaces, or hyphens
    var phoneNumbers = Regex.Matches(text, @"\+?\d[\d\s\-]{7,}\d")
                            .Select(m => m.Value).ToList();
    // The top-level domain accepts upper and lower case, since OCR output may vary in case
    var emails = Regex.Matches(text, @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
                      .Select(m => m.Value).ToList();
    // Dollar amounts with optional cents, e.g. $42 or $42.50
    var amounts = Regex.Matches(text, @"\$\d+(?:\.\d{2})?")
                       .Select(m => m.Value).ToList();
    return phoneNumbers.Concat(emails).Concat(amounts).ToList();
}

The code we've written here shows two key OCR operations. CreateSearchablePdf converts a scanned PDF or image into a searchable PDF with a hidden text layer, so its content can be searched, selected, and copied.

ExtractStructuredData runs OCR on a scanned document and then uses regular expressions to pull phone numbers, email addresses, and dollar amounts out of the recognized text.
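
ExtractStructuredData returns one flat list, so a caller cannot tell a phone number from an amount without matching again. One possible alternative, sketched below, keeps each category separate. ExtractedFields and FieldExtractor are hypothetical names, the patterns are the simple ones used in this article rather than complete validators, and the method works on text that has already been recognized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical result type: one list per kind of field.
public record ExtractedFields(
    IReadOnlyList<string> PhoneNumbers,
    IReadOnlyList<string> Emails,
    IReadOnlyList<string> Amounts);

public static class FieldExtractor
{
    // Same illustrative patterns as the article's example, compiled once for reuse.
    private static readonly Regex Phone =
        new(@"\+?\d[\d\s\-]{7,}\d", RegexOptions.Compiled);
    private static readonly Regex Email =
        new(@"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", RegexOptions.Compiled);
    private static readonly Regex Amount =
        new(@"\$\d+(?:\.\d{2})?", RegexOptions.Compiled);

    // Takes already-recognized text, so it can be exercised without running OCR.
    public static ExtractedFields Extract(string text)
    {
        // Local helper: all distinct matches of one pattern, in order of appearance
        List<string> Find(Regex pattern) =>
            pattern.Matches(text).Select(m => m.Value).Distinct().ToList();

        return new ExtractedFields(Find(Phone), Find(Email), Find(Amount));
    }
}
```

Passing result.Text from an OCR call into FieldExtractor.Extract would give the caller three separate lists to map onto its own data model.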

Production-Ready Implementation

Deploy IronOCR confidently with built-in production features:

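using System;
using System.Linq;
using System.Threading.Tasks;
using IronOcr;
using Microsoft.Extensions.Logging;
public class ProductionOcrService
{
    private readonly IronTesseract _ocr;
    private readonly ILogger _logger;
    public ProductionOcrService(ILogger logger)
    {
        _logger = logger;
        _ocr = new IronTesseract();
        // Production configuration
        _ocr.Configuration.RenderSearchablePdfsAndHocr = true;
        _ocr.Configuration.ReadBarCodes = true;
    }
    public async Task<string> ProcessBatchAsync(string[] documents)
    {
        // One slot per document: each worker writes only to its own element,
        // because a shared List<string> is not safe to Add to from parallel workers
        var results = new string[documents.Length];
        // Parallel processing for performance
        await Parallel.ForEachAsync(Enumerable.Range(0, documents.Length), async (i, ct) =>
        {
            var doc = documents[i];
            try
            {
                results[i] = await ExtractTextAsync(doc);
                _logger.LogInformation("Processed: {Document}", doc);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed: {Document}", doc);
            }
        });
        // Failed documents leave a null slot and are skipped; input order is preserved
        return string.Join("\n", results.Where(text => text != null));
    }
    // Loads a PDF or image and runs OCR on it asynchronously
    private async Task<string> ExtractTextAsync(string path)
    {
        using var input = new OcrInput();
        if (path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(path);
        else
            input.LoadImage(path);
        var result = await _ocr.ReadAsync(input);
        return result.Text;
    }
}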

This pattern demonstrates parallel processing for batch operations, structured logging for monitoring, and graceful error handling that prevents single-document failures from stopping entire batches.
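
For large batches it can help to cap how many documents are processed at once and to report failures per file instead of only logging them. The sketch below shows one way to do that with Parallel.ForEachAsync and ParallelOptions. BatchRunner and BatchItem are hypothetical names, and processAsync is a placeholder for whatever method performs OCR on one file, so the example runs without IronOCR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-file outcome: Text is set on success, Error on failure.
public record BatchItem(string Path, string? Text, Exception? Error);

public static class BatchRunner
{
    // processAsync stands in for the OCR call; maxParallelism caps concurrent work.
    public static async Task<BatchItem[]> RunAsync(
        IReadOnlyList<string> paths,
        Func<string, CancellationToken, Task<string>> processAsync,
        int maxParallelism,
        CancellationToken cancellationToken = default)
    {
        // One slot per input keeps results in input order without locking
        var items = new BatchItem[paths.Count];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallelism,
            CancellationToken = cancellationToken
        };

        await Parallel.ForEachAsync(Enumerable.Range(0, paths.Count), options, async (i, ct) =>
        {
            try
            {
                string text = await processAsync(paths[i], ct);
                items[i] = new BatchItem(paths[i], text, null);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Record the failure and let the rest of the batch continue
                items[i] = new BatchItem(paths[i], null, ex);
            }
        });

        return items;
    }
}
```

Cancellation is deliberately not swallowed: if the token is cancelled, the exception propagates to the caller instead of being stored as a per-file error.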

Real-World Application: Invoice Processing

Here's how organizations use IronOCR as their .NET OCR SDK to automate invoice processing:

// Namespaces needed: System, System.Globalization, System.Text.RegularExpressions, IronOcr
// Extract structured invoice data
public Invoice ExtractInvoiceData(string invoicePath)
{
    using var input = new OcrInput();
    LoadFile(input, invoicePath);
    // Preprocessing for documents
    input.DeNoise();
    input.Deskew();
    var result = _ocr.Read(input);
    var text = result.Text;
    return new Invoice
    {
        InvoiceNumber = ExtractInvoiceNumber(text),
        Date = ExtractDate(text),
        TotalAmount = ExtractAmount(text),
        RawText = text
    };
}
// Model for the extracted invoice fields
public class Invoice
{
    public string InvoiceNumber { get; set; }
    public DateOnly? Date { get; set; }
    public decimal? TotalAmount { get; set; }
    public string RawText { get; set; }
}
// --- Helper methods for invoice parsing ---
private string ExtractInvoiceNumber(string text)
{
    // Example: Invoice #: 12345
    var match = Regex.Match(text, @"Invoice\s*#?:?\s*(\S+)");
    return match.Success ? match.Groups[1].Value : null;
}
private DateOnly? ExtractDate(string text)
{
    // Numeric dates; the invariant culture reads them as month/day/year on every machine
    var numericMatch = Regex.Match(text, @"\b(\d{1,2}/\d{1,2}/\d{2,4})\b");
    if (numericMatch.Success && DateTime.TryParse(numericMatch.Groups[1].Value,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out var numericDate))
        return DateOnly.FromDateTime(numericDate);
    // Written-out dates
    var writtenMatch = Regex.Match(text,
        @"\b(January|February|March|April|May|June|July|August|September|October|November|December)\s+(\d{1,2}),?\s+(\d{4})\b",
        RegexOptions.IgnoreCase);
    if (writtenMatch.Success && DateTime.TryParse(writtenMatch.Value,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out var writtenDate))
        return DateOnly.FromDateTime(writtenDate);
    return null;
}
private decimal? ExtractAmount(string text)
{
    // Takes the first dollar amount found in the text
    var match = Regex.Match(text, @"\$\s*(\d+(?:\.\d{2})?)");
    // Invariant culture so "." is always the decimal separator
    if (match.Success && decimal.TryParse(match.Groups[1].Value,
            NumberStyles.Number, CultureInfo.InvariantCulture, out var amount))
        return amount;
    return null;
}

This approach can be applied to large batches of invoices, extracting key fields for automatic entry into accounting systems.

Output

How to Create a .NET OCR SDK with IronOCR: Figure 4 - Invoice OCR output
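
The ExtractAmount helper above takes the first dollar amount in the text and stops at a thousands separator, so "$1,234.56" would be read as 1. A stricter variant might anchor on a "Total" label and allow grouping commas. The following is a minimal sketch under those assumptions. InvoiceAmountParser is a hypothetical name, and the pattern expects English labels and US-style number formatting.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class InvoiceAmountParser
{
    // "Total" as a whole word (so "Subtotal" is skipped), then a dollar amount.
    // First alternative: comma-grouped digits; second: plain digits.
    private static readonly Regex TotalPattern = new(
        @"\bTotal\b[^\d$]*\$\s*(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the labeled total, or null when no labeled amount is found.
    public static decimal? ParseTotal(string text)
    {
        var match = TotalPattern.Match(text);
        if (!match.Success)
            return null;

        // Invariant culture: "," groups thousands and "." marks decimals on every machine
        bool parsed = decimal.TryParse(
            match.Groups[1].Value,
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out decimal amount);

        return parsed ? amount : null;
    }
}
```

Real invoices vary widely in layout, so a pattern like this would normally be tuned against sample documents before being relied on.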

Conclusion

IronOCR transforms .NET applications into sophisticated document processing solutions without the complexity of building OCR from scratch. With extensive language support, superior accuracy, and production-ready features, it's the complete .NET OCR SDK that developers trust for enterprise applications.

IronOCR offers flexible licensing options, from single-developer licenses to enterprise deployments. The royalty-free model means no additional costs when distributing your OCR SDK applications to customers.

Ready to build your .NET OCR SDK? Start your free trial to begin building production applications today.

Frequently Asked Questions

What is the .NET OCR SDK?

The IronOCR .NET OCR SDK is a library designed to integrate optical character recognition capabilities into C# applications, allowing developers to extract text from images, PDFs, and scanned documents.

What are the key features of the IronOCR .NET SDK?

The IronOCR .NET SDK offers a simple API, support for multiple languages, cross-platform compatibility, and advanced features for handling various file formats and low-quality scans.

How does IronOCR handle different languages?

The IronOCR .NET SDK supports multiple languages, allowing text extraction and recognition from documents in various languages without requiring additional configuration.

Can IronOCR process low-quality scans?

Yes, IronOCR is designed to handle low-quality scans effectively, using advanced algorithms to improve text recognition accuracy even in challenging scenarios.

Is the IronOCR .NET SDK cross-platform?

The IronOCR .NET SDK is cross-platform, meaning it can be used on different operating systems, which makes it versatile across development environments.

What file formats does IronOCR support?

IronOCR supports a wide range of file formats, including images, PDFs, and scanned documents, providing flexibility for text recognition tasks across different media.

How can developers integrate IronOCR into their projects?

Developers can easily integrate IronOCR into their C# projects using its straightforward API, which simplifies the process of adding OCR functionality to applications.

What are some use cases for IronOCR?

IronOCR can be used in document management systems, automated data entry, content digitization, and any application that requires text extraction from images or PDFs.

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing the degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Engineering ...