.NET OCR SDK: A Text Recognition Library for C#
A .NET OCR SDK is a software development kit that lets C# and .NET applications extract text from images, scanned PDFs, and other document formats programmatically. IronOCR is a production-ready .NET OCR SDK that wraps a tuned Tesseract 5 engine with preprocessing filters, barcode reading, searchable PDF output, and support for 125+ languages -- all accessible through a clean C# API that works on Windows, Linux, macOS, and cloud platforms.
What Makes IronOCR the Right .NET OCR SDK for Your Project?
Building text recognition from scratch means managing image preprocessing pipelines, language data files, threading models, and output parsing -- months of work before you extract your first word. IronOCR eliminates that overhead by shipping a battle-tested engine that your team can drop into a project in minutes.
Key capabilities that set it apart from raw Tesseract bindings:
- Recognition of 125+ languages and scripts including handwritten text
- Built-in filters: noise removal, deskewing, binarization, resolution enhancement, and contrast correction
- Barcode and QR code detection within the same read pass
- Searchable PDF generation with invisible text layers for archiving workflows
- Async and parallel batch processing for high-throughput pipelines
- Zonal OCR for targeting specific page regions to cut processing time
- Cross-platform support on Windows, Linux, macOS, Docker, and Azure
According to the Tesseract OCR project documentation, raw Tesseract requires manual configuration for language packs, DPI settings, and output modes. IronOCR handles all of this automatically, letting you focus on what the extracted text means rather than how to extract it.
How Does IronOCR Compare to Raw Tesseract?
Raw Tesseract via a P/Invoke wrapper or the Tesseract NuGet package leaves you responsible for: downloading and placing tessdata language files, selecting the correct page segmentation mode, handling multi-page TIFF and PDF splitting yourself, and wiring up threading if you want parallel processing. None of those details are unique to your business problem.
IronOCR wraps all of that plumbing. You get a typed API surface, automatic tessdata management, built-in PDF split-and-recombine, and a thread-safe engine that you can reuse across requests. The tradeoff is a paid license for production use -- the licensing page shows current pricing tiers including a free development license.
For teams that need open-source-only dependencies, raw Tesseract plus custom preprocessing is a viable path. For teams that need to ship reliable OCR quickly, IronOCR reduces the integration surface to a few lines of C#.
How Do You Install the IronOCR .NET SDK?
Installation comes through NuGet, the standard .NET package manager. Run the following command in the Visual Studio Package Manager Console (from a terminal, dotnet add package IronOcr does the same job):
Install-Package IronOcr
For Visual Studio users, search for IronOcr in the NuGet Package Manager GUI and install from there. For full installation options including manual DLL references, see the IronOCR installation documentation.
After installation, add the license key to your application startup or appsettings.json. You can start a free trial to get a trial key that unlocks all features during evaluation.
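One way to apply the key at startup is to read it from an environment variable so it stays out of source control. The sketch below assumes the key is stored in a variable named IRONOCR_LICENSE_KEY (a hypothetical name chosen for illustration) and assigns it through the License.LicenseKey property; check the licensing documentation for the options your version supports.

```csharp
using System;

// Assumption: the key lives in an environment variable with this
// hypothetical name; any secret store would work the same way.
string? key = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY");

if (string.IsNullOrWhiteSpace(key))
{
    Console.Error.WriteLine("No IronOCR license key configured.");
}
else
{
    // Set the key once at startup, before any OCR call is made.
    IronOcr.License.LicenseKey = key;
}
```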
Verifying the Installation
A quick sanity check after installation confirms everything is wired up correctly. Create a console application targeting .NET 10:
using IronOcr;
// Minimal smoke test -- reads a single image and prints extracted text
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("sample.png");
var result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr
Module Program
    Sub Main()
        ' Minimal smoke test -- reads a single image and prints extracted text
        Dim ocr As New IronTesseract()
        Using input As New OcrInput()
            input.LoadImage("sample.png")
            Dim result = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
If text appears in the console, the SDK is installed and the license key is valid. You are ready to build production workflows.
How Do You Extract Text From Images and PDFs in C#?
The core extraction pattern is consistent across all input types. You create an IronTesseract instance, load content into an OcrInput object, and call Read(). The service below chooses LoadPdf or LoadImage from the file extension, so one code path handles JPEG, PNG, TIFF, BMP, and multi-page PDFs.
using IronOcr;
// Reusable OCR service encapsulating the IronTesseract engine
public class OcrService
{
    private readonly IronTesseract _ocr = new IronTesseract();
    public string ExtractText(string filePath)
    {
        using var input = new OcrInput();
        // LoadPdf for PDF files; LoadImage for raster formats
        if (filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(filePath);
        else
            input.LoadImage(filePath);
        return _ocr.Read(input).Text;
    }
    public async Task<string> ExtractTextAsync(string filePath)
    {
        using var input = new OcrInput();
        if (filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(filePath);
        else
            input.LoadImage(filePath);
        var result = await _ocr.ReadAsync(input);
        return result.Text;
    }
}
Imports IronOcr
' Reusable OCR service encapsulating the IronTesseract engine
Public Class OcrService
    Private ReadOnly _ocr As New IronTesseract()
    Public Function ExtractText(filePath As String) As String
        Using input As New OcrInput()
            ' LoadPdf for PDF files; LoadImage for raster formats
            If filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
                input.LoadPdf(filePath)
            Else
                input.LoadImage(filePath)
            End If
            Return _ocr.Read(input).Text
        End Using
    End Function
    Public Async Function ExtractTextAsync(filePath As String) As Task(Of String)
        Using input As New OcrInput()
            If filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
                input.LoadPdf(filePath)
            Else
                input.LoadImage(filePath)
            End If
            Dim result = Await _ocr.ReadAsync(input)
            Return result.Text
        End Using
    End Function
End Class
Top-level entry point to exercise the service:
using IronOcr;
var service = new OcrService();
string text = await service.ExtractTextAsync("invoice.pdf");
Console.WriteLine(text);
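Imports IronOcr
Module Program
    Sub Main()
        ' VB has no top-level statements or Async Main, so block on the task here
        Dim service = New OcrService()
        Dim text As String = service.ExtractTextAsync("invoice.pdf").GetAwaiter().GetResult()
        Console.WriteLine(text)
    End Sub
End Module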
The IronTesseract instance is thread-safe and designed for reuse. Create it once at application startup (via dependency injection in ASP.NET Core, for example) rather than instantiating it per request.
For multi-page PDFs, result.Pages gives you per-page access to the text, confidence score, and bounding boxes. See the multi-page PDF OCR guide for details on page-by-page iteration.
How Do You Improve OCR Accuracy With Preprocessing Filters?
Raw scans from flatbed scanners, smartphone cameras, or fax machines frequently suffer from noise, rotation, low contrast, and insufficient resolution. IronOCR's image quality correction pipeline addresses each issue with targeted filters you chain before the read call.
using IronOcr;
public class AccuracyOptimizedOcr
{
    private readonly IronTesseract _ocr = new IronTesseract();
    public string ProcessLowQualityDocument(string filePath)
    {
        using var input = new OcrInput();
        if (filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(filePath);
        else
            input.LoadImage(filePath);
        // Chain preprocessing filters in order of operation
        input.DeNoise();              // Remove scan artifacts and speckling
        input.Deskew();               // Correct page tilt up to 35 degrees
        input.Scale(150);             // Enlarge small text for better recognition
        input.Binarize();             // Convert to black/white for cleaner edges
        input.EnhanceResolution(300); // Sharpen blurry or low-DPI input
        var result = _ocr.Read(input);
        // Confidence below 70 often signals a preprocessing mismatch
        if (result.Confidence < 70)
            Console.WriteLine($"Warning: low confidence ({result.Confidence:F1}%)");
        return result.Text;
    }
}
Imports IronOcr
Public Class AccuracyOptimizedOcr
    Private ReadOnly _ocr As New IronTesseract()
    Public Function ProcessLowQualityDocument(filePath As String) As String
        Using input As New OcrInput()
            If filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
                input.LoadPdf(filePath)
            Else
                input.LoadImage(filePath)
            End If
            ' Chain preprocessing filters in order of operation
            input.DeNoise()              ' Remove scan artifacts and speckling
            input.Deskew()               ' Correct page tilt up to 35 degrees
            input.Scale(150)             ' Enlarge small text for better recognition
            input.Binarize()             ' Convert to black/white for cleaner edges
            input.EnhanceResolution(300) ' Sharpen blurry or low-DPI input
            Dim result = _ocr.Read(input)
            ' Confidence below 70 often signals a preprocessing mismatch
            If result.Confidence < 70 Then
                Console.WriteLine($"Warning: low confidence ({result.Confidence:F1}%)")
            End If
            Return result.Text
        End Using
    End Function
End Class
Filter selection guidance:
- DeNoise() -- use for scans with heavy speckling or compression artifacts
- Deskew() -- use when documents are photographed at an angle; see page rotation detection for auto-detection
- Scale() -- use for small print or sub-150 DPI input; values of 150-200 typically yield the best results
- Binarize() -- use for colored or gradient backgrounds; converts image to strict black/white
- EnhanceResolution() -- use for blurry or low-contrast text; targets 300 DPI as the Tesseract sweet spot
Research published in the International Journal on Document Analysis and Recognition consistently shows that binarization and deskewing are the two highest-impact preprocessing steps for improving character recognition rates. Apply both as a baseline for any production pipeline.
| Filter | Problem Solved | When to Apply |
|---|---|---|
| DeNoise() | Scanner artifacts, speckle noise | Any flatbed or fax scan |
| Deskew() | Page tilt and rotation | Photographed or misaligned documents |
| Scale() | Small text or low DPI | Input below 150 DPI |
| Binarize() | Color backgrounds, gradients | Colored paper or watermarked forms |
| EnhanceResolution() | Blur and low contrast | Camera captures and compressed JPEGs |
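The guidance in the table above can be written down as a small rule set, which is handy when the filter chain has to be chosen per document rather than hard-coded. This is a minimal sketch under stated assumptions: ScanTraits and FilterPlanner are hypothetical names, not part of the IronOCR API, and the planner only returns filter names rather than calling the filters.

```csharp
using System.Collections.Generic;

// Hypothetical description of a scan; these fields are illustrative
// and are not part of the IronOCR API.
public record ScanTraits(
    int Dpi,
    bool HasSpeckle,
    bool IsTilted,
    bool HasColoredBackground,
    bool IsBlurry);

public static class FilterPlanner
{
    // Returns the names of the filters suggested by the table,
    // in the same order the earlier example chains them.
    public static IReadOnlyList<string> Plan(ScanTraits scan)
    {
        var filters = new List<string>();
        if (scan.HasSpeckle) filters.Add("DeNoise");
        if (scan.IsTilted) filters.Add("Deskew");
        if (scan.Dpi < 150) filters.Add("Scale");
        if (scan.HasColoredBackground) filters.Add("Binarize");
        if (scan.IsBlurry) filters.Add("EnhanceResolution");
        return filters;
    }
}
```

For a speckled 120 DPI fax with no tilt, colored background, or blur, Plan returns DeNoise followed by Scale; a clean 300 DPI scan returns an empty list, meaning no preprocessing is suggested.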
How Do You Build a Production Batch Processing Pipeline?
Single-document extraction is straightforward, but production scenarios involve hundreds or thousands of files arriving in queues, shared folders, or cloud storage. IronOCR's async API and thread-safe engine make it suitable for parallel workloads.
using IronOcr;
using Microsoft.Extensions.Logging;
public class ProductionOcrService
{
    private readonly IronTesseract _ocr;
    private readonly ILogger<ProductionOcrService> _logger;
    public ProductionOcrService(ILogger<ProductionOcrService> logger)
    {
        _logger = logger;
        _ocr = new IronTesseract
        {
            Configuration =
            {
                RenderSearchablePdfsAndHocr = true,
                ReadBarCodes = true
            }
        };
    }
    public async Task<IReadOnlyList<string>> ProcessBatchAsync(
        IEnumerable<string> filePaths,
        int maxDegreeOfParallelism = 4)
    {
        // ConcurrentBag is unordered, so results do not follow the input order
        var results = new System.Collections.Concurrent.ConcurrentBag<string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };
        await Parallel.ForEachAsync(filePaths, options, async (filePath, ct) =>
        {
            try
            {
                using var input = new OcrInput();
                if (filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
                    input.LoadPdf(filePath);
                else
                    input.LoadImage(filePath);
                var result = await _ocr.ReadAsync(input);
                results.Add(result.Text);
                _logger.LogInformation("Processed {FilePath} at {Confidence:F1}% confidence",
                    filePath, result.Confidence);
            }
            catch (Exception ex)
            {
                // A failed file is logged and recorded as an empty string
                _logger.LogError(ex, "OCR failed for {FilePath}", filePath);
                results.Add(string.Empty);
            }
        });
        return results.ToList();
    }
    public void CreateSearchablePdf(string inputPath, string outputPath)
    {
        using var input = new OcrInput();
        if (inputPath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(inputPath);
        else
            input.LoadImage(inputPath);
        _ocr.Read(input).SaveAsSearchablePdf(outputPath);
        _logger.LogInformation("Searchable PDF written to {OutputPath}", outputPath);
    }
}
Imports IronOcr
Imports Microsoft.Extensions.Logging
Imports System.Collections.Concurrent
Imports System.Threading.Tasks
Public Class ProductionOcrService
    Private ReadOnly _ocr As IronTesseract
    Private ReadOnly _logger As ILogger(Of ProductionOcrService)
    Public Sub New(logger As ILogger(Of ProductionOcrService))
        _logger = logger
        _ocr = New IronTesseract With {
            .Configuration = New TesseractConfiguration With {
                .RenderSearchablePdfsAndHocr = True,
                .ReadBarCodes = True
            }
        }
    End Sub
    Public Async Function ProcessBatchAsync(filePaths As IEnumerable(Of String), Optional maxDegreeOfParallelism As Integer = 4) As Task(Of IReadOnlyList(Of String))
        ' ConcurrentBag is unordered, so results do not follow the input order
        Dim results = New ConcurrentBag(Of String)()
        Dim options = New ParallelOptions With {
            .MaxDegreeOfParallelism = maxDegreeOfParallelism
        }
        ' VB Async lambdas return Task, so wrap the helper's Task in the ValueTask that ForEachAsync expects
        Await Parallel.ForEachAsync(filePaths, options,
            Function(filePath, ct) New ValueTask(ProcessFileAsync(filePath, results)))
        Return results.ToList()
    End Function
    Private Async Function ProcessFileAsync(filePath As String, results As ConcurrentBag(Of String)) As Task
        Try
            Using input As New OcrInput()
                If filePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
                    input.LoadPdf(filePath)
                Else
                    input.LoadImage(filePath)
                End If
                Dim result = Await _ocr.ReadAsync(input)
                results.Add(result.Text)
                _logger.LogInformation("Processed {FilePath} at {Confidence:F1}% confidence", filePath, result.Confidence)
            End Using
        Catch ex As Exception
            ' A failed file is logged and recorded as an empty string
            _logger.LogError(ex, "OCR failed for {FilePath}", filePath)
            results.Add(String.Empty)
        End Try
    End Function
    Public Sub CreateSearchablePdf(inputPath As String, outputPath As String)
        Using input As New OcrInput()
            If inputPath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
                input.LoadPdf(inputPath)
            Else
                input.LoadImage(inputPath)
            End If
            _ocr.Read(input).SaveAsSearchablePdf(outputPath)
            _logger.LogInformation("Searchable PDF written to {OutputPath}", outputPath)
        End Using
    End Sub
End Class
The MaxDegreeOfParallelism cap prevents memory exhaustion when files are large. A value of 4 works well on a four-core server; increase it only after profiling memory usage. For Azure Functions or AWS Lambda deployments, set concurrency to 1 per function instance and scale horizontally instead.
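One limitation of collecting results in a ConcurrentBag is that the output no longer lines up with the input: the bag is unordered, and a failed file becomes an anonymous empty string. The sketch below shows one possible alternative that keeps each result paired with its file path and preserves input order. BatchItem, OrderedBatch, and the readOne delegate are hypothetical names introduced for illustration; readOne stands in for whatever OCR call you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type: exactly one of Text or Error is set.
public record BatchItem(string FilePath, string? Text, Exception? Error);

public static class OrderedBatch
{
    // readOne is a stand-in for the real OCR call (hypothetical delegate).
    public static async Task<IReadOnlyList<BatchItem>> RunAsync(
        IEnumerable<string> filePaths,
        Func<string, CancellationToken, Task<string>> readOne,
        int maxDegreeOfParallelism = 4,
        CancellationToken cancellationToken = default)
    {
        var paths = filePaths.ToArray();
        var items = new BatchItem[paths.Length];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism,
            CancellationToken = cancellationToken
        };

        // Iterate over indexes so each worker writes only to its own slot;
        // no locking is needed and the output order matches the input order.
        await Parallel.ForEachAsync(Enumerable.Range(0, paths.Length), options, async (i, ct) =>
        {
            try
            {
                items[i] = new BatchItem(paths[i], await readOne(paths[i], ct), null);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Keep the failure next to the path that caused it.
                items[i] = new BatchItem(paths[i], null, ex);
            }
        });

        return items;
    }
}
```

With the OcrService class shown earlier, a call could look like `await OrderedBatch.RunAsync(paths, (p, ct) => service.ExtractTextAsync(p))`, after which failed files can be found by checking Error on each item.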
CreateSearchablePdf generates a PDF where the original image is preserved as a visible layer and recognized text is embedded invisibly beneath it. This allows full-text search in PDF viewers and indexing by search engines -- a common requirement in document management systems.
Monitoring Confidence Scores in Production
Every OcrResult exposes a Confidence property (0-100) that reflects how certain the engine is about the recognized text. Tracking this metric in your logging infrastructure gives you an early warning signal when document quality degrades -- for example, if a scanner's calibration drifts or a new document supplier sends lower-DPI scans than expected.
A practical threshold strategy: log a warning at confidence below 80, trigger a preprocessing-retry pass at below 70, and flag documents for human review at below 60. This tiered approach catches quality issues before they produce silent data corruption in downstream systems.
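The tiers described above map naturally onto a small classification function. This is a sketch only: ConfidenceAction and ConfidencePolicy are hypothetical names, and the cut-offs are the 80/70/60 values from the paragraph, which you would tune against your own documents.

```csharp
// Hypothetical actions matching the tiers in the text.
public enum ConfidenceAction
{
    Accept,
    LogWarning,
    RetryWithPreprocessing,
    HumanReview
}

public static class ConfidencePolicy
{
    // Maps a 0-100 confidence score to the most severe tier it falls into.
    public static ConfidenceAction Classify(double confidence) => confidence switch
    {
        double.NaN => ConfidenceAction.HumanReview,        // no usable score: be conservative
        < 60.0 => ConfidenceAction.HumanReview,
        < 70.0 => ConfidenceAction.RetryWithPreprocessing,
        < 80.0 => ConfidenceAction.LogWarning,
        _ => ConfidenceAction.Accept
    };
}
```

Calling ConfidencePolicy.Classify(result.Confidence) after each read keeps the thresholds in one place, so they can be adjusted without touching the batch loop.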
The Microsoft .NET logging documentation covers the ILogger patterns used in the batch service above for teams integrating with ASP.NET Core's built-in DI container.
How Do You Extract Structured Data From Scanned Documents?
Text extraction is the first step. The second step is parsing that text into typed fields your application can act on. This pattern combines IronOCR's read pass with .NET's Regex to pull structured data from invoices, forms, and reports.
using IronOcr;
using System.Globalization;
using System.Text.RegularExpressions;
public record Invoice(
    string? InvoiceNumber,
    DateOnly? Date,
    decimal? TotalAmount,
    string RawText
);
public class InvoiceOcrService
{
    private readonly IronTesseract _ocr = new IronTesseract();
    public Invoice ExtractInvoiceData(string invoicePath)
    {
        using var input = new OcrInput();
        if (invoicePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            input.LoadPdf(invoicePath);
        else
            input.LoadImage(invoicePath);
        input.DeNoise();
        input.Deskew();
        var result = _ocr.Read(input);
        string text = result.Text;
        return new Invoice(
            InvoiceNumber: ExtractInvoiceNumber(text),
            Date: ExtractDate(text),
            TotalAmount: ExtractAmount(text),
            RawText: text
        );
    }
    private static string? ExtractInvoiceNumber(string text)
    {
        var match = Regex.Match(text, @"Invoice\s*#?:?\s*(\S+)", RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
    private static DateOnly? ExtractDate(string text)
    {
        // Numeric format: MM/DD/YYYY. Parsed with the invariant culture so the
        // result does not depend on the machine's regional settings.
        var numeric = Regex.Match(text, @"\b(\d{1,2}/\d{1,2}/\d{2,4})\b");
        if (numeric.Success && DateTime.TryParse(numeric.Groups[1].Value,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out var d1))
            return DateOnly.FromDateTime(d1);
        // Written format: January 15, 2025
        var written = Regex.Match(text,
            @"\b(January|February|March|April|May|June|July|August|September|October|November|December)\s+(\d{1,2}),?\s+(\d{4})\b",
            RegexOptions.IgnoreCase);
        if (written.Success && DateTime.TryParse(written.Value,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out var d2))
            return DateOnly.FromDateTime(d2);
        return null;
    }
    private static decimal? ExtractAmount(string text)
    {
        // Takes the first dollar amount in the text; accepts thousands
        // separators such as $1,234.56 as well as plain $1234.56
        var match = Regex.Match(text, @"\$\s*(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)");
        return match.Success && decimal.TryParse(match.Groups[1].Value,
                NumberStyles.Number, CultureInfo.InvariantCulture, out var amt)
            ? amt
            : null;
    }
}
Imports IronOcr
Imports System.Globalization
Imports System.Text.RegularExpressions
Public Class Invoice
    Public Property InvoiceNumber As String
    ' Date is a reserved word in VB, so the property name is escaped
    Public Property [Date] As DateOnly?
    Public Property TotalAmount As Decimal?
    Public Property RawText As String
    Public Sub New(invoiceNumber As String, [date] As DateOnly?, totalAmount As Decimal?, rawText As String)
        Me.InvoiceNumber = invoiceNumber
        Me.Date = [date]
        Me.TotalAmount = totalAmount
        Me.RawText = rawText
    End Sub
End Class
Public Class InvoiceOcrService
    Private ReadOnly _ocr As New IronTesseract()
    Public Function ExtractInvoiceData(invoicePath As String) As Invoice
        Using input As New OcrInput()
            If invoicePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) Then
                input.LoadPdf(invoicePath)
            Else
                input.LoadImage(invoicePath)
            End If
            input.DeNoise()
            input.Deskew()
            Dim result = _ocr.Read(input)
            Dim text As String = result.Text
            Return New Invoice(
                InvoiceNumber:=ExtractInvoiceNumber(text),
                [Date]:=ExtractDate(text),
                TotalAmount:=ExtractAmount(text),
                RawText:=text
            )
        End Using
    End Function
    Private Shared Function ExtractInvoiceNumber(text As String) As String
        Dim match = Regex.Match(text, "Invoice\s*#?:?\s*(\S+)", RegexOptions.IgnoreCase)
        Return If(match.Success, match.Groups(1).Value, Nothing)
    End Function
    Private Shared Function ExtractDate(text As String) As DateOnly?
        Dim parsed As DateTime
        ' Numeric format: MM/DD/YYYY, parsed with the invariant culture
        Dim numeric = Regex.Match(text, "\b(\d{1,2}/\d{1,2}/\d{2,4})\b")
        If numeric.Success AndAlso DateTime.TryParse(numeric.Groups(1).Value, CultureInfo.InvariantCulture, DateTimeStyles.None, parsed) Then
            Return DateOnly.FromDateTime(parsed)
        End If
        ' Written format: January 15, 2025
        Dim written = Regex.Match(text,
            "\b(January|February|March|April|May|June|July|August|September|October|November|December)\s+(\d{1,2}),?\s+(\d{4})\b",
            RegexOptions.IgnoreCase)
        If written.Success AndAlso DateTime.TryParse(written.Value, CultureInfo.InvariantCulture, DateTimeStyles.None, parsed) Then
            Return DateOnly.FromDateTime(parsed)
        End If
        Return Nothing
    End Function
    Private Shared Function ExtractAmount(text As String) As Decimal?
        ' Takes the first dollar amount in the text; accepts thousands separators
        Dim match = Regex.Match(text, "\$\s*(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)")
        Dim amt As Decimal
        ' An explicit If avoids If(cond, amt, Nothing), which would yield 0 rather than Nothing
        If match.Success AndAlso Decimal.TryParse(match.Groups(1).Value, NumberStyles.Number, CultureInfo.InvariantCulture, amt) Then
            Return amt
        End If
        Return Nothing
    End Function
End Class
This approach pairs well with zonal OCR when you know exactly where each field appears on a form. By supplying a bounding rectangle, you skip full-page recognition and target only the region containing the invoice number or total -- dramatically reducing processing time for fixed-layout documents.
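A practical detail with fixed-layout zones is that the bounding rectangle is usually measured once on a reference scan, so it has to be rescaled when a document arrives at a different resolution. The sketch below covers only that arithmetic. Region and ScaleTo are hypothetical names, not IronOCR types; the resulting rectangle would be passed to whichever region-loading call your IronOCR version provides.

```csharp
using System;

// Hypothetical pixel rectangle; not an IronOCR type.
public readonly record struct Region(int X, int Y, int Width, int Height)
{
    // Rescales a region measured at templateDpi to a scan captured at scanDpi.
    public Region ScaleTo(int templateDpi, int scanDpi)
    {
        if (templateDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(templateDpi), "DPI must be positive.");
        if (scanDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(scanDpi), "DPI must be positive.");

        double factor = (double)scanDpi / templateDpi;
        return new Region(
            (int)Math.Round(X * factor),
            (int)Math.Round(Y * factor),
            (int)Math.Round(Width * factor),
            (int)Math.Round(Height * factor));
    }
}
```

For example, an invoice-number zone measured as (300, 150, 600, 90) on a 300 DPI template becomes (200, 100, 400, 60) on a 200 DPI scan of the same form.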
For more advanced extraction scenarios including tables and structured forms, review the IronOCR data extraction examples on the product site.
How Do You Handle Multi-Language OCR in .NET?
Many organizations process documents in more than one language -- import/export forms, international contracts, or multilingual customer submissions. IronOCR handles this by allowing you to configure the language pack before the read call.
using IronOcr;
// Configure multi-language recognition
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.EnglishBest; // Swap for any of 125+ supported languages
// For mixed-language documents, combine language packs
ocr.AddSecondaryLanguage(OcrLanguage.German);
using var input = new OcrInput();
input.LoadPdf("multilingual-contract.pdf");
var result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr
Module Program
    Sub Main()
        ' Configure multi-language recognition
        Dim ocr As New IronTesseract()
        ocr.Language = OcrLanguage.EnglishBest ' Swap for any of 125+ supported languages
        ' For mixed-language documents, combine language packs
        ocr.AddSecondaryLanguage(OcrLanguage.German)
        Using input As New OcrInput()
            input.LoadPdf("multilingual-contract.pdf")
            Dim result = ocr.Read(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
The IronOCR language support page lists all 125+ available language packs with download instructions. Language packs ship as NuGet packages (for example, IronOcr.Languages.German) so they integrate with the same package management workflow you already use.
For character sets outside the Latin alphabet -- Arabic, Chinese, Japanese, Korean -- IronOCR provides optimized models that handle right-to-left text direction and ideographic scripts. See the CJK OCR guide for configuration specifics.
What Are Your Next Steps?
You now have the patterns needed to add production-grade OCR to any .NET 10 application: basic text extraction, preprocessing for difficult scans, async batch processing, structured data parsing, and multi-language support.
From here, explore these areas based on your project needs:
- Barcode and QR code reading -- extract machine-readable codes from the same image pass
- HOCR output format -- get word-level bounding boxes for layout-aware downstream processing
- IronOCR licensing options -- royalty-free distribution model with SaaS, OEM, and enterprise tiers
- IronOCR code examples library -- over 30 working examples covering common scenarios
- Azure Functions deployment guide -- serverless OCR on Microsoft cloud infrastructure
Start with the free trial license to evaluate the full feature set on your own documents before committing to a tier.
Frequently Asked Questions
What is the .NET OCR SDK?
The .NET OCR SDK by IronOCR is a library designed to integrate optical character recognition capabilities into C# applications, allowing developers to extract text from images, PDFs, and scanned documents.
What are the key features of IronOCR's .NET SDK?
IronOCR's .NET SDK offers a simple API, support for multiple languages, cross-platform compatibility, and advanced features for handling various file formats and low-quality scans.
How does IronOCR handle different languages?
IronOCR's .NET SDK supports 125+ languages. You set a primary language on the IronTesseract instance, optionally add secondary languages for mixed-language documents, and install the matching language packs through NuGet.
Can IronOCR process low-quality scans?
Yes, IronOCR is designed to effectively handle low-quality scans, employing advanced algorithms to enhance text recognition accuracy even in challenging scenarios.
Is IronOCR's .NET SDK cross-platform?
IronOCR's .NET SDK is cross-platform, meaning it can be used on different operating systems, making it versatile for various development environments.
What file formats does IronOCR support?
IronOCR supports a wide range of file formats including images, PDFs, and scanned documents, providing flexibility for text recognition tasks across different media.
How can developers integrate IronOCR into their projects?
Developers can integrate IronOCR into their C# projects using its typed API, which simplifies the process of adding OCR functionality to applications.
What are some use cases for IronOCR?
IronOCR can be used in document management systems, automated data entry, content digitization, and any application requiring text extraction from images or PDFs.




