Optical Character Recognition (OCR) converts printed or handwritten text into machine-readable digital text. When a document such as an invoice or receipt is scanned, your computer saves it as an image file, and the text inside that image cannot be edited, searched, or counted with a regular text editor.
OCR processes the image, extracts the text, and outputs it in a format that software can read. This works for a range of sources, including PDF files and scanned images in the major image formats, and the results can be stored as searchable data.
In C#, developers can use OCR through various libraries, one of which is IronOCR from Iron Software. In this tutorial, we'll explore the basics of OCR and demonstrate how to use IronOCR to perform character recognition efficiently in C#.
IronOCR is a C# library developed by Iron Software that provides advanced OCR capabilities. It offers accurate text extraction from images, PDFs, and scanned documents. Before we dive into the code, make sure you have IronOCR installed in your project.
IronOCR elevates the capabilities of the widely used Tesseract OCR engine by enhancing both accuracy and speed. It serves as a robust solution for extracting text from various sources, including images, PDFs, and diverse document formats.
With support for over 127 languages, IronOCR is adept at handling multilingual requirements, making it an ideal choice for applications demanding linguistic versatility.
Extracted text can be conveniently outputted as plain text or structured data for seamless integration into further processing pipelines. Additionally, IronOCR facilitates the creation of searchable PDFs directly from image inputs.
Engineered for compatibility with C#, F#, and VB.NET, IronOCR seamlessly operates across various .NET environments including versions 8, 7, 6, Core, Standard, and Framework.
IronOCR harnesses the power of Tesseract 5, finely tailored for optimal performance within the .NET ecosystem.
With IronOCR, users can precisely define specific zones within documents, enabling targeted OCR processing. This feature enhances accuracy and efficiency by focusing processing power where it's needed most.
The library offers a suite of image preprocessing functionalities such as de-skewing and noise reduction. These tools ensure superior results even when dealing with imperfect source images, ultimately enhancing the overall OCR experience.
Now, we will develop a demo application that uses IronOCR to read text from images.
To start with, let us create a new console application using Visual Studio as shown below.
Provide a project name and location below.
Select the required .NET Version for the project.
Click the Create button to create the new project.
IronOCR can be installed from the NuGet Package Manager Console with the command Install-Package IronOcr.
Alternatively, in the Visual Studio NuGet Package Manager, search for IronOCR and install it into your project.
Once installed, the application is ready to make use of IronOCR to read text from images.
IronOCR stands out as the exclusive .NET library offering Tesseract 5 OCR capabilities. At present, it holds the distinction of being the most sophisticated Tesseract 5 library across all programming languages. IronOCR seamlessly integrates Tesseract 5 into various .NET environments, including Framework, Standard, Core, Xamarin, and Mono, ensuring comprehensive support across the ecosystem.
Consider the image file below as input. Now, let's see how to read the text in this image file:
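using System;
using IronOcr;

public class Program
{
    public static void Main(string[] args)
    {
        // IronTesseract is the main OCR engine class
        var ocrTesseract = new IronTesseract();
        // OcrInput holds the images to read; dispose it when done
        using var ocrInput = new OcrInput();
        ocrInput.LoadImage(@"sample1.png");
        // Run OCR and print the recognized text
        var ocrResult = ocrTesseract.Read(ocrInput);
        Console.WriteLine(ocrResult.Text);
    }
}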
Imports IronOcr
Public Class Program
Public Shared Sub Main(ByVal args() As String)
Dim ocrTesseract = New IronTesseract()
Dim ocrInput As New OcrInput()
ocrInput.LoadImage("sample1.png")
Dim ocrResult = ocrTesseract.Read(ocrInput)
Console.WriteLine(ocrResult.Text)
End Sub
End Class
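As a hedged sketch of the output options mentioned earlier, the result of a read can also be saved as a searchable PDF or inspected as structured data. This assumes the installed IronOCR version exposes SaveAsSearchablePdf and a Paragraphs collection on the result object; the output file name is a placeholder.

```csharp
using System;
using IronOcr;

public class Program
{
    public static void Main()
    {
        var ocr = new IronTesseract();
        using var input = new OcrInput();
        input.LoadImage(@"sample1.png");
        var result = ocr.Read(input);

        // Assumption: SaveAsSearchablePdf is available on the result object
        result.SaveAsSearchablePdf("sample1-searchable.pdf"); // placeholder output path

        // Assumption: the result exposes paragraphs, each with its own text
        foreach (var paragraph in result.Paragraphs)
        {
            Console.WriteLine(paragraph.Text);
        }
    }
}
```

Working with paragraphs rather than the flat Text property can make it easier to feed the output into later processing steps.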
The IronTesseract.Configuration object grants advanced users access to the underlying Tesseract API within C#/.NET, enabling detailed setup configuration for fine-tuning and optimization. Below are some of the advanced configurations possible:
You can specify the language for OCR using the Language property. For instance, to set the language to English, use:
IronTesseract ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;
Dim ocr As New IronTesseract()
ocr.Language = OcrLanguage.English
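For documents that mix languages, a minimal sketch along these lines may help. It assumes the AddSecondaryLanguage method is available in your IronOCR version and that the language packs for the chosen languages are installed; the file name and the choice of Arabic are illustrative only.

```csharp
using System;
using IronOcr;

public class Program
{
    public static void Main()
    {
        var ocr = new IronTesseract();
        // Primary language, set the same way as in the text above
        ocr.Language = OcrLanguage.English;
        // Assumption: this method exists and the extra language pack is installed
        ocr.AddSecondaryLanguage(OcrLanguage.Arabic);

        using var input = new OcrInput();
        input.LoadImage(@"multilingual.png"); // hypothetical file name
        Console.WriteLine(ocr.Read(input).Text);
    }
}
```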
The PageSegmentationMode determines how Tesseract segments the input image. Options include AutoOsd, SingleBlock, SingleLine, and more. For example:
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
You can fine-tune Tesseract by setting specific variables. For instance, to disable parallelization:
ocr.Configuration.TesseractVariables["tessedit_parallelize"] = false;
ocr.Configuration.TesseractVariables("tessedit_parallelize") = False
Use WhiteListCharacters and BlackListCharacters to control which characters Tesseract recognizes. For example:
ocr.Configuration.WhiteListCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
ocr.Configuration.BlackListCharacters = "`ë|^";
ocr.Configuration.WhiteListCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ocr.Configuration.BlackListCharacters = "`ë|^"
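To illustrate how a whitelist can be combined with a page segmentation mode, the sketch below restricts recognition to digits and hyphens on an image assumed to contain a single line of text, such as a serial number. The file name is hypothetical, and whether a whitelist improves results depends on the image.

```csharp
using System;
using IronOcr;

public class Program
{
    public static void Main()
    {
        var ocr = new IronTesseract();
        // Treat the image as one line of text
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleLine;
        // Only allow digits and hyphens in the output
        ocr.Configuration.WhiteListCharacters = "0123456789-";

        using var input = new OcrInput();
        input.LoadImage(@"serial-number.png"); // hypothetical file name
        var result = ocr.Read(input);
        Console.WriteLine(result.Text);
    }
}
```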
Explore other Tesseract configuration variables to customize behavior according to your needs. For instance:
ocr.Configuration.TesseractVariables["classify_num_cp_levels"] = 3;
ocr.Configuration.TesseractVariables["textord_debug_tabfind"] = 0;
// ... (more variables)
ocr.Configuration.TesseractVariables("classify_num_cp_levels") = 3
ocr.Configuration.TesseractVariables("textord_debug_tabfind") = 0
' ... (more variables)
Now let us try to decode the same image using advanced settings:
using System;
using IronOcr;

public class Program
{
    public static void Main()
    {
        Console.WriteLine("Decoding using advanced features");
        var ocrTesseract = new IronTesseract() // create the OCR engine instance
        {
            Language = OcrLanguage.EnglishBest, // use the EnglishBest language option
            Configuration = new TesseractConfiguration()
            {
                ReadBarCodes = false, // do not look for barcodes
                BlackListCharacters = "`ë|^", // characters to exclude from results
                WhiteListCharacters = null, // no whitelist, allow all characters
                PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd,
                TesseractVariables = null, // no custom Tesseract variables
            },
            MultiThreaded = false,
        };
        using var ocrInput = new OcrInput(); // create a disposable OCR input object
        ocrInput.LoadImage(@"sample1.png"); // load the sample image
        var ocrResult = ocrTesseract.Read(ocrInput); // read the text from the image
        Console.WriteLine(ocrResult.Text); // output the recognized text
    }
}
Imports IronOcr
Public Class Program
Public Shared Sub Main()
Console.WriteLine("Decoding using advanced features")
Dim ocrTesseract = New IronTesseract() With {
.Language = OcrLanguage.EnglishBest,
.Configuration = New TesseractConfiguration() With {
.ReadBarCodes = False,
.BlackListCharacters = "`ë|^",
.WhiteListCharacters = Nothing,
.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd,
.TesseractVariables = Nothing
},
.MultiThreaded = False
}
Dim ocrInput As New OcrInput() ' create an OCR input object
ocrInput.LoadImage("sample1.png") ' load the sample image
Dim ocrResult = ocrTesseract.Read(ocrInput) ' read the text from the image
Console.WriteLine(ocrResult.Text) ' output the recognized text
End Sub
End Class
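IronOCR Configuration: An instance of IronTesseract (the main IronOCR class) is created and assigned to the variable ocrTesseract.
Configuration settings are applied to ocrTesseract:
Language: set to OcrLanguage.EnglishBest.
ReadBarCodes: set to false, so barcodes are not read.
BlackListCharacters: the characters `ë|^ are excluded from the output.
WhiteListCharacters: set to null, so all other characters are allowed.
PageSegmentationMode: set to AutoOsd.
TesseractVariables: set to null, so no custom Tesseract variables are used.
MultiThreaded: set to false.
An OcrInput object is then created, sample1.png is loaded into it, and the recognized text is printed to the console.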
When working with IronOCR, you have access to various image filters that can help preprocess images before performing OCR. These filters optimize the image quality, enhance visibility, and reduce noise or artifacts. They help to improve the performance of the OCR operation.
Rotate:
The Rotate filter allows you to rotate images by a specified number of degrees clockwise. For anti-clockwise rotation, use negative numbers.
Deskew:
The Deskew filter corrects image skew, ensuring that the text is upright and orthogonal. This is particularly useful for OCR because Tesseract performs best with properly oriented scans.
Scale:
The Scale filter proportionally scales OCR input pages.
Binarize:
The Binarize filter converts every pixel to either black or white, with no middle ground. It can improve OCR performance in cases of very low contrast between text and background.
ToGrayScale:
The ToGrayScale filter converts every pixel to a shade of grayscale. While unlikely to significantly improve OCR accuracy, it may enhance speed.
Invert:
The Invert filter reverses colors—white becomes black, and black becomes white.
ReplaceColor:
The ReplaceColor filter replaces a specific color within an image with another color, considering a certain threshold.
Contrast:
The Contrast filter automatically increases contrast. It often improves OCR speed and accuracy in low-contrast scans.
Dilate and Erode:
These advanced morphology filters manipulate object boundaries in an image: Dilate adds pixels to the boundaries of objects, while Erode removes them.
Sharpen:
The Sharpen filter sharpens blurred OCR documents and flattens alpha channels to white.
DeNoise:
The DeNoise filter removes digital noise. Use it where noise is expected.
DeepCleanBackgroundNoise:
The DeepCleanBackgroundNoise filter performs heavy background noise removal and should be used only when extreme document background noise is known. It may reduce OCR accuracy for clean documents and is CPU-intensive.
EnhanceResolution:
The EnhanceResolution filter enhances the resolution of low-quality images. It’s not often needed due to automatic resolution handling.
Here’s an example of how to apply filters using IronOCR in C#:
var ocr = new IronTesseract();
// Dispose the input when finished with it
using var input = new OcrInput();
input.LoadImage("sample.png");
// Straighten the image before reading
input.Deskew();
var result = ocr.Read(input);
Console.WriteLine(result.Text);
Dim ocr = New IronTesseract()
Dim input = New OcrInput()
input.LoadImage("sample.png")
input.Deskew()
Dim result = ocr.Read(input)
Console.WriteLine(result.Text)
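Several filters can be applied to the same input before reading. The following sketch assumes a noisy, slightly rotated scan and that the parameterless forms of these filter methods are available in your IronOCR version; the file name is a placeholder.

```csharp
using System;
using IronOcr;

public class Program
{
    public static void Main()
    {
        var ocr = new IronTesseract();
        using var input = new OcrInput();
        input.LoadImage(@"noisy-scan.png"); // placeholder file name

        input.DeNoise();   // remove digital noise
        input.Deskew();    // straighten the text
        input.Binarize();  // convert every pixel to black or white

        var result = ocr.Read(input);
        Console.WriteLine(result.Text);
    }
}
```

Filters add processing time, so it is usually worth applying only those that the source image actually needs.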
IronOCR requires a license key. Provide the requested details to have the key delivered to your email address.
Once the key is obtained, either by purchase or through a free trial, follow the steps below to use it.
Setting Your License Key: Set your IronOCR license key in code by adding the following line to your application startup (before using IronOCR):
IronOcr.License.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01";
IronOcr.License.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01"
Global Application Key (Web.Config or App.Config): To apply a key globally across your application, use the configuration file (Web.Config or App.Config). Add the following key to your appSettings:
<configuration>
<!-- Other settings -->
<appSettings>
<add key="IronOcr.LicenseKey" value="IRONOCR-MYLICENSE-KEY-1EF01"/>
</appSettings>
</configuration>
Using .NET Core appsettings.json: For .NET Core applications, create an appsettings.json file in your project’s root directory. Replace the value of the "IronOcr.LicenseKey" key with your license key:
{
"IronOcr.LicenseKey": "IRONOCR-MYLICENSE-KEY-1EF01"
}
Testing Your License Key: Verify that your key has been installed correctly by testing it:
bool result = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01");
Dim result As Boolean = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01")
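The validity check can be combined with the key assignment at startup. The sketch below reads the key from an environment variable, whose name IRONOCR_LICENSE_KEY is purely hypothetical, and only assigns the key if IsValidLicense accepts it.

```csharp
using System;

public class Program
{
    public static void Main()
    {
        // Hypothetical environment variable name; use whatever your deployment provides
        string key = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY") ?? "";

        // Skip the library call entirely when no key was supplied
        if (string.IsNullOrWhiteSpace(key) || !IronOcr.License.IsValidLicense(key))
        {
            Console.Error.WriteLine("The IronOCR license key is missing or invalid.");
            return;
        }

        // Apply the key before any other IronOCR call
        IronOcr.License.LicenseKey = key;
        Console.WriteLine("IronOCR license key applied.");
    }
}
```

Keeping the key outside the source code in this way avoids committing it to version control.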
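In conclusion, IronOCR brings Tesseract 5 OCR to C# with multilingual support, image filters, and fine-grained configuration, with licenses starting at $749. Embrace the power of OCR with IronOCR and unlock a world of possibilities in your C# projects.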