Libraries such as IronOCR and Tesseract give developers access to advanced algorithms and machine learning techniques for extracting text from images and scanned documents. This tutorial shows how to use the Tesseract library to extract text from images, then concludes by introducing IronOCR's approach.
To install Tesseract, enter the following command in the NuGet Package Manager Console:
Install-Package Tesseract
Or download the package via the NuGet Package Manager.
Install the Tesseract package in the NuGet Package Manager
You must manually install and save the language files in the project folder after installing the NuGet Package. This can be considered a shortcoming of this specific library.
Visit the following website to download the language files. Once downloaded, unzip the files, and add the "tessdata" folder to your project's debug folder.
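Because the language files are copied by hand, it can help to confirm they are in place before creating the engine. The following is a minimal sketch, not part of the Tesseract package: `FindTrainedData` is a hypothetical helper, and the code assumes the "tessdata" folder sits next to the executable and contains the English file `eng.traineddata`.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Tesseract package): returns the path of
// the trained-data file for a language, or null when the file is missing.
static string? FindTrainedData(string tessdataDir, string language)
{
    var path = Path.Combine(tessdataDir, language + ".traineddata");
    return File.Exists(path) ? path : null;
}

// Assumption: "tessdata" was copied into the build output (debug) folder.
var tessdataDir = Path.Combine(AppContext.BaseDirectory, "tessdata");
var found = FindTrainedData(tessdataDir, "eng");

Console.WriteLine(found is null
    ? $"Missing eng.traineddata in {tessdataDir}"
    : $"Found language data: {found}");
```

Running this check first gives a clear message about a missing folder instead of an engine initialization failure later on.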
OCR on a given image can be performed using the source code below:
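using System;
using Tesseract;

// Load the English language data from the "tessdata" folder.
using var ocrEngine = new TesseractEngine(@"tessdata", "eng", EngineMode.Default);
// Load the image to recognize.
using var img = Pix.LoadFromFile("Demo.png");
// Run OCR; the resulting page holds the recognized text.
using var res = ocrEngine.Process(img);
Console.WriteLine(res.GetText());
Console.ReadKey();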
Imports Tesseract
Dim ocrEngine = New TesseractEngine("tessdata", "eng", EngineMode.Default)
Dim img = Pix.LoadFromFile("Demo.png")
Dim res = ocrEngine.Process(img)
Console.WriteLine(res.GetText())
Console.ReadKey()
First, a TesseractEngine object is created, which loads the language data into the engine. The image file is then loaded with Pix.LoadFromFile and passed to the engine's Process method, which returns a Page. Calling the GetText method on that page returns the recognized text. This is the output from the code.
Extracted text from the image
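Alongside the text, the page returned by Process can report how confident the engine is in its result. The sketch below assumes the same "tessdata" folder and "Demo.png" file as the example above, and uses the page's GetMeanConfidence method, which returns a value between 0 and 1.

```csharp
using System;
using Tesseract;

// The engine, image, and page all wrap native resources, so each is disposed
// automatically at the end of the program through "using var".
using var engine = new TesseractEngine(@"tessdata", "eng", EngineMode.Default);
using var image = Pix.LoadFromFile("Demo.png");
using var page = engine.Process(image);

// Mean confidence is a fraction between 0 and 1; format it as a percentage.
Console.WriteLine($"Mean confidence: {page.GetMeanConfidence():P0}");
Console.WriteLine(page.GetText());
```

A low confidence value is a hint that the image may need preprocessing, such as higher resolution or better contrast, before the text can be trusted.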
To learn more about Tesseract in C#, please visit the Tesseract tutorial.
To install IronOCR, enter the following command in the NuGet Package Manager Console:
Install-Package IronOcr
Or install the IronOCR library via the NuGet Package Manager, along with any additional language packages you need.
Install IronOcr and language packages via NuGet Package Manager
Below is the sample code to recognize the text from the given image.
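using System;
using IronOcr;

var ocr = new IronTesseract();
// Select the EnglishBest language pack and the Tesseract 5 engine.
ocr.Language = OcrLanguage.EnglishBest;
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
using (var input = new OcrInput())
{
    // Add the image to the OCR input, then read it.
    input.LoadImage(@"Demo.png");
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
    Console.ReadKey();
}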
Imports IronOcr
Dim ocr = New IronTesseract()
ocr.Language = OcrLanguage.EnglishBest
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
Using input = New OcrInput()
    input.LoadImage("Demo.png")
    Dim result = ocr.Read(input)
    Console.WriteLine(result.Text)
    Console.ReadKey()
End Using
The code above instantiates an IronTesseract object. An OcrInput object is then created to hold one or more image files, each added by passing its local file path to the LoadImage method. You can add as many images as you want. The Read method of the IronTesseract object parses the input and returns the extracted text in the OCR result.
Extracted text output using IronOCR library
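Since an OcrInput can hold more than one image, several files can be read in a single pass. The following is a rough sketch under two assumptions: that repeated LoadImage calls add images to the same input, as the description above suggests, and that a second file named "Demo2.png" exists (a hypothetical name used only for illustration).

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.EnglishBest;

using (var input = new OcrInput())
{
    // "Demo.png" is the image from the example above;
    // "Demo2.png" is a hypothetical second image.
    foreach (var path in new[] { "Demo.png", "Demo2.png" })
    {
        // Assumption: each call adds another image to the same input.
        input.LoadImage(path);
    }

    // A single Read call processes everything that was added to the input.
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```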
For a detailed IronOCR tutorial, refer to this article to read text from an image in C#.