Leveraging libraries such as IronOCR and Tesseract grants developers access to advanced algorithms and machine learning techniques for extracting textual information from images and scanned documents. This tutorial will show readers how to use the Tesseract library to perform text extraction from images, and will then conclude by introducing IronOCR's unique approach.
To install Tesseract, enter the following command in the NuGet Package Manager Console:
Install-Package Tesseract
Or download the package via the NuGet Package Manager.
Install the Tesseract package in the NuGet Package Manager
After installing the NuGet package, you must manually download the language files and save them in the project folder. This extra step can be considered a shortcoming of this specific library.
Visit the following website to download the language files. Once downloaded, unzip the files, and add the "tessdata" folder to your project's debug folder.
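Because the language files are copied by hand, it can help to confirm they are in place before creating the engine. The following is a minimal sketch, not part of the Tesseract library: the TessdataCheck class and its HasLanguageData method are hypothetical helpers, and the sketch assumes the data sits in a "tessdata" folder relative to the working directory and follows Tesseract's usual "<language>.traineddata" file naming.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Tesseract package).
static class TessdataCheck
{
    // Returns true when <dataDir>/<lang>.traineddata exists on disk.
    public static bool HasLanguageData(string dataDir, string lang)
    {
        string path = Path.Combine(dataDir, lang + ".traineddata");
        return File.Exists(path);
    }
}

class TessdataCheckDemo
{
    static void Main()
    {
        // Assumption: the same relative folder and language code
        // that are later passed to the OCR engine.
        const string dataDir = "tessdata";
        const string lang = "eng";

        if (TessdataCheck.HasLanguageData(dataDir, lang))
        {
            Console.WriteLine("Language data found.");
        }
        else
        {
            Console.WriteLine("Missing " + Path.Combine(dataDir, lang + ".traineddata"));
        }
    }
}
```

Running this check first turns a missing or misplaced folder into a clear message instead of an engine initialization error.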
OCR on a given image can be performed using the source code below:
using System;
using Tesseract;

class Program
{
    static void Main()
    {
        // Initialize Tesseract engine with English language data
        using var ocrEngine = new TesseractEngine(@"tessdata", "eng", EngineMode.Default);
        // Load the image to be processed
        using var img = Pix.LoadFromFile("Demo.png");
        // Process the image to extract text
        using var res = ocrEngine.Process(img);
        // Output the recognized text
        Console.WriteLine(res.GetText());
        Console.ReadKey();
    }
}
' VB.NET equivalent of the C# example above
Imports System
Imports Tesseract

Friend Class Program
    Shared Sub Main()
        ' Initialize Tesseract engine with English language data
        Using ocrEngine As New TesseractEngine("tessdata", "eng", EngineMode.Default)
            ' Load the image to be processed
            Using img = Pix.LoadFromFile("Demo.png")
                ' Process the image to extract text
                Using res = ocrEngine.Process(img)
                    ' Output the recognized text
                    Console.WriteLine(res.GetText())
                End Using
            End Using
        End Using
        Console.ReadKey()
    End Sub
End Class
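First, a TesseractEngine object must be created, loading the language data into the engine.
The image file is then loaded using Pix.LoadFromFile.
The image is passed to the TesseractEngine to extract text using the Process method.
The recognized text is obtained with the GetText method and printed to the console.
Extracted text from the image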
To learn more about Tesseract in C#, please visit the Tesseract tutorial.
To install IronOCR, enter the following command in the NuGet Package Manager Console:
Install-Package IronOcr
Or install the IronOCR library via the NuGet Package Manager, along with additional packages for other languages, which are simple and convenient to use.
Install IronOcr and language packages via the NuGet Package Manager
The sample code below recognizes the text in a given image:
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Create an IronTesseract instance with predefined settings
        var ocr = new IronTesseract()
        {
            Language = OcrLanguage.EnglishBest,
            Configuration = { TesseractVersion = TesseractVersion.Tesseract5 }
        };
        // Create an OcrInput instance for image processing
        using var input = new OcrInput();
        // Load the image to be processed
        input.AddImage("Demo.png");
        // Process the image and extract text
        var result = ocr.Read(input);
        // Output the recognized text
        Console.WriteLine(result.Text);
        Console.ReadKey();
    }
}
' VB.NET equivalent of the C# example above
Imports System
Imports IronOcr

Friend Class Program
    Shared Sub Main()
        ' Create an IronTesseract instance with predefined settings
        Dim ocr = New IronTesseract() With {
            .Language = OcrLanguage.EnglishBest
        }
        ' Select the Tesseract 5 engine on the existing Configuration object
        ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
        ' Create an OcrInput instance for image processing
        Using input As New OcrInput()
            ' Load the image to be processed
            input.AddImage("Demo.png")
            ' Process the image and extract text
            Dim result = ocr.Read(input)
            ' Output the recognized text
            Console.WriteLine(result.Text)
        End Using
        Console.ReadKey()
    End Sub
End Class
The code first creates an IronTesseract object, setting up the language and Tesseract version.
An OcrInput object is then created to load image files using the AddImage method.
The Read method of IronTesseract processes the image and extracts the text, which is then printed to the console.
Extracted text output using the IronOCR library
For a detailed IronOCR tutorial, refer to this article to read text from an image in C#.