In this tutorial, we explore how to extract text from images using IronOCR, an OCR library for C#. The session begins by setting up a C# console application in Visual Studio and installing the IronOcr package via the NuGet Package Manager.
Once the library is imported, an IronTesseract object is initialized, and its configuration options are set to enable barcode reading and to use English as the recognition language. The setup also selects a combined Tesseract and LSTM engine mode and enables parallel processing for better performance. Additional options include rendering searchable PDFs and setting the page segmentation mode to AutoOsd, which performs automatic page segmentation with orientation and script detection (OSD).
The tutorial further explains how to fine-tune behavior through Tesseract configuration variables: tessedit_parallelize enables parallel processing, textord_tabfind_find_tables turns on table layout detection, user_defined_dpi tells the engine to treat the input as 300 DPI, and text inversion is disabled. The tutorial provides a link to more configuration options.
Next, an image file is loaded into an OcrInput object and passed to the IronTesseract instance's Read method, which extracts the text from the image. The recognized text is then written to the console.
The tutorial concludes by highlighting IronOCR as a tool for extracting text from images and PDFs, and encourages viewers to try it using the provided trial link.
using IronOcr;
using System;

namespace IronOCRExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a new instance of the IronTesseract class
            var Ocr = new IronTesseract();

            // Recognition language is a property of IronTesseract itself
            Ocr.Language = OcrLanguage.English; // Set language to English

            // Configuration settings for better accuracy
            Ocr.Configuration.ReadBarCodes = true; // Enable barcode reading
            Ocr.Configuration.RenderSearchablePdfsAndHocr = true; // Enable rendering to searchable PDFs and hOCR
            Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd; // Automatic page segmentation with orientation and script detection
            Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm; // Use a combination of Tesseract and LSTM engines

            // Further fine-tuning through Tesseract variables
            Ocr.Configuration.TesseractVariables["user_defined_dpi"] = "300"; // Treat the input as 300 DPI
            Ocr.Configuration.TesseractVariables["tessedit_parallelize"] = "true"; // Enable parallel processing
            Ocr.Configuration.TesseractVariables["textord_tabfind_find_tables"] = "true"; // Table recognition
            Ocr.Configuration.TesseractVariables["invert"] = "false"; // Disable inversion

            // Load the image into the OCR input
            using (var Input = new OcrInput(@"input-image.png"))
            {
                // Perform OCR
                var Result = Ocr.Read(Input);

                // Output the recognized text to the console
                Console.WriteLine(Result.Text);
            }
        }
    }
}
' VB.NET version of the C# example
Imports IronOcr
Imports System

Namespace IronOCRExample
    Friend Class Program
        Shared Sub Main(args As String())
            ' Create a new instance of the IronTesseract class
            Dim Ocr = New IronTesseract()

            ' Recognition language is a property of IronTesseract itself
            Ocr.Language = OcrLanguage.English ' Set language to English

            ' Configuration settings for better accuracy
            Ocr.Configuration.ReadBarCodes = True ' Enable barcode reading
            Ocr.Configuration.RenderSearchablePdfsAndHocr = True ' Enable rendering to searchable PDFs and hOCR
            Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd ' Automatic page segmentation with orientation and script detection
            Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm ' Use a combination of Tesseract and LSTM engines

            ' Further fine-tuning through Tesseract variables
            Ocr.Configuration.TesseractVariables("user_defined_dpi") = "300" ' Treat the input as 300 DPI
            Ocr.Configuration.TesseractVariables("tessedit_parallelize") = "true" ' Enable parallel processing
            Ocr.Configuration.TesseractVariables("textord_tabfind_find_tables") = "true" ' Table recognition
            Ocr.Configuration.TesseractVariables("invert") = "false" ' Disable inversion

            ' Load the image into the OCR input
            Using Input = New OcrInput("input-image.png")
                ' Perform OCR
                Dim Result = Ocr.Read(Input)

                ' Output the recognized text to the console
                Console.WriteLine(Result.Text)
            End Using
        End Sub
    End Class
End Namespace
The code above demonstrates how to set up and use IronOCR in a console application to read text from an image file. The configuration enables barcode reading, sets the recognition language, and turns on parallel processing for faster performance.
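The example enables barcode reading and searchable PDF rendering but only prints the recognized text. The following is a minimal sketch of how those extra outputs might be consumed. It assumes the result object exposes Confidence, Barcodes (each with a Value), and SaveAsSearchablePdf, as in the IronOCR versions this tutorial appears to target; member names can differ between releases, so check them against the installed package. The output file name searchable-output.pdf is a placeholder chosen for illustration.

```csharp
using System;
using IronOcr;

class OcrOutputsSketch
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Both options must be enabled before Read for the outputs below to be populated
        ocr.Configuration.ReadBarCodes = true;
        ocr.Configuration.RenderSearchablePdfsAndHocr = true;

        // Same placeholder input file as in the example above
        using (var input = new OcrInput(@"input-image.png"))
        {
            var result = ocr.Read(input);

            // Recognized text, as in the original example
            Console.WriteLine(result.Text);

            // Assumption: Confidence reports the engine's overall confidence for the read
            Console.WriteLine($"Confidence: {result.Confidence}");

            // Assumption: Barcodes holds any barcodes found while ReadBarCodes is enabled
            foreach (var barcode in result.Barcodes)
            {
                Console.WriteLine($"Barcode: {barcode.Value}");
            }

            // Assumption: writes a PDF with a selectable text layer over the scanned image
            result.SaveAsSearchablePdf("searchable-output.pdf");
        }
    }
}
```

If the image contains no barcodes, the loop simply prints nothing; the text and PDF outputs do not depend on it.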