Tamil OCR in C# and .NET

IronOCR is a C# software component allowing .NET coders to read text from images and PDF documents in 126 languages, including Tamil.

It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.

Contents of IronOcr.Languages.Tamil

This package contains six OCR language variants for .NET:

  • Tamil
  • TamilBest
  • TamilFast
  • TamilAlphabet
  • TamilAlphabetBest
  • TamilAlphabetFast
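
As a rough illustration of how one of these variants might be selected, the sketch below switches between TamilBest and TamilFast. It assumes that each name in the list above is exposed as an OcrLanguage member of the same name, and that, as with the Tesseract "best" and "fast" models, the Best variant favors accuracy while the Fast variant favors speed. The preferAccuracy flag is a hypothetical name introduced only for this example.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Hypothetical toggle, not part of IronOCR: choose accuracy or speed.
bool preferAccuracy = true;

// Assumption: the variants listed above map to OcrLanguage members of the same name.
ocr.Language = preferAccuracy ? OcrLanguage.TamilBest : OcrLanguage.TamilFast;

using (var input = new OcrInput(@"images\Tamil.png"))
{
    // Read the image with the selected Tamil variant
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Which variant works best is likely to depend on the quality of the scans, so it may be worth comparing the output of each on a few representative images.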

Download

Tamil Language Pack [தமிழ்]

Installation

The first step is to install the Tamil OCR package into your .NET project.

Install-Package IronOCR.Languages.Tamil

Code Example

This C# code example reads Tamil text from an image or PDF document.

// Ensure IronOCR.Languages.Tamil package is installed
using System;
using IronOcr;

var Ocr = new IronTesseract();

// Set the language to Tamil for OCR processing
Ocr.Language = OcrLanguage.Tamil;

using (var Input = new OcrInput(@"images\Tamil.png"))
{
    // Perform OCR on the input image
    var Result = Ocr.Read(Input);

    // Get the recognized text
    var AllText = Result.Text;

    // Display the recognized text (for example purpose)
    Console.WriteLine(AllText);
}

The equivalent VB.NET code:

' Ensure IronOCR.Languages.Tamil package is installed
Imports IronOcr

Dim Ocr = New IronTesseract()

' Set the language to Tamil for OCR processing
Ocr.Language = OcrLanguage.Tamil

Using Input = New OcrInput("images\Tamil.png")
	' Perform OCR on the input image
	Dim Result = Ocr.Read(Input)

	' Get the recognized text
	Dim AllText = Result.Text

	' Display the recognized text (for example purpose)
	Console.WriteLine(AllText)
End Using
  • The IronTesseract class is used to initialize and set up the OCR engine.
  • The Ocr.Language property specifies the language pack to use for OCR.
  • The OcrInput class loads the image file containing Tamil text from the given path.
  • The Ocr.Read() method processes the image and extracts the text.
  • Finally, the recognized text is stored in AllText and can be utilized as needed.
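
One possible way to use the recognized text is a quick sanity check that the output is mostly Tamil script. The helper below is a minimal sketch that does not depend on IronOCR: it measures the share of characters that fall in the Unicode Tamil block (U+0B80 to U+0BFF) among all Tamil characters and other letters. The class and method names are hypothetical and chosen only for this example.

```csharp
using System;

public static class TamilTextCheck
{
    // Returns the fraction of counted characters that lie in the Unicode
    // Tamil block (U+0B80..U+0BFF). Counted characters are Tamil-block
    // characters (including vowel signs) plus letters from any other script.
    // Digits, whitespace, and punctuation outside the Tamil block are ignored.
    public static double TamilLetterRatio(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0.0;
        }

        int counted = 0;
        int tamil = 0;

        foreach (char c in text)
        {
            bool isTamil = c >= '\u0B80' && c <= '\u0BFF';
            if (isTamil)
            {
                tamil++;
                counted++;
            }
            else if (char.IsLetter(c))
            {
                counted++;
            }
        }

        return counted == 0 ? 0.0 : (double)tamil / counted;
    }
}
```

With the example above, it could be called as TamilTextCheck.TamilLetterRatio(AllText). A low ratio might suggest that the wrong language pack was selected or that the image needs cleanup before OCR; any threshold would have to be chosen for the documents at hand.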