Swahili OCR in C# and .NET


IronOCR is a C# software component that lets .NET developers read text from images and PDF documents in 126 languages, including Swahili. It is an advanced fork of Tesseract built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.

Contents of IronOcr.Languages.Swahili

This package contains 3 Swahili OCR language variants for .NET:

  • Swahili
  • SwahiliBest
  • SwahiliFast
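The three entries are alternative Swahili models. Judging by the names (and by Tesseract's own split between "best" and "fast" trained data), SwahiliBest presumably favors accuracy and SwahiliFast favors speed; treat that as an assumption and benchmark on your own documents. The following is a minimal sketch of choosing a variant at runtime. It assumes the pack exposes the variants as `OcrLanguage.SwahiliBest` and `OcrLanguage.SwahiliFast` alongside `OcrLanguage.Swahili`; `SwahiliEngineFactory` and its parameters are hypothetical helpers, not part of IronOCR.

```csharp
using IronOcr;

// Hypothetical helper (not part of IronOCR): builds an IronTesseract
// engine configured with one of the three Swahili variants.
public static class SwahiliEngineFactory
{
    public static IronTesseract Create(bool preferSpeed = false, bool preferAccuracy = false)
    {
        var ocr = new IronTesseract();

        if (preferAccuracy)
            ocr.Language = OcrLanguage.SwahiliBest;   // assumed: slower, more accurate model
        else if (preferSpeed)
            ocr.Language = OcrLanguage.SwahiliFast;   // assumed: faster, less accurate model
        else
            ocr.Language = OcrLanguage.Swahili;       // default variant, as in the main example

        return ocr;
    }
}
```

For example, `var ocr = SwahiliEngineFactory.Create(preferAccuracy: true);` returns an engine that can be passed the same `OcrInput` as in the code example.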

Download

Swahili Language Pack [Kiswahili]

Installation

First, install the Swahili OCR package into your .NET project, for example from the Visual Studio Package Manager Console:

Install-Package IronOCR.Languages.Swahili

Code Example

The following C# example reads Swahili text from an image or PDF document. A VB.NET equivalent follows it.

using System;
using IronOcr;

var Ocr = new IronTesseract();

// Set the OCR language to Swahili
Ocr.Language = OcrLanguage.Swahili;

// Create an OCR input for the image or PDF file
using (var Input = new OcrInput(@"images\Swahili.png"))
{
    // Perform OCR on the input image
    var Result = Ocr.Read(Input);

    // Retrieve the recognized text
    var AllText = Result.Text;

    // Output the recognized text to the console (optional)
    Console.WriteLine(AllText);
}
Imports System
Imports IronOcr

Module Program
	Sub Main()
		Dim Ocr As New IronTesseract()

		' Set the OCR language to Swahili
		Ocr.Language = OcrLanguage.Swahili

		' Create an OCR input for the image or PDF file
		Using Input As New OcrInput("images\Swahili.png")
			' Perform OCR on the input image
			Dim Result = Ocr.Read(Input)

			' Retrieve the recognized text
			Dim AllText = Result.Text

			' Output the recognized text to the console (optional)
			Console.WriteLine(AllText)
		End Using
	End Sub
End Module

Explanation:

  1. Using IronOcr Namespace: We include the IronOcr namespace, which provides classes and methods for OCR operations.

  2. Initialize OCR Engine: We create an instance of IronTesseract, which is the OCR engine. Setting its language to Swahili allows it to recognize Swahili text.

  3. OCR Input: The OcrInput class specifies the file (image or PDF) to extract text from. It is wrapped in a using block so its resources are released once reading is finished.

  4. OCR Reading: The Read method processes the input and returns an OcrResult object containing the recognized text.

  5. Output: The recognized text is stored in AllText, which can be used as needed. In this example, it is printed to the console for demonstration purposes.
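The OcrResult returned in step 4 carries more than the combined text. The following is a hedged sketch, not the documented pattern, that also reports the overall confidence and the text of each page, which is useful when the input is a multi-page PDF. It assumes `OcrResult` exposes `Confidence` and `Pages`, and that each page exposes an integer `PageNumber` and a `Text` string; check these against the IronOCR version you install. `SwahiliOcrReport` is a hypothetical helper, and the formatting step is kept separate so it can be checked without running OCR.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using IronOcr;

// Hypothetical helper (not part of IronOCR).
public static class SwahiliOcrReport
{
    // Reads one file with the Swahili pack and returns a per-page report.
    // Assumed members: OcrResult.Confidence, OcrResult.Pages, Page.PageNumber, Page.Text.
    public static string Read(string path)
    {
        var ocr = new IronTesseract();
        ocr.Language = OcrLanguage.Swahili;

        using (var input = new OcrInput(path))
        {
            OcrResult result = ocr.Read(input);
            return Format(result.Confidence,
                          result.Pages.Select(p => (p.PageNumber, p.Text)));
        }
    }

    // Pure formatting step: a confidence header, then each page's text under its number.
    public static string Format(double confidence, IEnumerable<(int PageNumber, string Text)> pages)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"Confidence: {confidence}");
        foreach (var (number, text) in pages)
        {
            sb.AppendLine($"--- Page {number} ---");
            sb.AppendLine(text);
        }
        return sb.ToString();
    }
}
```

Usage, with the same sample path as above: `Console.WriteLine(SwahiliOcrReport.Read(@"images\Swahili.png"));`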