Multiple Languages for One Document

IronOCR supports 125 international languages.

It is possible to use more than one language at a time to read documents that contain words in multiple languages.

You may also use downloaded or proprietary languages and fonts by following the Tesseract .traineddata file format standard.

Here is an example of how you might implement IronOCR in C# to perform OCR on a document containing multiple languages:

// Import System (for Console) and the IronOCR library
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Create an instance of the IronTesseract class
        var ocr = new IronTesseract();

        // Set the primary language of the document to English
        ocr.Language = OcrLanguage.English;

        // Add Spanish as a secondary language; further languages can be added the same way.
        // Non-English languages are typically installed as separate IronOcr.Languages.* NuGet packages.
        ocr.AddSecondaryLanguage(OcrLanguage.Spanish);

        // Load an image or a document from which text needs to be extracted
        using (var input = new OcrInput(@"path_to_your_file"))
        {
            // Perform OCR on the input file
            var result = ocr.Read(input);

            // Output the results to the console
            Console.WriteLine(result.Text);
        }
    }
}

The equivalent VB.NET code:

' Import System (for Console) and the IronOCR library
Imports System
Imports IronOcr

Friend Class Program
	Shared Sub Main()
		' Create an instance of the IronTesseract class
		Dim ocr = New IronTesseract()

		' Set the primary language of the document to English
		ocr.Language = OcrLanguage.English

		' Add Spanish as a secondary language; further languages can be added the same way
		ocr.AddSecondaryLanguage(OcrLanguage.Spanish)

		' Load an image or a document from which text needs to be extracted
		Using input = New OcrInput("path_to_your_file")
			' Perform OCR on the input file
			Dim result = ocr.Read(input)

			' Output the results to the console
			Console.WriteLine(result.Text)
		End Using
	End Sub
End Class

Explanation of the Code:

  • The IronOCR library is included at the top of the file to utilize its OCR functionality.
  • An instance of IronTesseract is created to manage the OCR process.
  • The languages expected in the document are configured on the IronTesseract object. In this example, it is set up to read both English and Spanish (Tesseract language codes eng and spa).
  • An OcrInput object is created, loading the file from which you want to extract text.
  • The Read method is called on the IronTesseract object to perform OCR, storing the result in a variable.
  • Finally, the OCR result is printed to the console.

Custom or proprietary language models can also be used, provided they follow the Tesseract .traineddata file format. This extends language coverage beyond the built-in set and may improve accuracy on specialized text.
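
As a rough illustration of the custom-language case, the sketch below loads a .traineddata file instead of a built-in language. It assumes that your installed IronOCR version exposes UseCustomTesseractLanguageFile on IronTesseract. The file path custom_tesseract_files/custom.traineddata is a hypothetical placeholder, and the OcrInput usage mirrors the example earlier on this page.

```csharp
using System;
using IronOcr;

class CustomLanguageExample
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Hypothetical path: replace with the location of your own .traineddata file.
        // Assumes UseCustomTesseractLanguageFile is available in your IronOCR version.
        ocr.UseCustomTesseractLanguageFile(@"custom_tesseract_files/custom.traineddata");

        // Load the document to read (placeholder path, as in the example above)
        using (var input = new OcrInput(@"path_to_your_file"))
        {
            // Run OCR using the custom language model
            var result = ocr.Read(input);

            // Print the recognized text
            Console.WriteLine(result.Text);
        }
    }
}
```

If the custom model covers only part of the document's text, check the IronOCR documentation for your version to see how it can be combined with built-in languages.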