Multiple Languages for One Document
IronOCR supports 125 international languages.
It is possible to use more than one language at a time to read documents that contain words in multiple languages.
You may also use downloaded or proprietary languages and fonts by following the Tesseract .traineddata file format standard.
Here is an example of how you might implement IronOCR in C# to perform OCR on a document containing multiple languages:
// Include the necessary namespaces at the beginning of your file
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Create an instance of the IronTesseract class
        var Ocr = new IronTesseract();

        // Set the languages in which your document is written:
        // English ('eng') is the primary language and Spanish ('spa') a secondary one.
        // Non-English languages are installed as separate IronOcr.Languages.* NuGet packages.
        Ocr.Language = OcrLanguage.English;
        Ocr.AddSecondaryLanguage(OcrLanguage.Spanish);

        // Load an image or a document from which text needs to be extracted
        using (var input = new OcrInput(@"path_to_your_file"))
        {
            // Perform OCR on the input file
            var Result = Ocr.Read(input);

            // Output the results to the console
            Console.WriteLine(Result.Text);
        }
    }
}
' Include the necessary namespaces at the beginning of your file
Imports System
Imports IronOcr

Friend Class Program
    Shared Sub Main()
        ' Create an instance of the IronTesseract class
        Dim Ocr = New IronTesseract()

        ' Set the languages in which your document is written:
        ' English ('eng') is the primary language and Spanish ('spa') a secondary one
        Ocr.Language = OcrLanguage.English
        Ocr.AddSecondaryLanguage(OcrLanguage.Spanish)

        ' Load an image or a document from which text needs to be extracted
        Using input = New OcrInput("path_to_your_file")
            ' Perform OCR on the input file
            Dim Result = Ocr.Read(input)

            ' Output the results to the console
            Console.WriteLine(Result.Text)
        End Using
    End Sub
End Class
Explanation of the Code:
- The IronOcr namespace is imported at the top of the file to make the OCR functionality available.
- An instance of IronTesseract is created to manage the OCR process.
- The languages expected in the document are configured on the IronTesseract object. In this example, it is set up to handle both English and Spanish (eng and spa).
- An OcrInput object is created, loading the file from which you want to extract text.
- The Read method is called on the IronTesseract object to perform OCR, storing the result in a variable.
- Finally, the OCR result is printed to the console.
Custom or proprietary language models that follow the Tesseract .traineddata file format can also be used, which extends coverage to languages and fonts beyond the built-in set.
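Loading a custom language file might look like the minimal sketch below. It assumes that the installed IronOCR version exposes a UseCustomTesseractLanguageFile method on IronTesseract; check the API reference for your version before relying on it. The path path_to_your_language_file.traineddata is a hypothetical placeholder for your own file.

```csharp
using System;
using IronOcr;

class CustomLanguageExample
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Assumption: UseCustomTesseractLanguageFile is available in your IronOCR version.
        // Hypothetical path: point this at your own Tesseract .traineddata file.
        ocr.UseCustomTesseractLanguageFile(@"path_to_your_language_file.traineddata");

        // Load the document and read it with the custom language model
        using (var input = new OcrInput(@"path_to_your_file"))
        {
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

The rest of the workflow is unchanged from the multi-language example: only the language configuration step differs.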