This tutorial shows how to use Tesseract through IronOCR to recognize text in multiple languages from PDFs and images. First, install IronOCR and the language packs you need into your project using the NuGet package manager. Then import the required namespaces and set up IronOCR with a valid license key to unlock its full capabilities. Instantiate an IronTesseract object to perform optical character recognition, starting with English as the primary language. To add support for additional languages, such as Russian, call the AddSecondaryLanguage method.
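The license step described above is not spelled out in the example code, so here is a minimal sketch of one way it might look. It assumes the key is applied through the static `IronOcr.License.LicenseKey` property before the engine is used, and it reads the key from an environment variable named `IRONOCR_LICENSE_KEY`, which is a hypothetical name chosen only for this illustration. The class name `LicenseSetupSketch` is likewise illustrative.

```csharp
using System;
using IronOcr;

class LicenseSetupSketch
{
    static void Main()
    {
        // IRONOCR_LICENSE_KEY is a hypothetical variable name used only in this sketch.
        var key = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY");

        if (!string.IsNullOrWhiteSpace(key))
        {
            // Assumption: the key is applied via the static License.LicenseKey
            // property, before any OCR call is made.
            IronOcr.License.LicenseKey = key;
        }
        else
        {
            Console.WriteLine("No license key found in IRONOCR_LICENSE_KEY.");
        }

        // Same engine setup as in the tutorial: English primary, Russian secondary.
        var ocr = new IronTesseract();
        ocr.Language = OcrLanguage.English;
        ocr.AddSecondaryLanguage(OcrLanguage.Russian);
    }
}
```

Keeping the key out of source code is a design choice of this sketch, not something the tutorial prescribes; assigning a literal string to the same property should work equally well for a quick trial.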
Here is the complete example, step by step:
// Ensure you have added a reference to the IronOCR package and installed the necessary language packs.
// Import required namespaces
using IronOcr;
using System;
using System.IO;

class OCRExample
{
    static void Main()
    {
        // Create the OCR engine (apply your license key before this point)
        var Ocr = new IronTesseract();

        // Set the primary language to English
        Ocr.Language = OcrLanguage.English;

        // Add a secondary language (Russian)
        Ocr.AddSecondaryLanguage(OcrLanguage.Russian);

        // Load a PDF file and perform OCR
        var pdfInput = new OcrInput();
        pdfInput.AddPdf("example.PDF");
        var result = Ocr.Read(pdfInput);

        // Ensure accurate display of multilingual characters in the console
        Console.OutputEncoding = System.Text.Encoding.UTF8;

        // Print the extracted text from the PDF
        Console.WriteLine("Extracted Text from PDF:");
        Console.WriteLine(result.Text);

        // Change the primary language to Russian and add Japanese as a secondary language
        Ocr.Language = OcrLanguage.Russian;
        Ocr.AddSecondaryLanguage(OcrLanguage.Japanese);

        // Load an image file and perform OCR
        var imageInput = new OcrInput();
        imageInput.AddImage("example.png");
        var imageResult = Ocr.Read(imageInput);

        // Print the extracted text from the image
        Console.WriteLine("Extracted Text from Image:");
        Console.WriteLine(imageResult.Text);
    }
}
' Ensure you have added a reference to the IronOCR package and installed the necessary language packs.
' Import required namespaces
Imports IronOcr
Imports System
Imports System.IO

Friend Class OCRExample
    Shared Sub Main()
        ' Create the OCR engine (apply your license key before this point)
        Dim Ocr = New IronTesseract()

        ' Set the primary language to English
        Ocr.Language = OcrLanguage.English

        ' Add a secondary language (Russian)
        Ocr.AddSecondaryLanguage(OcrLanguage.Russian)

        ' Load a PDF file and perform OCR
        Dim pdfInput = New OcrInput()
        pdfInput.AddPdf("example.PDF")
        Dim result = Ocr.Read(pdfInput)

        ' Ensure accurate display of multilingual characters in the console
        Console.OutputEncoding = System.Text.Encoding.UTF8

        ' Print the extracted text from the PDF
        Console.WriteLine("Extracted Text from PDF:")
        Console.WriteLine(result.Text)

        ' Change the primary language to Russian and add Japanese as a secondary language
        Ocr.Language = OcrLanguage.Russian
        Ocr.AddSecondaryLanguage(OcrLanguage.Japanese)

        ' Load an image file and perform OCR
        Dim imageInput = New OcrInput()
        imageInput.AddImage("example.png")
        Dim imageResult = Ocr.Read(imageInput)

        ' Print the extracted text from the image
        Console.WriteLine("Extracted Text from Image:")
        Console.WriteLine(imageResult.Text)
    End Sub
End Class
Each input file is loaded into an OcrInput (AddPdf for PDFs, AddImage for images), and the OCR process is executed using Read, capturing the text content. By following these steps, you can extract and recognize text in English, Russian, and Japanese from various file types. This tutorial shows how multiple languages can be combined with Tesseract and IronOCR to process multilingual text in PDFs and images. For more tutorials and to start using IronOCR, subscribe to Iron Software and consider signing up for a trial.
Further Reading: How to use Multiple Languages with Tesseract