
How to use Multiple Languages with Tesseract

Kannapat Udonpant
March 17, 2025


This tutorial shows how to use Tesseract through IronOCR to recognize text in multiple languages in PDFs and images. First, use the NuGet Package Manager to install IronOCR and the language packs you need into your project. Import the required namespaces and set a valid license key to unlock IronOCR's full capabilities. Then create an IronTesseract object to perform optical character recognition. English is the default language. To add support for another language, such as Russian, call the AddSecondaryLanguage method.
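The setup described above can be sketched in C# as follows. This is a minimal sketch, assuming the IronOcr and IronOcr.Languages.Russian NuGet packages are installed. The license key string is a placeholder, and the helper name CreateEnglishRussianEngine is hypothetical, introduced only for illustration.

```csharp
using System;
using IronOcr;

class Program
{
    // Hypothetical helper: builds an engine that reads English as the primary
    // language and Russian as a secondary language.
    static IronTesseract CreateEnglishRussianEngine()
    {
        var ocr = new IronTesseract();
        ocr.Language = OcrLanguage.English;            // primary language (the default)
        ocr.AddSecondaryLanguage(OcrLanguage.Russian); // requires IronOcr.Languages.Russian
        return ocr;
    }

    static void Main()
    {
        // Placeholder key (assumption): substitute your own IronOCR license key.
        IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

        IronTesseract ocr = CreateEnglishRussianEngine();
        Console.WriteLine("IronTesseract configured for English + Russian.");
    }
}
```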

Next, use the OcrPdfInput class to load a PDF file named 'example.pdf' that contains text in several languages. Run OCR to extract the text content and store it in an OcrResult object. To make multilingual characters display correctly, set the console output encoding to Unicode before printing the extracted text to the console.
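The PDF step can be sketched as a small console program. This sketch assumes that the IronOcr and IronOcr.Languages.Russian packages are installed and that 'example.pdf' is in the working directory. The license key is a placeholder.

```csharp
using System;
using System.Text;
using IronOcr;

class Program
{
    static void Main()
    {
        // Placeholder key (assumption): substitute your own IronOCR license key.
        IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

        // English as the primary language, Russian as a secondary language.
        var ocr = new IronTesseract();
        ocr.Language = OcrLanguage.English;
        ocr.AddSecondaryLanguage(OcrLanguage.Russian);

        // Load the multilingual PDF; the input object is disposable.
        using var pdfInput = new OcrPdfInput("example.pdf");

        // Run OCR and keep the result object.
        OcrResult result = ocr.Read(pdfInput);

        // Switch the console to Unicode so Cyrillic characters print correctly.
        Console.OutputEncoding = Encoding.Unicode;
        Console.WriteLine(result.Text);
    }
}
```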

Next, change the primary language to Russian and add Japanese as a secondary language so that both Russian and Japanese text can be recognized. Use the OcrImageInput class to load an image file, 'example.png', that contains multilingual text, and run OCR with these language settings. Store the result and print the text extracted from the image to the console.
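Here is a sketch of the image step. It assumes that the IronOcr, IronOcr.Languages.Russian, and IronOcr.Languages.Japanese packages are installed and that 'example.png' is in the working directory. The license key is again a placeholder.

```csharp
using System;
using System.Text;
using IronOcr;

class Program
{
    static void Main()
    {
        // Placeholder key (assumption): substitute your own IronOCR license key.
        IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

        // Russian as the primary language, Japanese as a secondary language.
        var ocr = new IronTesseract();
        ocr.Language = OcrLanguage.Russian;
        ocr.AddSecondaryLanguage(OcrLanguage.Japanese);

        // Load the multilingual image and run OCR on it.
        using var imageInput = new OcrImageInput("example.png");
        OcrResult result = ocr.Read(imageInput);

        // Unicode output so Cyrillic and Japanese characters display correctly.
        Console.OutputEncoding = Encoding.Unicode;
        Console.WriteLine(result.Text);
    }
}
```

This sketch creates a fresh IronTesseract instance rather than reusing one configured earlier. As a result, no previously added secondary language can carry over into the Russian and Japanese configuration.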

These steps let you extract text in English, Russian, and Japanese from both PDFs and images. In IronOCR, you set one primary language on the Tesseract engine and add secondary languages, and that combination handles multilingual documents. For more tutorials, subscribe to Iron Software. To start using IronOCR, consider signing up for a trial.


Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.