Using Custom OCR Language Packs with IronOCR

Updated: June 1, 2025

How do I create custom language packs for use in IronOCR?

Creating a custom language pack requires training a new Tesseract 4 LSTM language file/dictionary from a font.

Many tutorials available online explain the steps required. The process is not simple, but fortunately it is fairly well documented.

As a good starting point, we suggest this YouTube tutorial by Gabriel Garcia (no affiliation) and its related GitHub repository.

Once training is complete, the result is a .traineddata file.

The .traineddata file can then be referenced in IronOCR as follows:

Documentation: IronOCR Custom Languages

using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Initialize the IronTesseract OCR engine
        var ocr = new IronTesseract();

        // Load your custom Tesseract language file (the trained .traineddata file)
        ocr.UseCustomTesseractLanguageFile("mydir/custom.traineddata");  // <--- your new font

        // To use several fonts, call the method once per .traineddata file

        // Load an image into the OCR input for processing
        using (var input = new OcrInput(@"images\image.png"))
        {
            // Perform OCR on the input image
            var result = ocr.Read(input);

            // Output the recognized text to the console
            Console.WriteLine(result.Text);
        }
    }
}
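
The sample above mentions that several fonts can be used by calling the method once per file. A minimal sketch of that pattern follows. The second file name and the `ExistingFiles` helper are hypothetical and exist only for illustration. The existence check is an added precaution, not something IronOCR requires. The IronOCR calls are the same ones used in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using IronOcr;

public static class MultiFontExample
{
    // Hypothetical helper: keeps only the paths that exist on disk,
    // so a mistyped file name is reported instead of passed to the engine.
    public static List<string> ExistingFiles(IEnumerable<string> paths)
    {
        return paths.Where(File.Exists).ToList();
    }

    public static void Main()
    {
        // Assumed file names; substitute your own trained files.
        var candidates = new[]
        {
            "mydir/custom.traineddata",
            "mydir/custom2.traineddata"
        };

        var found = ExistingFiles(candidates);

        // Report any file that was listed but not found.
        foreach (var missing in candidates.Except(found))
        {
            Console.Error.WriteLine($"Language file not found: {missing}");
        }

        var ocr = new IronTesseract();

        // One call per trained file, as described in the sample above.
        foreach (var path in found)
        {
            ocr.UseCustomTesseractLanguageFile(path);
        }

        using (var input = new OcrInput(@"images\image.png"))
        {
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

Filtering the paths first keeps a missing or misnamed .traineddata file from surfacing later as a harder-to-read engine error.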

Imports System
Imports IronOcr

Friend Class Program
	Shared Sub Main()
		' Initialize the IronTesseract OCR engine
		Dim Ocr = New IronTesseract()

		' Load your custom Tesseract language file (the trained .traineddata file)
		Ocr.UseCustomTesseractLanguageFile("mydir/custom.traineddata") ' <--- your new font

		' To use several fonts, call the method once per .traineddata file

		' Load an image into the OCR input for processing
		Using Input = New OcrInput("images\image.png")
			' Perform OCR on the input image
			Dim Result = Ocr.Read(Input)

			' Output the recognized text to the console
			Console.WriteLine(Result.Text)
		End Using
	End Sub
End Class

Curtis Chau

Technical Writer
