OCR Image Optimization Filters

using IronOcr;
using System;

var ocrTesseract = new IronTesseract();
using var ocrInput = new OcrInput();
// First load all image(s)
ocrInput.LoadImage(@"images\image.png");

// Note: You don't need all of them; most users only need Deskew() and occasionally DeNoise()
ocrInput.WithTitle("My Document");
ocrInput.Binarize();
ocrInput.Contrast();
ocrInput.Deskew();
ocrInput.DeNoise();
ocrInput.Despeckle();
ocrInput.Dilate();
ocrInput.EnhanceResolution(300);
ocrInput.Invert();
ocrInput.Rotate(90);
ocrInput.Scale(150);
ocrInput.Sharpen();
ocrInput.ToGrayScale();
ocrInput.Erode();

// WIZARD - If you are unsure, use the debug wizard to test all combinations:
string codeToRun = OcrInputFilterWizard.Run(@"images\image.png", out double confidence, ocrTesseract);
Console.WriteLine(codeToRun);

// Optional: Export modified images so you can view them.
foreach (var page in ocrInput.GetPages())
{
    page.SaveAsImage($"filtered_{page.Index}.bmp");
}

var ocrResult = ocrTesseract.Read(ocrInput);
Console.WriteLine(ocrResult.Text);

Imports IronOcr
Imports System

Dim ocrTesseract = New IronTesseract()
Dim ocrInput = New OcrInput()
' First load all image(s)
ocrInput.LoadImage("images\image.png")

' Note: You don't need all of them; most users only need Deskew() and occasionally DeNoise()
ocrInput.WithTitle("My Document")
ocrInput.Binarize()
ocrInput.Contrast()
ocrInput.Deskew()
ocrInput.DeNoise()
ocrInput.Despeckle()
ocrInput.Dilate()
ocrInput.EnhanceResolution(300)
ocrInput.Invert()
ocrInput.Rotate(90)
ocrInput.Scale(150)
ocrInput.Sharpen()
ocrInput.ToGrayScale()
ocrInput.Erode()

' WIZARD - If you are unsure, use the debug wizard to test all combinations:
Dim confidence As Double
Dim codeToRun As String = OcrInputFilterWizard.Run("images\image.png", confidence, ocrTesseract)
Console.WriteLine(codeToRun)

' Optional: Export modified images so you can view them.
For Each page In ocrInput.GetPages()
	page.SaveAsImage($"filtered_{page.Index}.bmp")
Next page

Dim ocrResult = ocrTesseract.Read(ocrInput)
Console.WriteLine(ocrResult.Text)

Install-Package IronOcr

The OcrInput class gives C# and .NET developers granular control over preprocessing image input, improving speed and accuracy before OCR runs. This removes the need for the common practice of preparing images for OCR with Photoshop batch scripts or ImageMagick.

How to Use OCR Filters in a Tesseract Alternative

Install an OCR library that provides image filters.
Create an OcrInput object from the image path.
(Optional) Process the image using filter methods.
Run OCR with the Read method.
Display the result using the OcrResult's Text property.

Below is an example demonstrating how to use the OcrInput class in C# with IronOcr:

using IronOcr;
using System;

class OcrExample
{
    static void Main()
    {
        // Initialize a new OcrInput object with the path to the image file.
        var ocrInput = new OcrInput(@"path\to\image.jpg");

        // Optional: Preprocess the image by applying various filters.
        // This can include increasing contrast, sharpening, or other adjustments
        // to enhance OCR accuracy.
        ocrInput.Contrast(); // Example of enhancing the image contrast
        ocrInput.Sharpen();  // Example of sharpening the image

        // Create an instance of the IronTesseract class to perform OCR.
        var Ocr = new IronTesseract();

        // Perform OCR on the preprocessed image.
        var result = Ocr.Read(ocrInput);

        // Output the recognized text to the console.
        Console.WriteLine(result.Text);
    }
}

Imports IronOcr
Imports System

Friend Class OcrExample
	Shared Sub Main()
		' Initialize a new OcrInput object with the path to the image file.
		Dim ocrInput As New OcrInput("path\to\image.jpg")

		' Optional: Preprocess the image by applying various filters.
		' This can include increasing contrast, sharpening, or other adjustments
		' to enhance OCR accuracy.
		ocrInput.Contrast() ' Example of enhancing the image contrast
		ocrInput.Sharpen() ' Example of sharpening the image

		' Create an instance of the IronTesseract class to perform OCR.
		Dim Ocr = New IronTesseract()

		' Perform OCR on the preprocessed image.
		Dim result = Ocr.Read(ocrInput)

		' Output the recognized text to the console.
		Console.WriteLine(result.Text)
	End Sub
End Class

Detailed Steps:

Install the OCR Library: Begin by installing the IronOcr OCR library from NuGet. This library provides the functionality needed to perform OCR.
Create OcrInput Object: Use the path to your image file to initialize an OcrInput object. This object represents the image that you will process for OCR.
Preprocess the Image (Optional): Apply filter methods such as Contrast() and Sharpen() to improve OCR accuracy. This is especially useful for low-contrast or blurry images.
Read the Image: Use the Read method from an instance of IronTesseract to perform OCR on the OcrInput object.
Display the Result: Finally, use the Text property of the OcrResult to obtain and display the recognized text.

This approach keeps image preparation and OCR in a single C# or .NET code path, with no separate image-editing step before recognition.