OCR Image Color Editing

OCR runs faster and more accurately on black text against a white background.

If we have, for example, blue text on a pink background, we will want to swap blue to black and pink to white before performing OCR.

Doing this by hand with System.Drawing is slow and tedious, but IronOCR automates it.

The OcrInput.ReplaceColor method replaces one color with another throughout a document. It accepts a percentage tolerance, so pixels close to the specified RGB color are matched as well as exact ones. This removes the need to prepare images for OCR with Photoshop or ImageMagick scripts.

Here is an example of how to use IronOCR to replace colors:

using System;  // Needed for Console
using IronOcr; // Import IronOCR namespace

class Program
{
    static void Main()
    {
        // Create an instance of OcrInput
        var input = new OcrInput(@"path/to/your/image.png"); // Specify the image path

        // Replace blue with black
        // The tolerance value determines how far a color may differ from an exact match
        input.ReplaceColor(System.Drawing.Color.Blue, System.Drawing.Color.Black, 10f); // 10f specifies 10% tolerance

        // Replace pink with white
        input.ReplaceColor(System.Drawing.Color.Pink, System.Drawing.Color.White, 10f); // Same 10% tolerance

        // Perform OCR on the adjusted image
        var result = new IronTesseract().Read(input);

        // Output the text recognized by OCR
        Console.WriteLine(result.Text);
    }
}

Imports System ' Needed for Console
Imports IronOcr ' Import IronOCR namespace

Friend Class Program
	Shared Sub Main()
		' Create an instance of OcrInput
		Dim input As New OcrInput("path/to/your/image.png") ' Specify the image path

		' Replace blue with black
		' The tolerance value determines how far a color may differ from an exact match
		input.ReplaceColor(System.Drawing.Color.Blue, System.Drawing.Color.Black, 10F) ' 10F specifies 10% tolerance

		' Replace pink with white
		input.ReplaceColor(System.Drawing.Color.Pink, System.Drawing.Color.White, 10F) ' Same 10% tolerance

		' Perform OCR on the adjusted image
		Dim result = (New IronTesseract()).Read(input)

		' Output the text recognized by OCR
		Console.WriteLine(result.Text)
	End Sub
End Class

Explanation

  • IronOCR: A .NET library that simplifies OCR (Optical Character Recognition) in .NET languages.
  • OcrInput: Represents the image input for OCR processing. Its ReplaceColor method adjusts colors within this image.
  • ReplaceColor Method Parameters:
    • The first parameter is the color to be replaced.
    • The second parameter is the replacement color.
    • The third parameter is optional and sets the tolerance, as a percentage, for how far a color may differ from the first parameter and still be replaced.
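
To make the tolerance parameter more concrete, here is a minimal sketch of how a percentage-based color match could work. It is an illustration only and is not IronOCR's actual implementation: the Rgb struct, the ToleranceSketch class, and the per-channel comparison rule are all assumptions made for this example, and IronOCR may measure color distance differently.

```csharp
using System;

// Hypothetical RGB triple for illustration; not an IronOCR or System.Drawing type.
public struct Rgb
{
    public byte R;
    public byte G;
    public byte B;

    public Rgb(byte r, byte g, byte b)
    {
        R = r;
        G = g;
        B = b;
    }
}

// Hypothetical helper class; the matching rule below is an assumption.
public static class ToleranceSketch
{
    // Returns true when every channel of `pixel` differs from the matching
    // channel of `target` by no more than `tolerancePercent` of the 0-255 range.
    public static bool IsWithinTolerance(Rgb pixel, Rgb target, float tolerancePercent)
    {
        float maxDelta = 255f * tolerancePercent / 100f;
        return Math.Abs(pixel.R - target.R) <= maxDelta
            && Math.Abs(pixel.G - target.G) <= maxDelta
            && Math.Abs(pixel.B - target.B) <= maxDelta;
    }

    // Replaces, in place, every pixel that matches `from` within the tolerance.
    public static void ReplaceColor(Rgb[] pixels, Rgb from, Rgb to, float tolerancePercent)
    {
        for (int i = 0; i < pixels.Length; i++)
        {
            if (IsWithinTolerance(pixels[i], from, tolerancePercent))
            {
                pixels[i] = to;
            }
        }
    }
}
```

Under this assumed rule, a 10% tolerance allows each channel to differ by up to 25.5. A pixel of (10, 20, 240) would therefore be treated as pure blue (0, 0, 255) and replaced, while (0, 100, 255) would be left unchanged.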

By replacing colors before recognition, you give the OCR engine the black-on-white input it handles best, which improves both speed and accuracy.