Handling CAPTCHAs with IronOCR

Can IronOCR read CAPTCHA codes?

This is possible, but not guaranteed.

Most CAPTCHA generators are deliberately designed to defeat OCR software, and some even treat "cannot be read by OCR software such as Tesseract" as a unit test.

CAPTCHA codes are, by design, very difficult for OCR engines to read. The resolution is typically very low, each character is set at a different angle and spacing from its neighbors, and variable background noise is added.

Grayscale images with the background noise removed are read more successfully than color images, but they can still prove a challenge.

The C# sample below attempts to remove noise and convert a CAPTCHA image to grayscale to improve OCR results:

using System;
using IronOcr;

class CaptchaReader
{
    static void Main(string[] args)
    {
        // Initialize the IronOCR engine
        var Ocr = new IronTesseract();

        // Create an OCR input object; the using block disposes it when done
        using (var Input = new OcrInput("captcha-image.jpg"))
        {
            // Apply noise reduction to improve OCR accuracy
            // This removes background noise while preserving text
            Input.DeNoise();

            // Apply a deeper clean for more aggressive noise removal
            // (optional: remove this call if DeNoise alone is sufficient)
            Input.DeepCleanBackgroundNoise();

            // Convert the image to grayscale
            // OCR works better on grayscale images than on color ones
            Input.ToGrayScale();

            // Perform OCR to extract text from the image
            var Result = Ocr.Read(Input);

            // Output the recognized text to the console
            Console.WriteLine(Result.Text);
        }
    }
}

The equivalent VB.NET code:

Imports IronOcr

Friend Class CaptchaReader
	Shared Sub Main(ByVal args() As String)
		' Initialize the IronOCR engine
		Dim Ocr = New IronTesseract()

		' Create an OCR input object
		Dim Input = New OcrInput("captcha-image.jpg")

		' Apply noise reduction to improve OCR accuracy
		' This removes background noise while preserving text
		Input.DeNoise()

		' Optionally apply a deep clean for more aggressive noise removal
		Input.DeepCleanBackgroundNoise()

		' Convert the image to grayscale 
		' OCR works better on grayscale images compared to colored ones
		Input.ToGrayScale()

		' Perform OCR to extract text from the image
		Dim Result = Ocr.Read(Input)

		' Output the recognized text to the console
		Console.WriteLine(Result.Text)
	End Sub
End Class

Explanation:

  • IronOcr: This library is used for reading text from images.
  • OcrInput: This class represents the image input for OCR processing.
  • DeNoise: This method is used to reduce background noise in the image.
  • DeepCleanBackgroundNoise: This method is employed for more aggressive noise reduction if the basic DeNoise isn't sufficient.
  • ToGrayScale: This converts the image to grayscale to improve recognition accuracy.
  • Read: This method is called to extract text from the preprocessed image.
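
Because a successful read is possible but not guaranteed, it can help to sanity-check whatever Read returns before relying on it. The sketch below is one possible approach, not part of IronOCR: `CaptchaResultFilter`, `Normalize`, `IsPlausible`, the expected length, and the threshold are all hypothetical names and values chosen for illustration. It assumes you pass in the recognized text (for example `Result.Text`) together with a confidence score from the result object; the scale of that score is an assumption here, so choose `minConfidence` on whatever scale your version reports.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an IronOCR type) for vetting OCR output.
static class CaptchaResultFilter
{
    // Strip whitespace that OCR engines often insert between distorted characters.
    public static string Normalize(string raw) =>
        new string((raw ?? string.Empty).Where(c => !char.IsWhiteSpace(c)).ToArray());

    // Accept a result only if it has the expected length, contains only
    // letters and digits, and meets the caller's confidence threshold.
    public static bool IsPlausible(string raw, double confidence,
                                   int expectedLength, double minConfidence)
    {
        string text = Normalize(raw);
        return text.Length == expectedLength
            && text.All(char.IsLetterOrDigit)
            && confidence >= minConfidence;
    }
}

class FilterDemo
{
    static void Main()
    {
        // Hard-coded stand-ins for the text and confidence an OCR pass might return.
        Console.WriteLine(CaptchaResultFilter.IsPlausible("x7 Kq2\n", 88.0, 5, 75.0)); // True
        Console.WriteLine(CaptchaResultFilter.IsPlausible("x7|Kq", 88.0, 5, 75.0));    // False
    }
}
```

A result that fails this check is a candidate for retrying with different preprocessing filters or for falling back to manual entry.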