Handling CAPTCHAs with IronOCR
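Can IronOCR read CAPTCHAs?
It is sometimes possible, but it is not guaranteed.
Most CAPTCHA generators are deliberately designed to confuse OCR software, and some even treat "cannot be read by OCR software" (such as Tesseract) as a unit-test criterion.
By definition, a CAPTCHA is extremely hard for an OCR engine to read. The image resolution is very low, each character is set at its own angle with uneven spacing from its neighbors, and the background contains irregular noise.
Grayscale images with the background noise removed give better results than color images, but they can still be challenging.
The following C# sample tries to improve OCR results by removing noise from a CAPTCHA image and converting it to grayscale: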
using System;
using IronOcr;

class CaptchaReader
{
    static void Main(string[] args)
    {
        // Initialize the IronOCR engine
        var Ocr = new IronTesseract();

        // Create an OCR input object from the CAPTCHA image file
        var Input = new OcrInput("captcha-image.jpg");

        // Apply noise reduction to improve OCR accuracy
        // This removes background noise while preserving text
        Input.DeNoise();

        // Optionally apply a deep clean for more aggressive noise removal
        Input.DeepCleanBackgroundNoise();

        // Convert the image to grayscale
        // OCR works better on grayscale images compared to colored ones
        Input.ToGrayScale();

        // Perform OCR to extract text from the image
        var Result = Ocr.Read(Input);

        // Output the recognized text to the console
        Console.WriteLine(Result.Text);
    }
}
Imports System
Imports IronOcr

Friend Class CaptchaReader
    Shared Sub Main(ByVal args() As String)
        ' Initialize the IronOCR engine
        Dim Ocr = New IronTesseract()

        ' Create an OCR input object from the CAPTCHA image file
        Dim Input = New OcrInput("captcha-image.jpg")

        ' Apply noise reduction to improve OCR accuracy
        ' This removes background noise while preserving text
        Input.DeNoise()

        ' Optionally apply a deep clean for more aggressive noise removal
        Input.DeepCleanBackgroundNoise()

        ' Convert the image to grayscale
        ' OCR works better on grayscale images compared to colored ones
        Input.ToGrayScale()

        ' Perform OCR to extract text from the image
        Dim Result = Ocr.Read(Input)

        ' Output the recognized text to the console
        Console.WriteLine(Result.Text)
    End Sub
End Class
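Explanation:
IronOcr: the library used to read text from images.
OcrInput: the class that represents the image input for OCR processing.
DeNoise: reduces background noise in the image.
DeepCleanBackgroundNoise: applies stronger noise suppression when the basic DeNoise is not enough.
ToGrayScale: converts the image to grayscale to improve recognition accuracy.
Read: extracts text from the preprocessed image.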
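Because a CAPTCHA read is never guaranteed, it can help to sanity-check the recognized text before relying on it. The sketch below is not part of IronOCR: `CaptchaTextCheck` and `TryNormalize` are hypothetical names, and the expected length and the letters-and-digits alphabet are assumptions that depend on the CAPTCHA in question. It uses only the .NET standard library.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an IronOCR API): a plausibility check for OCR output.
static class CaptchaTextCheck
{
    // Strips whitespace from the raw OCR text and accepts it only if it has the
    // expected length and consists solely of letters and digits (assumed alphabet).
    public static bool TryNormalize(string rawOcrText, int expectedLength, out string normalized)
    {
        // OCR often inserts spaces or line breaks between unevenly spaced characters
        normalized = new string(
            (rawOcrText ?? string.Empty).Where(c => !char.IsWhiteSpace(c)).ToArray());

        // A wrong length or stray symbols usually means leftover noise was read as text
        return normalized.Length == expectedLength
            && normalized.All(c => char.IsLetterOrDigit(c));
    }
}
```

In the sample above, this could be called as `CaptchaTextCheck.TryNormalize(Result.Text, 5, out var text)`, where 5 is an assumed CAPTCHA length. A `false` result means the read should be treated as failed rather than trusted.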

