OCR Image Optimization Filters
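The OcrInput class gives C# and .NET developers granular control over preprocessing image input before OCR runs, improving both speed and accuracy.
This removes the need for the common practice of preparing images for OCR with Photoshop batch scripts or ImageMagick.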
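To make "preprocessing" more concrete, the sketch below shows the general technique behind a binarization filter such as Binarize(): convert pixels to gray, choose a global threshold with Otsu's method, and map every pixel to pure black or pure white. It is a generic, self-contained illustration under those assumptions. The helper names ToGray, OtsuThreshold, and Binarize are hypothetical, and this is not IronOCR's implementation, which this page does not describe.

```csharp
using System;
using System.Linq;

// Hypothetical helpers illustrating a generic technique; NOT IronOCR's internals.

// Luminance conversion with ITU-R BT.601 weights (a typical grayscale step).
static byte ToGray(byte r, byte g, byte b) =>
    (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

// Otsu's method: choose the threshold that maximizes the between-class
// variance of the gray-level histogram (background = values <= threshold).
static byte OtsuThreshold(byte[] gray)
{
    var hist = new int[256];
    foreach (byte v in gray) hist[v]++;

    long total = gray.Length;
    double sumAll = 0;
    for (int t = 0; t < 256; t++) sumAll += t * (double)hist[t];

    double sumBg = 0, bestVariance = -1;
    long weightBg = 0;
    int best = 0;
    for (int t = 0; t < 256; t++)
    {
        weightBg += hist[t];
        if (weightBg == 0) continue;          // no background pixels yet
        long weightFg = total - weightBg;
        if (weightFg == 0) break;             // no foreground pixels left
        sumBg += t * (double)hist[t];
        double meanBg = sumBg / weightBg;
        double meanFg = (sumAll - sumBg) / weightFg;
        double between = (double)weightBg * weightFg * (meanBg - meanFg) * (meanBg - meanFg);
        if (between > bestVariance) { bestVariance = between; best = t; }
    }
    return (byte)best;
}

// Map each pixel above the threshold to white (255), everything else to black (0).
static byte[] Binarize(byte[] gray, byte threshold) =>
    gray.Select(v => v > threshold ? (byte)255 : (byte)0).ToArray();

// Usage: a tiny 4-pixel "scan" with two dark (ink) and two light (paper) pixels.
byte[] pixels =
{
    ToGray(10, 10, 10), ToGray(20, 20, 20),
    ToGray(200, 200, 200), ToGray(220, 220, 220),
};
byte threshold = OtsuThreshold(pixels);
byte[] bw = Binarize(pixels, threshold);
Console.WriteLine($"threshold={threshold}, result={string.Join(",", bw)}");
// prints "threshold=20, result=0,0,255,255"
```

A single global threshold like this works best when lighting is roughly uniform across the page; adaptive (local) thresholding is the usual alternative for unevenly lit scans.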
How to Use OCR Filters in Tesseract
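- Install the OCR library to use the OCR filters
- Create an OcrInput object from the image path
- (Optional) Process the image with the filter methods
- Perform OCR with the Read method
- Access the extracted text through the Text property of the OcrResult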
using IronOcr;
using System;

var ocrTesseract = new IronTesseract();
using var ocrInput = new OcrInput();

// First load all image(s)
ocrInput.LoadImage(@"images\image.png");

// Note: You don't need all of them; most users only need Deskew() and occasionally DeNoise()
ocrInput.WithTitle("My Document");
ocrInput.Binarize();
ocrInput.Contrast();
ocrInput.Deskew();
ocrInput.DeNoise();
ocrInput.Despeckle();
ocrInput.Dilate();
ocrInput.EnhanceResolution(300);
ocrInput.Invert();
ocrInput.Rotate(90);
ocrInput.Scale(150);
ocrInput.Sharpen();
ocrInput.ToGrayScale();
ocrInput.Erode();

// WIZARD - If you are unsure use the debug-wizard to test all combinations:
string codeToRun = OcrInputFilterWizard.Run(@"images\image.png", out double confidence, ocrTesseract);
Console.WriteLine(codeToRun);

// Optional: Export modified images so you can view them.
foreach (var page in ocrInput.GetPages())
{
    page.SaveAsImage($"filtered_{page.Index}.bmp");
}

var ocrResult = ocrTesseract.Read(ocrInput);
Console.WriteLine(ocrResult.Text);
Imports IronOcr
Imports System

Module Program
    Sub Main()
        Dim ocrTesseract As New IronTesseract()
        ' Using disposes the OcrInput when done, matching "using var" in C#
        Using ocrInput As New OcrInput()
            ' First load all image(s)
            ocrInput.LoadImage("images\image.png")

            ' Note: You don't need all of them; most users only need Deskew() and occasionally DeNoise()
            ocrInput.WithTitle("My Document")
            ocrInput.Binarize()
            ocrInput.Contrast()
            ocrInput.Deskew()
            ocrInput.DeNoise()
            ocrInput.Despeckle()
            ocrInput.Dilate()
            ocrInput.EnhanceResolution(300)
            ocrInput.Invert()
            ocrInput.Rotate(90)
            ocrInput.Scale(150)
            ocrInput.Sharpen()
            ocrInput.ToGrayScale()
            ocrInput.Erode()

            ' WIZARD - If you are unsure use the debug-wizard to test all combinations:
            Dim confidence As Double
            Dim codeToRun As String = OcrInputFilterWizard.Run("images\image.png", confidence, ocrTesseract)
            Console.WriteLine(codeToRun)

            ' Optional: Export modified images so you can view them.
            For Each page In ocrInput.GetPages()
                page.SaveAsImage($"filtered_{page.Index}.bmp")
            Next page

            Dim ocrResult = ocrTesseract.Read(ocrInput)
            Console.WriteLine(ocrResult.Text)
        End Using
    End Sub
End Module
Install-Package IronOcr