
A Comparison Between IronOCR and Dynamsoft OCR

Published June 13, 2022

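Optical character recognition, or OCR, is a data-entry process that recognizes and digitizes text, whether handwritten or printed. It is a computer technology that uses image analysis to convert digital photos of printed text into letters and numbers that other programs, such as word processors, can use. The text is converted into character codes so that it can be searched and edited on a computer.

The past was a world in which all documents were physical, and the future may be one in which all documents are digital, but the present is in transition. In this transitional state, physical and digital documents coexist, so technologies such as OCR are essential for converting back and forth.

Document recovery, data entry, and accessibility are just some of the applications of OCR. Most OCR applications work from scanned documents, although photos are occasionally used. OCR is a valuable time saver because retyping the material is usually the only alternative. Here are some examples of OCR use: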

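  • Editable text files can be recovered from scanned documents, including faxes.
  • Forms are classified according to an approximation of their handwritten content.
  • Book scans are used to create searchable, editable e-books.
  • Screenshots are used to search and change text.
  • Text-to-speech technology reads books aloud for visually impaired people.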

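Although these are only some of OCR's applications, they show the technology's versatility across industries. Nearly every employee in nearly every company relies heavily on documents in their daily work, so business use is a key consideration in the development of OCR systems.

In this article, we will compare two of the most powerful OCR readers: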

  • IronOCR
  • Dynamsoft OCR

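IronOCR and Dynamsoft OCR are two .NET OCR libraries that support converting scanned images and performing OCR on PDF documents. A few lines of code are enough to convert an image into searchable text. You can also retrieve individual words, letters, and paragraphs.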

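IronOCR: Standout Features

IronOCR offers a distinctive ability to detect, read, and interpret text in images and PDF documents that were not scanned precisely. It provides the simplest way to extract text from documents and photos, even if it is not always the fastest, because it automatically sharpens and corrects low-quality scans, reducing skew, distortion, background noise, and perspective problems while also improving resolution and contrast.

IronOCR lets developers send it single-page or multi-page scanned images and returns all text, barcode, and QR information. The library provides a set of classes that add OCR functionality to web, desktop, or console .NET applications. JPG, PNG, TIFF, PDF, GIF, and BMP are just a few of the formats that can be used as input.

IronOCR's optical character recognition engine can read text prepared in many common fonts, in italics, in various weights, and with underlining. Its cropping classes let OCR run quickly and precisely. When processing multi-page documents, IronOCR's multithreaded engine speeds up OCR.

IronOCR Features

We use IronOCR to manage Tesseract because it is unique in the following ways:

  • Works out of the box in pure .NET
  • Does not require Tesseract to be installed on your machine
  • Runs the latest engines: Tesseract 5 (as well as Tesseract 4 and 3)
  • Works in any .NET project: .NET Framework 4.5+, .NET Standard 2+, .NET Core 2 and 3, and .NET 5
  • Improves on the accuracy and speed of traditional Tesseract
  • Supports Xamarin, Mono, Azure, and Docker
  • Manages the complex Tesseract dictionary system through NuGet packages
  • Supports PDFs, multi-frame TIFFs, and all major image formats without configuration
  • Corrects low-quality and skewed scans to get the best results from Tesseract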

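Dynamsoft OCR: Features

The Dynamsoft .NET OCR library is a .NET component that provides fast, reliable optical character recognition. It is used to build .NET desktop applications in C# or VB.NET. With its basic OCR API, you can write simple code that converts the text in PDFs or photos into digital text for editing, searching, archiving, and so on.

Images can be acquired from scanners and other TWAIN-compliant devices as follows:

  • Supports native, buffered-memory, and disk-file image transfer mechanisms.
  • Supports batch scanning through an automatic document feeder (ADF).
  • TWAIN properties can be used to modify common device capabilities.
  • IfAutoFeed, IfAutoScan, resolution, bit depth, brightness, contrast, unit, duplex, and other settings can all be changed.
  • Supports blank-page detection.
  • Lets you change and save scanner profiles.

Capturing images from UVC- and WIA-compliant webcams:

  • Shows live video while capturing photos from the selected webcam.
  • Customizes camera settings: brightness, contrast, hue, saturation, sharpness, gamma, white balance, backlight compensation, gain, color enable, zoom, focus, exposure, iris, pan, tilt, and roll.

Robust image loading and viewing:

  • Loads images in BMP, JPEG, PNG, TIFF, and multi-page TIFF formats.
  • Supports zooming photos.
  • Retrieves images from a local disk, an FTP server, an HTTP server, or a database.
  • Decodes BMP, JPEG, PNG, and TIFF images with one of the most comprehensive .NET imaging component sets.

Saving, uploading, and downloading:

  • Lets you read and write photos through file streams.
  • Saves captured photos as BMP, JPEG, PNG, TIFF, or multi-page TIFF to a local disk, a web server, or a database.
  • Supports RLE, G3/G4, LZW, PackBits, and TIFF compression.
  • Supports HTTPS upload and download.
  • Encodes BMP, JPEG, PNG, and TIFF images with one of the richest .NET imaging components on the market.
  • Lets you append newly acquired photos to an existing TIFF file.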

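Reading Text from Scanned PDFs or Other Images in ASP.NET (OCR)

In today's fast-moving world, customers want work done quickly. Customers often contact us about urgent projects. If a project involves scanning documents that contain images, our technology can easily recognize the contents of those images and convert them to text. Optical character recognition (OCR) saves companies time and money while reducing data-entry errors.

Using IronOCR

IronOCR uses the IronOcr.IronTesseract class to perform its OCR conversions.

In this basic example, we use the IronOcr.IronTesseract class to read text from an image and return the result as a string.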

// PM> Install-Package IronOcr
using IronOcr;
var Result = new IronTesseract().Read(@"img\Screenshot.png");
Console.WriteLine(Result.Text);

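As a result, the following paragraph is read with 100% accuracy:

IronOCR Simple Example

In this simple example, we test how accurately our C# OCR library reads text from a PNG image. This is a very basic test, but things get more complicated as the tutorial progresses.

The quick brown fox jumps over the lazy dog.

Although it looks simple on the surface, sophisticated behavior goes on behind the scenes: scanning the image for alignment, quality, and resolution, examining its properties, optimizing the OCR engine, and finally reading the text as a human would.

OCR is a difficult task for a machine, and reading speeds may be comparable to those of a human. In other words, OCR is not a fast process. In this case, however, the result is completely correct.

C# OCR application results accuracy

In most real-world cases, developers want their projects to run as fast as possible. In that situation, we recommend using the OcrInput and IronTesseract classes from the IronOcr namespace.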

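With OcrInput, you can set up the specifics of an OCR job, for example:

  • Working with many image formats, including JPEG, TIFF, GIF, BMP, and PNG
  • Importing whole PDF documents or parts of them
  • Improving image contrast, resolution, and size
  • Correcting rotation, scan noise, digital noise, skew, and negative images

IronTesseract lets you:

  • Choose from hundreds of prepackaged languages and dialects
  • Use the Tesseract 5, 4, or 3 OCR engines out of the box
  • Specify the document type: a screenshot, a snippet, or a whole document
  • Read barcodes
  • Output OCR results as searchable PDFs, hOCR HTML, a DOM, or strings
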
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput(@"img\Potter.tiff"))
{
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

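We can even use this on a medium-quality scan and achieve 100% accuracy.

C# OCR Scan From Tiff Example

As you can see, reading text (and, if needed, barcodes) from a scanned image such as a TIFF is quite easy. This OCR job is 100% accurate.

Next, we will try a low-quality scan of the same page: low resolution, with heavy distortion and digital noise, made from a damaged original sheet.

C# OCR Low Resolution Scan with Digital Noise

This is where IronOCR really outshines other OCR libraries such as Tesseract. Other OCR projects tend to avoid discussing OCR on real-world scanned images, preferring unrealistic, digitally created "perfect" test cases chosen to reach 100% OCR accuracy.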

using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput(@"img\Potter.LowQuality.tiff"))
{
    Input.Deskew(); // removes rotation and perspective
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

Without Input.Deskew() to straighten the image, we get 52.5% accuracy. That is not good enough.

Adding Input.Deskew() brings 99.8% accuracy, almost as accurate as OCR on a high-quality scan.
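
The article does not say how these accuracy percentages were measured. One plausible way to score an OCR result, shown here as a minimal Python sketch rather than IronOCR's actual method, is to compare the recognized text with a known-correct transcript. The function name `ocr_accuracy` and the choice of `difflib.SequenceMatcher` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def ocr_accuracy(expected: str, actual: str) -> float:
    """Return the similarity of two strings as a percentage from 0 to 100.

    Hypothetical helper: SequenceMatcher.ratio() is 2*M/T, where M is the
    number of matching characters and T is the combined length of both strings.
    """
    if not expected and not actual:
        return 100.0  # two empty strings are trivially identical
    return 100.0 * SequenceMatcher(None, expected, actual).ratio()


if __name__ == "__main__":
    truth = "The quick brown fox jumps over the lazy dog."
    recognized = "The quick brovvn fox jumps over the lazy dog."
    print(f"{ocr_accuracy(truth, recognized):.1f}% similar")
```

A score like this is only as good as the reference transcript, so it suits benchmark pages whose correct text is already known.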

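Using Dynamsoft OCR

We will show some code snippets that use Dynamic Web TWAIN to perform TWAIN scanning and client-side OCR in JavaScript.

Scanning Images

You can use Dynamic Web TWAIN's simple API to change the scan settings and acquire images from a TWAIN scanner.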

function acquireImage()
{
    DWObject.SelectSourceByIndex(document.getElementById("source").selectedIndex); //select an available TWAIN scanner

    //set scanning settings like pixel type, resolution, ADF etc.
    DWObject.IfShowUI = false; //don't show the user interface of the scanner
    DWObject.PixelType = 1; //scan in gray
    DWObject.Resolution = 300;
    DWObject.IfFeederEnabled = true; //scan from auto feeder
    DWObject.IfDuplexEnabled = false;
    DWObject.IfDisableSourceAfterAcquire = true;

    //acquire images from scanners
    DWObject.AcquireImage();
}

Downloading the OCR Professional Module

To perform OCR on the client side with the OCR Professional module, you need to include ocrpro.js in the head and download the OCR Pro DLL.

Make the following edits to the .js file:

var CurrentPathName = unescape(location.pathname);
var CurrentPath = CurrentPathName.substring(0, CurrentPathName.lastIndexOf("/") + 1);
DWObject.Addon.OCRPro.Download(CurrentPath + "Resources/addon/OCRPro.zip", OnSuccess, OnFailure);

Recognize text using OCR

Using the JS OCR recognition API to extract text from scanned images is as simple as inserting the code below.

DWObject.Addon.OCRPro.Recognize(0, GetOCRProInfo, GetErrorInfo); // 0 is the index of the image

Reading Cropped Regions of Images

Both sets of software offer solutions for cropping images for OCR.

Reading cropped regions with IronOCR

IronOCR's build of Tesseract OCR is adept at reading specific regions of images, as the following code sample shows.

We can use a System.Drawing.Rectangle to describe the exact region of an image to read. Pixels are always the unit of measurement.

This is handy when dealing with a standardized form in which only a portion of the content changes from case to case.

Reading only part of a page improves speed and avoids reading needless text. In this example, we read a student's name from a central region of a standardized paper.

C# OCR Scan From Tiff Example
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // a 41% improvement on speed
    var ContentArea = new System.Drawing.Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };
    Input.AddImage("img/ComSci.png", ContentArea);
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

This results in a 41 percent boost in speed, while also allowing us to be more specific. This is extremely valuable for .NET OCR applications involving documents that are comparable and consistent, including invoices, receipts, checks, forms, expense claims, and so on.

ContentAreas (OCR cropping) are also supported when reading PDFs.
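
To make the pixel-rectangle idea concrete without depending on either library, here is a small Python sketch that crops a page held as a list of pixel rows. The `crop` and `area_fraction` helpers are hypothetical names introduced for illustration, and the page dimensions in the usage example are assumed, since the article does not give the size of the sample image.

```python
def crop(pixels, x, y, width, height):
    """Return the width-by-height region whose top-left corner is (x, y).

    `pixels` is a list of equally long rows; all units are pixels.
    """
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        raise ValueError("the region must have a non-negative origin and a positive size")
    if y + height > len(pixels) or x + width > len(pixels[0]):
        raise ValueError("the region extends beyond the image")
    return [row[x:x + width] for row in pixels[y:y + height]]


def area_fraction(page_width, page_height, width, height):
    """Fraction of the page's pixels that fall inside the cropped region."""
    return (width * height) / (page_width * page_height)


if __name__ == "__main__":
    # The rectangle size comes from the sample above; the 1700 x 2200 page size is an assumption.
    share = area_fraction(1700, 2200, width=1335, height=280)
    print(f"The region covers {share:.1%} of the page")
```

Less area means less work for the OCR engine, although the speed-up need not be proportional to the pixel count.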

Reading cropped regions with Dynamsoft OCR

To begin, launch Visual Studio and build a new C# Windows Forms Application, or open an existing one.

We will need to include DynamicDotNetTWAIN.dll, DynamicOCR.dll, and the appropriate language package. To do so, navigate to Tools -> Choose Toolbox Items, then to the .NET Framework Components tab, click the Browse... button, and locate DynamicDotNetTWAIN.dll in "..\Program Files (x86)\Dynamsoft\Dynamic .NET TWAIN 4.3 Trial\Bin\v4.0" or v2.0 (depending on the .NET Framework version you are using). Click the OK button. The DynamicDotNetTwain component will then appear in the Toolbox (under the View menu), as illustrated in the accompanying image.

Add Dynamic .NET TWAIN .NET Component

Right-click the project file in Solution Explorer and select Add -> Existing Item... Then, in the file type filter's drop-down list, select All Files. Navigate to "..\Program Files (x86)\Dynamsoft\Dynamic .NET TWAIN 4.3 Trial\Bin\OCRResources" and add the items in that folder to the project. The .NET TWAIN component can then be dragged and dropped onto the form.

This is the code for clicking the LoadImage button:

private void button1_Click(object sender, EventArgs e)
{
    OpenFileDialog filedlg = new OpenFileDialog();
    if (filedlg.ShowDialog() == DialogResult.OK)
    {
        // choose an image from your local disk and load it into Dynamic .NET TWAIN
        dynamicDotNetTwain1.LoadImage(filedlg.FileName);
    }
}

We can now attempt to OCR the loaded image and turn it into a searchable text file.

// requires: using System.IO;
private void dynamicDotNetTwain1_OnImageAreaSelected(short sImageIndex, int left, int top, int right, int bottom)
{
    dynamicDotNetTwain1.OCRTessDataPath = "../../"; // the path of the language package (tessdata)
    dynamicDotNetTwain1.OCRLanguage = "eng"; // the language type
    dynamicDotNetTwain1.OCRDllPath = "../../"; // the relative path of the OCR DLL file
    dynamicDotNetTwain1.OCRResultFormat = Dynamsoft.DotNet.TWAIN.OCR.ResultFormat.Text;

    // OCR the selected area of the image
    byte[] sbytes = dynamicDotNetTwain1.OCR(dynamicDotNetTwain1.CurrentImageIndexInBuffer, left, top, right, bottom);
    if (sbytes != null)
    {
        SaveFileDialog filedlg = new SaveFileDialog();
        filedlg.Filter = "Text File(*.txt)|*.txt"; // description and pattern are separated by a pipe
        if (filedlg.ShowDialog() == DialogResult.OK)
        {
            // save the OCR result as a text file
            using (FileStream fs = File.OpenWrite(filedlg.FileName))
            {
                fs.Write(sbytes, 0, sbytes.Length);
            }
        }
        MessageBox.Show("OCR successful");
    }
    else
    {
        MessageBox.Show(dynamicDotNetTwain1.ErrorString);
    }
}

This is how the application looks.

Demo App of Zone OCR using Dynamic .NET TWAIN OCR SDK
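
The two samples describe a region differently: the IronOCR sample sets X, Y, Width, and Height, while the Dynamsoft event handler receives left, top, right, and bottom. The Python sketch below converts between the two forms. It assumes that right and bottom are edge coordinates, so that width = right - left; neither vendor's exact convention is confirmed by the article, and the `Rect`, `from_edges`, and `to_edges` names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """A region expressed as origin plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def from_edges(left: int, top: int, right: int, bottom: int) -> Rect:
    """Convert edge coordinates to origin plus size (assumes right/bottom are edges)."""
    if right < left or bottom < top:
        raise ValueError("right/bottom must not be smaller than left/top")
    return Rect(left, top, right - left, bottom - top)


def to_edges(rect: Rect) -> Tuple[int, int, int, int]:
    """Convert origin plus size back to (left, top, right, bottom)."""
    return (rect.x, rect.y, rect.x + rect.width, rect.y + rect.height)


if __name__ == "__main__":
    # The region used in the IronOCR sample, expressed as edges.
    print(from_edges(215, 1250, 1550, 1530))
```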

Image Performance Tuning

The quality of the input image is the most important factor in the speed of an OCR task. The lower the background noise and the higher the dpi, with an ideal target of around 200 dpi, the faster and more accurate the OCR output.

Image Processing Techniques for Dynamsoft OCR

We need to use OCR in a variety of situations, such as scanning a credit card number with our phone or extracting text from paper documents. OCR capabilities are included in Dynamsoft Label Recognition (DLR) and Dynamic Web TWAIN (DWT).

Although they can do an excellent job in general, we can improve the results by using various image processing techniques.

Lighten/remove shadows

Poor illumination may have an impact on the OCR result. To improve the outcome, we can whiten photos or eliminate shadows from images.

Invert

Because the OCR engine is often trained on dark text, light-colored text can be harder to detect and recognize.

Light text

The text is easier to recognize if we invert its color.

Light text inverted

To perform the inversion, we can use the GrayscaleTransformationModes parameter in DLR.

Here are the JSON settings:

"GrayscaleTransformationModes": [
    {
        "Mode": "DLR_GTM_INVERTED"
    }
]

DLR .NET's reading result:

Light text result
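
Inversion itself is simple arithmetic. The Python sketch below shows the idea on an 8-bit grayscale image stored as a list of rows; it illustrates the concept only and is not how DLR implements DLR_GTM_INVERTED internally. The function name `invert_grayscale` is hypothetical.

```python
def invert_grayscale(pixels):
    """Invert an 8-bit grayscale image: 0 (black) becomes 255 (white) and vice versa."""
    return [[255 - value for value in row] for row in pixels]


if __name__ == "__main__":
    light_text_on_dark = [
        [20, 230, 20],
        [20, 230, 20],
    ]
    # After inversion the bright "stroke" in the middle column becomes dark.
    print(invert_grayscale(light_text_on_dark))
```

Applying the function twice returns the original image, which makes it easy to check.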

Rescale

If the letters are too small, the OCR engine may not produce a good result. In general, the image should have a DPI of at least 300.

There is a ScaleUpModes parameter in DLR 1.1 that allows you to scale up letters. We may, of course, scale the image ourselves.

Reading the image directly yields the incorrect result:

1x image

After scaling up the image x2, the result is correct:

2x image
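
If we scale the image ourselves, the factor can be derived from the scan's DPI. The sketch below is a minimal Python illustration that uses the 300 DPI guideline mentioned above as the default target. The helper names are hypothetical, and never shrinking the image (a factor of at least 1) is an assumption made for this example.

```python
import math


def scale_factor_for_dpi(current_dpi: float, target_dpi: float = 300.0) -> float:
    """Return the factor needed to reach target_dpi, never less than 1 (no downscaling)."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def scaled_size(width: int, height: int, factor: float):
    """Return the new (width, height) in pixels, rounded up to whole pixels."""
    return (math.ceil(width * factor), math.ceil(height * factor))


if __name__ == "__main__":
    factor = scale_factor_for_dpi(150)      # a 150 DPI scan needs doubling
    print(factor, scaled_size(640, 480, factor))
```

The resulting size can then be passed to whichever image library performs the actual resampling.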

Deskew

It is fine if the text is slightly skewed. However, if it is heavily skewed, the result will suffer. To improve the outcome, we need to deskew the image.

To accomplish this, we can use the Hough Line Transform in OpenCV.

Skewed image

Here is the code to deskew the image above.

#coding=utf-8
import numpy as np
import cv2
import math
from PIL import Image

def deskew():
    src = cv2.imread("neg.jpg", cv2.IMREAD_COLOR)
    if src is None:
        raise FileNotFoundError("neg.jpg could not be read")
    gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5), np.uint8)
    erode_Img = cv2.erode(gray, kernel)
    eroDil = cv2.dilate(erode_Img, kernel) # erode and dilate
    showAndWaitKey("eroDil", eroDil)

    canny = cv2.Canny(eroDil, 50, 150) # edge detection
    showAndWaitKey("canny", canny)

    lines = cv2.HoughLinesP(canny, 0.8, np.pi / 180, 90, minLineLength=100, maxLineGap=10) # Hough Lines Transform
    if lines is None:
        print("No lines detected; nothing to deskew")
        return
    drawing = np.zeros(src.shape, dtype=np.uint8)

    maxY = None
    degree_of_bottomline = 0
    for line in lines:
        x1, y1, x2, y2 = line[0]
        cv2.line(drawing, (x1, y1), (x2, y2), (0, 255, 0), 1, lineType=cv2.LINE_AA)
        if x1 == x2:
            continue # skip vertical lines, whose slope is undefined
        k = float(y1 - y2) / (x1 - x2)
        degree = np.degrees(math.atan(k))
        if maxY is None or y1 > maxY:
            maxY = y1
            degree_of_bottomline = degree # take the degree of the line at the bottom
    showAndWaitKey("houghP", drawing)

    img = Image.fromarray(src)
    rotateImg = img.rotate(degree_of_bottomline)
    rotateImg_cv = np.array(rotateImg)
    cv2.imshow("rotateImg", rotateImg_cv)
    cv2.imwrite("deskewed.jpg", rotateImg_cv)
    cv2.waitKey()

def showAndWaitKey(winName, img):
    cv2.imshow(winName, img)
    cv2.waitKey()

if __name__ == "__main__":
    deskew()

Lines detected:

Lines detected

Deskewed:

Deskewed image
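
The angle calculation is the heart of this technique and can be tried without OpenCV. A slope-based angle such as atan((y1 - y2) / (x1 - x2)) is undefined for vertical segments, so the Python sketch below uses `math.atan2` instead and takes the median of the near-horizontal segments rather than only the bottom-most one. This is an alternative to the approach in the listing, not a description of it; the 45-degree cut-off and the function names are assumptions.

```python
import math
from statistics import median


def line_angle_degrees(x1, y1, x2, y2):
    """Angle of the segment (x1, y1)-(x2, y2) in degrees; safe for vertical segments."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def estimate_skew(lines, max_abs_angle=45.0):
    """Median angle of the near-horizontal segments, or 0.0 if none qualify.

    `lines` is an iterable of (x1, y1, x2, y2) tuples, for example the rows
    returned by a Hough transform.
    """
    angles = []
    for x1, y1, x2, y2 in lines:
        angle = line_angle_degrees(x1, y1, x2, y2)
        # Fold the direction into (-90, 90] so that a segment and its reverse agree.
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        if abs(angle) <= max_abs_angle:
            angles.append(angle)
    return median(angles) if angles else 0.0


if __name__ == "__main__":
    segments = [(0, 0, 100, 5), (0, 10, 100, 15), (0, 0, 0, 50)]
    print(f"estimated skew: {estimate_skew(segments):.2f} degrees")
```

Using the median makes the estimate less sensitive to a single stray line, such as a page border or a table rule.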

Image Processing Techniques for IronOCR

Input image quality matters less here because IronOCR excels at repairing defective documents, although the repair is time-consuming and makes OCR jobs use more CPU cycles.

Choosing input formats with less digital noise, such as TIFF or PNG, can also produce faster results than lossy formats such as JPEG.

The image filters listed below can significantly enhance performance:

OcrInput.Rotate(double degrees): Rotates images clockwise by a specified number of degrees. Negative values rotate anti-clockwise.

OcrInput.Binarize(): Makes every pixel either black or white, with nothing in between. It may improve OCR performance where the text-to-background contrast is very low.

OcrInput.ToGrayScale(): Converts every pixel to a shade of gray. It is unlikely to improve OCR accuracy, but it may increase speed.

OcrInput.Contrast(): Automatically increases contrast. On low-contrast scans, this filter frequently improves OCR speed and accuracy.

OcrInput.DeNoise(): Removes digital noise. Use this filter only when noise is expected.

OcrInput.Invert(): Reverses all colors. For example, white becomes black and black becomes white.

OcrInput.Dilate(): Advanced morphology. Dilation adds pixels to the edges of objects in an image. (The inverse of Erode.)

OcrInput.Erode(): Advanced morphology. Erosion removes pixels from the edges of objects. (The inverse of Dilate.)

OcrInput.Deskew(): Rotates an image so that it is orthogonal and the right way up. Because Tesseract's tolerance for skewed scans can be as low as 5 degrees, this is quite useful for OCR.

OcrInput.DeepCleanBackgroundNoise(): Removes heavy background noise. Use this filter only if you know the document has a lot of background noise, because it can reduce OCR accuracy on clean documents and is quite CPU intensive.

OcrInput.EnhanceResolution: Improves the resolution of low-resolution images. This filter is rarely needed because OcrInput detects and resolves low resolution automatically.
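
To show what binarization, dilation, and erosion mean at the pixel level, here is a library-independent Python sketch. It is not IronOCR's implementation: the fixed threshold of 128, the 3x3 neighborhood, and the convention that foreground objects are white (255) are all assumptions made for illustration.

```python
def binarize(pixels, threshold=128):
    """Make every pixel black (0) or white (255) using a fixed threshold."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


def _morph(pixels, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighborhood, clipped at the borders."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighborhood = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(pick(neighborhood))
        result.append(row)
    return result


def dilate(pixels):
    """Grow white objects by one pixel in every direction."""
    return _morph(pixels, max)


def erode(pixels):
    """Shrink white objects by one pixel in every direction."""
    return _morph(pixels, min)


if __name__ == "__main__":
    dot = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
    print(dilate(dot))  # the single white pixel spreads to its neighbors
    print(erode(dot))   # the single white pixel disappears
```

With dark text on a light background the roles swap: eroding the white background thickens the dark strokes, which is worth keeping in mind when choosing between the two filters.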

We may want to tune the IronTesseract configuration to speed up OCR on higher-quality scans.

If we are looking for speed, we might start with the settings below and then turn features back on until we strike the right balance.

using IronOcr;
var Ocr = new IronTesseract();
// Configure for speed
Ocr.Configuration.BlackListCharacters = "~`$#^*_}{][\\";
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;
Ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
Ocr.Configuration.EngineMode = TesseractEngineMode.LstmOnly;
Ocr.Language = OcrLanguage.EnglishFast;
using (var Input = new OcrInput(@"img\Potter.tiff"))
{
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

This result is 99.8% accurate compared with the 100% baseline, but 35% faster.

Licensing and Pricing

Dynamsoft Licensing and Pricing

Dynamsoft licenses are sold per year. All rates include one year of maintenance, which covers free software upgrades and premium support.

Dynamsoft offers two types of licenses:

Per client device license

The "One Client Device License" provides access to a same-origin Application (same protocol, same host, and same port) to use the software's features from a single client device. An inactive client device is one that has not accessed any software capability for 90 days in a row. An inactive client device's license seat will be instantly freed and made available for usage by any other active client device. When you reach the maximum number of license seats allowed, Dynamsoft will give you an extra 10% of your client device allowance for emergency use. Once the additional client device allowance has been depleted, no new client devices can access and use the software until there are available license seats again. Please keep in mind that exceeding your client device allowance has no effect on any client devices that have already been licensed.
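
The seat rules above can be expressed as a few lines of arithmetic. The Python sketch below is only an illustration of the stated terms, not Dynamsoft's licensing logic: the 90-day window and the 10% allowance come from the text, while rounding the allowance down, treating exactly 90 days as inactive, and the function names are assumptions.

```python
from datetime import date

INACTIVITY_DAYS = 90    # from the license description above
EMERGENCY_PERCENT = 10  # from the license description above


def is_inactive(last_access: date, today: date) -> bool:
    """Assumption: a device is inactive once 90 full days have passed without access."""
    return (today - last_access).days >= INACTIVITY_DAYS


def device_limit(purchased_seats: int) -> int:
    """Purchased seats plus the emergency allowance (assumption: rounded down)."""
    return purchased_seats + purchased_seats * EMERGENCY_PERCENT // 100


def can_register(active_devices: int, purchased_seats: int) -> bool:
    """True while a new client device can still take a seat."""
    return active_devices < device_limit(purchased_seats)


if __name__ == "__main__":
    print(device_limit(50))      # seats available including the emergency allowance
    print(can_register(55, 50))  # False once the allowance is used up
```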

Per-server license

To deploy the application to a single server, a "One Server License" is required. Servers refer to both physical and virtual servers and include, but are not limited to, production servers, failover servers, development servers that are also used for testing, quality assurance servers, testing servers, and staging servers, all of which require a license. Additional licenses are not required for continuous integration servers (build servers) or localhost development servers. The per-server license is only valid for on-premises server installations, and not for cloud deployments.

Pricing for Dynamsoft OCR starts at USD 1,249/year.

IronOCR Licensing and Pricing

As developers, we all want to complete our projects with as little money and as few resources as possible, so budgeting is critical. Examine the chart to determine which license is best suited to your requirements and budget.

IronOCR provides licenses with a customizable number of developers, projects, and locations, allowing you to fulfill the needs of your project while only paying for the coverage you require.

IronOCR licensing keys enable you to publish your product without a watermark.

Licenses start from $749 and include one year of support and upgrades.

You can also use a trial license key to try IronOCR for free.

Conclusion

IronOCR for C# makes Tesseract OCR available on Mac, Windows, Linux, Azure, and Docker. It supports .NET Framework 4.0 or above, .NET Standard 2.0+, .NET Core 2.0+, .NET 5, Mono for macOS and Linux, and Xamarin for macOS. IronOCR uses the latest Tesseract 5 engine to read text, barcodes, and QR codes from all major image and PDF formats. In minutes, this library adds OCR functionality to your desktop, console, or web apps. It can also read PDFs and multi-page TIFFs, and any OCR scan can be saved as a searchable PDF document or as XHTML. Its data output choices include plain text, barcode data, and an OCR result class encompassing paragraphs, lines, words, and characters. It is available in 125 languages, including Arabic, Chinese, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Portuguese, Russian, and Spanish, and custom language packs can also be generated.

The Dynamic .NET TWAIN OCR add-on is a quick and reliable .NET component for optical character recognition that you can use in WinForms and WPF applications written in C# or VB.NET. You can scan documents or capture photos from webcams using Dynamic .NET TWAIN's image capture module, and then run OCR on the images to convert them to text, searchable PDF files, or strings. Multiple Asian languages, as well as Arabic, are offered in addition to English.

IronOCR's licensing is less expensive than Dynamsoft OCR's: IronOCR starts at $749 and includes one year of support and upgrades, while Dynamsoft starts at $1,249 per year. IronOCR also offers licenses that cover multiple developers, projects, and locations, while Dynamsoft licenses are sold per client device or per server. Both products can be tried for free.

While both products aim to offer the best performance in OCR tasks such as reading barcodes and converting images to text, IronOCR stands out on images that are in poor shape. It automatically applies its tuning methods to give you the best OCR results it can. IronOCR also builds on Tesseract to deliver results with few or no errors.

Iron Software is also offering its customers and users the option to grab its entire suite of software in just two clicks. This means that for the price of two of the components in the Iron Software suite, you can currently get all five components and uninterrupted support.