In today's digital era, optical character recognition (OCR) technology has become indispensable across industries, converting images and scanned documents into editable, searchable text.
Among the many OCR tools available, such as Google Cloud Vision (Cloud Vision API), Adobe Acrobat Pro DC, and ABBYY FineReader, three stand out: Windows OCR Engine, Tesseract, and IronOCR. Each offers distinct features and capabilities for document analysis.
This article compares these three OCR engines, evaluating their accuracy, performance, and ease of integration.
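An OCR engine is a software tool that recognizes and extracts plain text from images, PDFs, and other scanned documents. These engines use sophisticated algorithms and machine learning techniques to identify characters accurately and convert them into machine-readable text. Windows OCR Engine, Tesseract, and IronOCR are three widely used OCR solutions, each with its own strengths and applications.
The Windows OCR Engine is built into the Windows operating system and provides a convenient, user-friendly way to extract text from input images and scanned documents. Using advanced image processing techniques, it recognizes text in a variety of languages and font styles. It is accessible through the Windows Runtime API, so it integrates directly into Windows applications and can also be called from command-line tools such as the console program below.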
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;

class Program
{
    static async Task Main(string[] args)
    {
        // Provide the path to the image file
        string imagePath = "sample.png";
        try
        {
            // Instantiate the program class
            Program program = new Program();
            // Call the ExtractText method to extract text from the image
            string extractedText = await program.ExtractText(imagePath);
            // Display the extracted text
            Console.WriteLine("Extracted Text:");
            Console.WriteLine(extractedText);
        }
        catch (Exception ex)
        {
            Console.WriteLine("An error occurred: " + ex.Message);
        }
    }

    public async Task<string> ExtractText(string image)
    {
        // Initialize StringBuilder to store extracted text
        StringBuilder text = new StringBuilder();
        // Open the image file stream; any exception propagates to the caller
        using (var fileStream = File.OpenRead(image))
        {
            // Create a BitmapDecoder from the image file stream
            var bmpDecoder = await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream());
            // Get the software bitmap from the decoder
            var softwareBmp = await bmpDecoder.GetSoftwareBitmapAsync();
            // Create an OCR engine from user profile languages;
            // this returns null when none of those languages supports OCR
            var ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
            if (ocrEngine == null)
            {
                throw new InvalidOperationException("No OCR-capable language is installed for the current user profile.");
            }
            // Recognize text from the software bitmap
            var ocrResult = await ocrEngine.RecognizeAsync(softwareBmp);
            // Append each line of recognized text to the StringBuilder
            foreach (var line in ocrResult.Lines)
            {
                text.AppendLine(line.Text);
            }
        }
        // Return the extracted text
        return text.ToString();
    }
}
Imports System
Imports System.IO
Imports System.Text
Imports System.Threading.Tasks
Imports Windows.Graphics.Imaging
Imports Windows.Media.Ocr

Friend Class Program
    ' Visual Basic does not support an Async Main, so Main blocks on an async helper
    Shared Sub Main(ByVal args() As String)
        MainAsync().GetAwaiter().GetResult()
    End Sub

    Private Shared Async Function MainAsync() As Task
        ' Provide the path to the image file
        Dim imagePath As String = "sample.png"
        Try
            ' Instantiate the program class
            Dim program As New Program()
            ' Call the ExtractText method to extract text from the image
            Dim extractedText As String = Await program.ExtractText(imagePath)
            ' Display the extracted text
            Console.WriteLine("Extracted Text:")
            Console.WriteLine(extractedText)
        Catch ex As Exception
            Console.WriteLine("An error occurred: " & ex.Message)
        End Try
    End Function

    Public Async Function ExtractText(ByVal image As String) As Task(Of String)
        ' Initialize StringBuilder to store extracted text
        Dim text As New StringBuilder()
        ' Open the image file stream; any exception propagates to the caller
        Using fileStream = File.OpenRead(image)
            ' Create a BitmapDecoder from the image file stream
            Dim bmpDecoder = Await BitmapDecoder.CreateAsync(fileStream.AsRandomAccessStream())
            ' Get the software bitmap from the decoder
            Dim softwareBmp = Await bmpDecoder.GetSoftwareBitmapAsync()
            ' Create an OCR engine from user profile languages;
            ' this returns Nothing when none of those languages supports OCR
            Dim ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages()
            If ocrEngine Is Nothing Then
                Throw New InvalidOperationException("No OCR-capable language is installed for the current user profile.")
            End If
            ' Recognize text from the software bitmap
            Dim ocrResult = Await ocrEngine.RecognizeAsync(softwareBmp)
            ' Append each line of recognized text to the StringBuilder
            For Each line In ocrResult.Lines
                text.AppendLine(line.Text)
            Next line
        End Using
        ' Return the extracted text
        Return text.ToString()
    End Function
End Class
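The example above creates the engine from the user's profile languages, which yields no engine when none of those languages has an OCR recognizer installed. As a minimal sketch, assuming a project that targets a Windows-specific framework (for example `net6.0-windows10.0.19041.0`) so that the Windows Runtime types are available, the following lists the installed recognizer languages and requests an engine for one specific language. The class name `OcrLanguageCheck` and the `en-US` tag are illustrative choices, not part of the original example.

```csharp
using System;
using Windows.Globalization;
using Windows.Media.Ocr;

class OcrLanguageCheck
{
    static void Main()
    {
        // List every language that has an OCR recognizer installed on this machine
        foreach (Language lang in OcrEngine.AvailableRecognizerLanguages)
        {
            Console.WriteLine($"{lang.LanguageTag}: {lang.DisplayName}");
        }

        // "en-US" is an illustrative choice; substitute any BCP-47 language tag
        var requested = new Language("en-US");
        if (!OcrEngine.IsLanguageSupported(requested))
        {
            Console.WriteLine("No OCR recognizer is installed for " + requested.LanguageTag);
            return;
        }

        // TryCreateFromLanguage returns null if the engine cannot be created
        OcrEngine engine = OcrEngine.TryCreateFromLanguage(requested);
        Console.WriteLine(engine != null
            ? "Engine ready for " + engine.RecognizerLanguage.DisplayName
            : "Engine could not be created.");
    }
}
```

Checking support up front lets an application report a missing language clearly instead of failing later with a null engine.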
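Tesseract is an open-source OCR engine, originally developed at Hewlett-Packard and later developed further under Google's sponsorship, that has become popular for its accuracy and versatility. It supports more than 100 languages and handles a range of image formats, including TIFF, JPEG, and PNG. Since version 4, Tesseract has used an LSTM-based neural network for recognition, which gives it high text recognition accuracy across a wide range of applications.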
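using System;
using Patagames.Ocr;
// Create the OCR API object; the using block disposes it afterwards
using (var api = OcrApi.Create())
{
    // Load the English language data
    api.Init(Patagames.Ocr.Enums.Languages.English);
    // Recognize the text in the image file
    string plainText = api.GetTextFromImage(@"C:\Users\buttw\source\repos\ironqr\ironqr\bin\Debug\net5.0\Iron.png");
    Console.WriteLine(plainText);
}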
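Imports System
Imports Patagames.Ocr
Module Program
    Sub Main()
        Using api = OcrApi.Create()
            ' Load the English language data, then recognize the text in the image file
            api.Init(Patagames.Ocr.Enums.Languages.English)
            Dim plainText As String = api.GetTextFromImage("C:\Users\buttw\source\repos\ironqr\ironqr\bin\Debug\net5.0\Iron.png")
            Console.WriteLine(plainText)
        End Using
    End Sub
End Module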
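IronOCR, an OCR engine developed by Iron Software, stands out for its accuracy, ease of use, and multilingual support. It provides in-house OCR functionality and supports more than 127 languages, making it suitable for global applications. IronOCR uses advanced machine learning algorithms and cloud vision technology to deliver precise text recognition results, even in challenging scenarios.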
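Before moving on to the code example, let's see how to install IronOCR using the NuGet Package Manager.
In Visual Studio, open the Tools menu and select NuGet Package Manager.
A new window appears. Go to the Browse tab and type "IronOcr" in the search bar, then select the IronOcr package in the results and install it.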
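using System;
using IronOcr;
// Create the OCR engine and set English as the recognition language
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;
// Read the image file and print the recognized text
var result = ocr.Read("C:\\Users\\buttw\\source\\repos\\ironqr\\ironqr\\bin\\Debug\\net5.0\\Iron.png");
Console.WriteLine(result.Text);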
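Imports System
Imports IronOcr
Module Program
    Sub Main()
        Dim ocr As New IronTesseract() ' Create the OCR engine
        ocr.Language = OcrLanguage.English ' Recognize English text
        Dim result = ocr.Read("C:\Users\buttw\source\repos\ironqr\ironqr\bin\Debug\net5.0\Iron.png")
        Console.WriteLine(result.Text) ' Print the recognized text
    End Sub
End Module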
IronOCR: supports more than 127 languages, making it suitable for global applications.
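To illustrate the multilingual support, here is a hedged sketch that reads an image containing text in more than one language. It assumes that the `IronOcr` package and the matching language pack (for example `IronOcr.Languages.French`) are installed. The file name `multilingual-sample.png` is hypothetical and stands in for any image of your own.

```csharp
using System;
using IronOcr;

// Create the OCR engine with English as the primary language
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;

// Add French as a secondary language
// (assumption: the French language pack is installed alongside IronOcr)
ocr.AddSecondaryLanguage(OcrLanguage.French);

// "multilingual-sample.png" is a hypothetical file name used for illustration
var result = ocr.Read("multilingual-sample.png");
Console.WriteLine(result.Text);
```

Loading only the languages a document is expected to contain is usually the sensible default, since each extra language adds recognition work.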
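6. Conclusion
In conclusion, while Windows OCR Engine and Tesseract are popular choices for text recognition, IronOCR emerges from this comparison as the most accurate and versatile of the three engines. Its accuracy, broad language support, and simple integration make it a strong choice for businesses and developers who need reliable OCR functionality. By using IronOCR, organizations can streamline document processing workflows, improve the accuracy of data extraction, and gain valuable insights from scanned documents and images.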