Get Started with .NET OCR Examples

Automatic OCR

using IronOcr;

string imageText = new IronTesseract().Read(@"images\image.png").Text;

Imports IronOcr

Dim imageText As String = (New IronTesseract()).Read("images\image.png").Text

Install-Package IronOcr

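IronOCR stands out for its ability to automatically detect and read text from poorly scanned images and PDF documents. The IronTesseract class provides the simplest API.

Try the other code examples for finer control over C# OCR operations.

IronOCR provides the most advanced known build of Tesseract on any platform, with improved speed and accuracy and a native DLL and API.

It supports Tesseract 3, Tesseract 4, and Tesseract 5 on .NET Framework, Standard, Core, Xamarin, and Mono.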

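How to OCR in VB.NET

1. Install the VB.NET library to OCR images or PDF files
2. Instantiate IronTesseract to use its intuitive API
3. Run OCR in VB.NET with the Read method
4. Get the OCR result from the Text property
5. Perform steps 2, 3, and 4 in a single line of code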
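
The steps above can be sketched in C# as follows. This is a minimal illustration that uses only the calls shown in the samples on this page (IronTesseract, Read, and Text); the image path is a placeholder, and the IronOcr NuGet package is assumed to be installed already (step 1).

```csharp
using System;
using IronOcr;

// Step 2: instantiate the OCR engine.
var ocr = new IronTesseract();

// Step 3: run OCR on an image (placeholder path).
OcrResult result = ocr.Read(@"images\image.png");

// Step 4: read the recognized text from the Text property.
string text = result.Text;
Console.WriteLine(text);

// Step 5: the same work collapsed into a single expression.
string oneLine = new IronTesseract().Read(@"images\image.png").Text;
Console.WriteLine(oneLine);
```

Keeping the steps separate is useful when the engine needs configuring (for example, setting a language) before Read is called; the one-line form suits quick, default-configuration reads.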

Explore the IronTesseract C# OCR How-To Guide

International Languages

using IronOcr;
using System;

var ocrTesseract = new IronTesseract();

ocrTesseract.Language = OcrLanguage.Arabic;

using (var ocrInput = new OcrInput())
{
    ocrInput.LoadImage(@"images\arabic.gif");
    var ocrResult = ocrTesseract.Read(ocrInput);
    Console.WriteLine(ocrResult.Text);
}

// Example using a custom-trained language file:

var ocrTesseractCustomLang = new IronTesseract();
ocrTesseractCustomLang.UseCustomTesseractLanguageFile("custom_tesseract_files/custom.traineddata");
ocrTesseractCustomLang.AddSecondaryLanguage(OcrLanguage.EnglishBest);

using (var ocrInput = new OcrInput())
{
    ocrInput.LoadPdf(@"images\mixed-lang.pdf");
    var ocrResult = ocrTesseractCustomLang.Read(ocrInput);
    Console.WriteLine(ocrResult.Text);
}

Imports IronOcr
Imports System

Dim ocrTesseract = New IronTesseract()

ocrTesseract.Language = OcrLanguage.Arabic

Using ocrInput As New OcrInput()
	ocrInput.LoadImage("images\arabic.gif")
	Dim ocrResult = ocrTesseract.Read(ocrInput)
	Console.WriteLine(ocrResult.Text)
End Using

' Example using a custom-trained language file:

Dim ocrTesseractCustomLang = New IronTesseract()
ocrTesseractCustomLang.UseCustomTesseractLanguageFile("custom_tesseract_files/custom.traineddata")
ocrTesseractCustomLang.AddSecondaryLanguage(OcrLanguage.EnglishBest)

Using ocrInput As New OcrInput()
	ocrInput.LoadPdf("images\mixed-lang.pdf")
	Dim ocrResult = ocrTesseractCustomLang.Read(ocrInput)
	Console.WriteLine(ocrResult.Text)
End Using

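IronOCR Language Support

IronOCR supports 125 international languages. English is installed by default; additional language packs can be added to your .NET project through NuGet or downloaded from our languages page.

Most languages are available in Standard (recommended) and Best quality tiers. The Best tier may give more accurate results but also takes longer to process.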
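
One way to see the speed trade-off between the two tiers is to time the same read with each. The sketch below is illustrative only: it assumes OcrLanguage.English is the Standard English tier, that the EnglishBest pack is installed in the project, and that the image path exists. It reports elapsed time and text length rather than asserting any accuracy figure.

```csharp
using System;
using System.Diagnostics;
using IronOcr;

// Assumption: OcrLanguage.English is the Standard tier and
// OcrLanguage.EnglishBest (used elsewhere on this page) is the Best tier.
var tiers = new[] { OcrLanguage.English, OcrLanguage.EnglishBest };

foreach (var tier in tiers)
{
    var ocr = new IronTesseract();
    ocr.Language = tier;

    using var input = new OcrInput();
    input.LoadImage(@"images\image.png"); // placeholder path

    // Time only the recognition step.
    var timer = Stopwatch.StartNew();
    OcrResult result = ocr.Read(input);
    timer.Stop();

    Console.WriteLine($"{tier}: {result.Text.Length} characters in {timer.ElapsedMilliseconds} ms");
}
```

A single run is noisy; repeating the measurement on representative documents gives a better basis for choosing a tier.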

Explore OCR in multiple languages with IronOCR.

结果对象

using IronOcr;
using IronSoftware.Drawing;

// We can delve deep into OCR results as an object model of
// Pages, Barcodes, Paragraphs, Lines, Words and Characters.
// This allows us to explore, export and draw OCR content using other APIs.
var ocrTesseract = new IronTesseract();

ocrTesseract.Configuration.ReadBarCodes = true;

using var ocrInput = new OcrInput();
var pages = new int[] { 1, 2 };
ocrInput.LoadImageFrames("example.tiff", pages);

OcrResult ocrResult = ocrTesseract.Read(ocrInput);
foreach (var page in ocrResult.Pages)
{
    // Page object
    int PageNumber = page.PageNumber;
    string PageText = page.Text;
    int PageWordCount = page.WordCount;
    // Barcodes is null unless ocrTesseract.Configuration.ReadBarCodes is set to true
    OcrResult.Barcode[] Barcodes = page.Barcodes;
    AnyBitmap PageImage = page.ToBitmap(ocrInput);
    double PageWidth = page.Width;
    double PageHeight = page.Height;
    double PageRotation = page.Rotation; // angular correction in degrees from OcrInput.Deskew()

    foreach (var paragraph in page.Paragraphs)
    {
        // Pages -> Paragraphs
        int ParagraphNumber = paragraph.ParagraphNumber;
        string ParagraphText = paragraph.Text;
        AnyBitmap ParagraphImage = paragraph.ToBitmap(ocrInput);
        int ParagraphX_location = paragraph.X;
        int ParagraphY_location = paragraph.Y;
        int ParagraphWidth = paragraph.Width;
        int ParagraphHeight = paragraph.Height;
        double ParagraphOcrAccuracy = paragraph.Confidence;
        OcrResult.TextFlow ParagraphTextDirection = paragraph.TextDirection;
        foreach (var line in paragraph.Lines)
        {
            // Pages -> Paragraphs -> Lines
            int LineNumber = line.LineNumber;
            string LineText = line.Text;
            AnyBitmap LineImage = line.ToBitmap(ocrInput);
            int LineX_location = line.X;
            int LineY_location = line.Y;
            int LineWidth = line.Width;
            int LineHeight = line.Height;
            double LineOcrAccuracy = line.Confidence;
            double LineSkew = line.BaselineAngle;
            double LineOffset = line.BaselineOffset;
            foreach (var word in line.Words)
            {
                // Pages -> Paragraphs -> Lines -> Words
                int WordNumber = word.WordNumber;
                string WordText = word.Text;
                AnyBitmap WordImage = word.ToBitmap(ocrInput);
                int WordX_location = word.X;
                int WordY_location = word.Y;
                int WordWidth = word.Width;
                int WordHeight = word.Height;
                double WordOcrAccuracy = word.Confidence;
                foreach (var character in word.Characters)
                {
                    // Pages -> Paragraphs -> Lines -> Words -> Characters
                    int CharacterNumber = character.CharacterNumber;
                    string CharacterText = character.Text;
                    AnyBitmap CharacterImage = character.ToBitmap(ocrInput);
                    int CharacterX_location = character.X;
                    int CharacterY_location = character.Y;
                    int CharacterWidth = character.Width;
                    int CharacterHeight = character.Height;
                    double CharacterOcrAccuracy = character.Confidence;
                    // Output alternative symbols choices and their probability.
                    // Very useful for spellchecking
                    OcrResult.Choice[] Choices = character.Choices;
                }
            }
        }
    }
}

Imports IronOcr
Imports IronSoftware.Drawing

' We can delve deep into OCR results as an object model of
' Pages, Barcodes, Paragraphs, Lines, Words and Characters.
' This allows us to explore, export and draw OCR content using other APIs.
Dim ocrTesseract = New IronTesseract()

ocrTesseract.Configuration.ReadBarCodes = True

Dim ocrInput As New OcrInput()
Dim pages = New Integer() { 1, 2 }
ocrInput.LoadImageFrames("example.tiff", pages)

Dim ocrResult As OcrResult = ocrTesseract.Read(ocrInput)
For Each page In ocrResult.Pages
	' Page object
	Dim PageNumber As Integer = page.PageNumber
	Dim PageText As String = page.Text
	Dim PageWordCount As Integer = page.WordCount
	' Barcodes is Nothing unless ocrTesseract.Configuration.ReadBarCodes is set to True
	Dim Barcodes() As OcrResult.Barcode = page.Barcodes
	Dim PageImage As AnyBitmap = page.ToBitmap(ocrInput)
	Dim PageWidth As Double = page.Width
	Dim PageHeight As Double = page.Height
	Dim PageRotation As Double = page.Rotation ' angular correction in degrees from OcrInput.Deskew()

	For Each paragraph In page.Paragraphs
		' Pages -> Paragraphs
		Dim ParagraphNumber As Integer = paragraph.ParagraphNumber
		Dim ParagraphText As String = paragraph.Text
		Dim ParagraphImage As AnyBitmap = paragraph.ToBitmap(ocrInput)
		Dim ParagraphX_location As Integer = paragraph.X
		Dim ParagraphY_location As Integer = paragraph.Y
		Dim ParagraphWidth As Integer = paragraph.Width
		Dim ParagraphHeight As Integer = paragraph.Height
		Dim ParagraphOcrAccuracy As Double = paragraph.Confidence
		Dim ParagraphTextDirection As OcrResult.TextFlow = paragraph.TextDirection
		For Each line In paragraph.Lines
			' Pages -> Paragraphs -> Lines
			Dim LineNumber As Integer = line.LineNumber
			Dim LineText As String = line.Text
			Dim LineImage As AnyBitmap = line.ToBitmap(ocrInput)
			Dim LineX_location As Integer = line.X
			Dim LineY_location As Integer = line.Y
			Dim LineWidth As Integer = line.Width
			Dim LineHeight As Integer = line.Height
			Dim LineOcrAccuracy As Double = line.Confidence
			Dim LineSkew As Double = line.BaselineAngle
			Dim LineOffset As Double = line.BaselineOffset
			For Each word In line.Words
				' Pages -> Paragraphs -> Lines -> Words
				Dim WordNumber As Integer = word.WordNumber
				Dim WordText As String = word.Text
				Dim WordImage As AnyBitmap = word.ToBitmap(ocrInput)
				Dim WordX_location As Integer = word.X
				Dim WordY_location As Integer = word.Y
				Dim WordWidth As Integer = word.Width
				Dim WordHeight As Integer = word.Height
				Dim WordOcrAccuracy As Double = word.Confidence
				For Each character In word.Characters
					' Pages -> Paragraphs -> Lines -> Words -> Characters
					Dim CharacterNumber As Integer = character.CharacterNumber
					Dim CharacterText As String = character.Text
					Dim CharacterImage As AnyBitmap = character.ToBitmap(ocrInput)
					Dim CharacterX_location As Integer = character.X
					Dim CharacterY_location As Integer = character.Y
					Dim CharacterWidth As Integer = character.Width
					Dim CharacterHeight As Integer = character.Height
					Dim CharacterOcrAccuracy As Double = character.Confidence
					' Output alternative symbols choices and their probability.
					' Very useful for spellchecking
					Dim Choices() As OcrResult.Choice = character.Choices
				Next character
			Next word
		Next line
	Next paragraph
Next page

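IronOCR returns an advanced result object for every page it scans using Tesseract 5. It contains location data, images, text, statistical confidence, alternative symbol choices, font names, font sizes, decoration, and font weights for each of the following:

Page
Paragraph
Line
Word
Character
Barcode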
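
As a small illustration of walking this object model, the sketch below finds the least confident word on each page, which can be a useful starting point for manual review. It uses only members that appear in the sample above (Pages, Paragraphs, Lines, Words, Text, Confidence, X, Y); the image path is a placeholder, and no particular confidence scale is assumed.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage(@"images\image.png"); // placeholder path

OcrResult result = ocr.Read(input);

foreach (var page in result.Pages)
{
    // Track the word with the lowest confidence seen so far on this page.
    string weakestText = string.Empty;
    double weakestConfidence = double.MaxValue;
    int weakestX = 0, weakestY = 0;

    foreach (var paragraph in page.Paragraphs)
    foreach (var line in paragraph.Lines)
    foreach (var word in line.Words)
    {
        if (word.Confidence < weakestConfidence)
        {
            weakestConfidence = word.Confidence;
            weakestText = word.Text;
            weakestX = word.X;
            weakestY = word.Y;
        }
    }

    // Pages without any recognized words leave the sentinel value untouched.
    if (weakestConfidence < double.MaxValue)
    {
        Console.WriteLine(
            $"Page {page.PageNumber}: \"{weakestText}\" at ({weakestX}, {weakestY}), confidence {weakestConfidence}");
    }
}
```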

Explore how to read OCR results with IronOCR

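Human Support Directly from Our Development Team

Whether your question is about the product, integration, or licensing, the Iron product development team is on hand to answer it. Get in touch and start a conversation with Iron to make the most of our library in your project.

Ask a Question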

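OCR Reading Engine: Image to Text in the OCR .NET SDK

The IronOCR (optical character recognition) library lets developers get results quickly and efficiently when converting images to text. IronOCR works with .NET, VB.NET, and C#. It is built for the .NET Framework and designed for you, the developer, to help your projects achieve the best performance.

OCR reads and recognizes text documents, barcodes, QR content, and more. IronOCR also provides many methods for adding OCR reading and image text extraction to web, Windows desktop, or console .NET projects, and supports a wide range of image formats and files, such as JPG, PNG, GIF, TIFF, BMP, JPEG, or PDF.

The Technology Demystified: IronOCR Delivers Perfect Results

Recognizing plain text, characters, lines, and paragraphs from image input may not seem straightforward, but you will find IronOCR simpler to work with than you first imagined. IronOCR scans the image for alignment and applies its noise removal and filters to check quality and resolution. It inspects the image's properties, optimizes the OCR engine, and uses a trained artificial intelligence network to recognize text in the image as accurately as a human would.

OCR is not a simple process, even for a computer. IronOCR nevertheless makes the whole process of creating searchable documents faster and more direct, with up to 100% accuracy and a minimum of code.

For .NET, VB.NET, C#

Read the Tutorial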

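Support for Many International Languages

Software is not bound by geography: businesses operate across borders and rely on multiple languages to get results. Likewise, an OCR tool that recognizes documents in only one language falls short.

What Does Multi-Language OCR Support Mean for You?

With a multi-language OCR library that offers multiple OCR capabilities, you can create searchable PDF documents from scanned PDFs or scanned images in many languages, from French to Chinese. PDF documents with dynamic, searchable words can be used and reused without restriction by you, your customers, or your organization.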
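
A rough sketch of that workflow in C# might look like the following. Several names here are assumptions rather than anything shown on this page: OcrLanguage.French and OcrLanguage.ChineseSimplified are assumed to be available language packs installed in the project, SaveAsSearchablePdf is assumed to be the result method that writes a searchable PDF, and both file paths are placeholders.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Assumption: both language packs are installed (for example via NuGet).
ocr.Language = OcrLanguage.French;
ocr.AddSecondaryLanguage(OcrLanguage.ChineseSimplified);

using var input = new OcrInput();
input.LoadPdf(@"images\mixed-lang.pdf"); // placeholder: a scanned, image-only PDF

OcrResult result = ocr.Read(input);

// Assumption: SaveAsSearchablePdf writes the scanned pages with a searchable text layer.
result.SaveAsSearchablePdf("searchable.pdf");
```

AddSecondaryLanguage is the same call used in the custom language sample earlier on this page; the primary language should normally be the one that dominates the document.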

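Focused on you, your business, and your OCR needs, the IronOCR library offers extensive language support, whether built in or on demand. Your next .NET project need not worry about language compatibility!

Whether it is Arabic, Spanish, French, German, Hebrew, Italian, Japanese, Simplified Chinese, Traditional Chinese, Danish, English, Finnish, Portuguese, Russian, or Swedish, name the language and we can provide it! You can download your preferred language pack or contact our 24/7 support for more languages.

The first step is to install IronOCR with the NuGet package installer in Visual Studio on Windows.

Download Language Packs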

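Image Processing for Accurate Reading of Imperfect Scans

What sets IronOCR apart from its competitors? Besides making it easy to add OCR functionality, extract text, and scan rotated images, it can perform OCR on imperfect scans. By contrast, many off-the-shelf products are rigid and inaccurate and fail in real-world personal and business applications, because most of them only support machine-printed, high-resolution, perfectly aligned text.

IronOCR extends Google Tesseract with the powerful IronTesseract DLL, a native C# OCR library with greater stability and accuracy than the free Tesseract library.

Rest Easy: IronOCR Has Your Back!

With the right tool in hand, imperfect scans or images sitting in a storage folder are not a problem: IronOCR's image processing can clean up noise, correct rotation, reduce distortion and skewed alignment, and improve resolution and contrast. The advanced OCR settings give you, the coder, the tools and code needed to produce the best searchable results time after time.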
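
A minimal sketch of applying such corrections before reading is shown below. OcrInput.Deskew() is referenced in the result-object sample on this page; DeNoise() is assumed here to be the matching noise-removal filter on OcrInput, and the image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage(@"images\skewed-scan.png"); // placeholder path

// Straighten a rotated scan before recognition.
input.Deskew();

// Assumption: DeNoise() is the OcrInput filter that removes digital noise.
input.DeNoise();

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```

Filters cost processing time, so it is usually worth applying only those that the input actually needs.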

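Search for the words you need and never be disappointed, with 99.8% to 100% accurate results on PDF files, multi-frame TIFF files, JPEG and JPEG2000, GIF, PNG, BMP, WBMP, System.Drawing.Image, System.Drawing.Bitmap, images from System.IO.Streams, binary image data (byte[]), and more!

Tesseract Alternative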

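Fast and Accurate: From Scanned PDFs to Rotated Scanned Images

Unlike other applications within the .NET Framework, IronOCR is available through the Package Manager Console and brings advanced optical character recognition to your application, letting your users read multiple fonts (from Times New Roman to anything fancy or hard to decipher), weights, and styles, with accurate text reading from full or scanned images. It can select specific regions of an image to improve speed and accuracy. Multithreading, applied to anything from a few lines to whole paragraphs, speeds up the OCR engine and allows multiple files to be read on multi-core machines.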
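
To illustrate reading several files on a multi-core machine, the sketch below fans a folder of images out across threads using the standard .NET Parallel.ForEach. This is a caller-side pattern, not a description of IronOCR's internal threading; creating one IronTesseract per file is a conservative assumption to avoid sharing engine state, and the folder name and file pattern are placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
using IronOcr;

// Placeholder folder and pattern.
string[] files = Directory.GetFiles("images", "*.png");

// Thread-safe store for the recognized text of each file.
var texts = new ConcurrentDictionary<string, string>();

Parallel.ForEach(files, file =>
{
    // Assumption: a separate engine per file keeps threads independent.
    texts[file] = new IronTesseract().Read(file).Text;
});

foreach (var pair in texts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value.Length} characters");
}
```

If the library already saturates the available cores for a single document, adding outer parallelism may not help, so it is worth measuring both approaches on real input.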