Tesseract OCR on Windows (Code Example Tutorial)

Kannapat Udonpant

Updated: July 28, 2025

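What is Tesseract OCR?

Tesseract is an optical character recognition (OCR) engine that runs on many operating systems. It is free software released under the Apache License 2.0. In this guide, I walk through the steps I took to install Tesseract on my Windows 10 machine. The current stable series is version 5, which began with the 5.0.0 release on November 30, 2021.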

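How to Use Tesseract OCR on Windows

Install Tesseract OCR on Windows 10 using the .exe file
Configure the Tesseract installation
Add the installation path to the environment variables
Run Tesseract OCR for Windows on a test image
On Windows, use a C# library for a more intuitive API and advanced methods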

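Step 1: Install Tesseract OCR on Windows 10 using the .exe file

The first step in installing Tesseract OCR on Windows is to download the .exe installer that matches your machine's operating system.

Note that language data is installed differently on other platforms. On macOS with MacPorts, for example, the command is sudo port install tesseract-<langcode>, and the list of language codes is on the MacPorts Tesseract page.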

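Step 2: Configure the installation

Next, we need to configure the Tesseract installation. If you only want to run Tesseract OCR for Windows with English as the default language, accepting all the default options on each installer screen should work.

Installer language

This is only the language of the installer's dialogs and help messages; it does not limit the languages Tesseract can recognize. We can still run Tesseract OCR on Windows with multiple languages if we want:

Installer language for Tesseract OCR for Windows

Tesseract OCR setup

The setup screen recommends closing all other applications before continuing with the installation.

The setup screen for Tesseract OCR for Windows.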

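Choose the install location

Next, we choose the install location. Before moving on, copy the install location into a .txt file. Once the installation completes, we will need to add this location to our machine's environment variables.

Choosing the install location.

Choose components

By default, ScrollView, Training Tools, shortcut creation, and language data are all selected. Unless you have a specific reason not to install them, keep them selected.

The default Tesseract OCR for Windows installation components.

If we scroll down and expand "Additional script data", we see options to download and install additional script data. This helps improve the accuracy of text extraction for certain scripts. Whether to install it is up to you.

The optional script installation components.

Choose the Start Menu folder

In the final installer step, we are asked to choose a Start Menu folder for the Tesseract OCR for Windows shortcuts. I left it at the default name: "Tesseract-OCR".

Choosing the Start Menu folder for the Tesseract OCR for Windows shortcuts.

After clicking Install, Tesseract OCR for Windows starts installing. Our next step is to add the installation path to our machine's environment variables.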

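Step 3: Add the installation path to the environment variables

Control Panel

To add the install location to our environment variables, open the Start menu and search for "environment variables". You should see a result named "Edit the system environment variables". If you don't, you can reach it through Start menu > Control Panel > Edit the system environment variables.

Searching for "environment variables"

System Properties

When the System Properties dialog appears, make sure the Advanced tab is selected, then click the Environment Variables button at the bottom right of the dialog.

Environment Variables

Under System variables, select the Path variable and click the Edit button.

When the "Edit environment variable" screen appears, click the New button and paste the Tesseract OCR install path we copied in Step 2. When done, click the OK button.

Adding the Tesseract OCR for Windows install directory to the environment variables.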
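
Whether the new entry actually took effect can also be checked from code. The following is a minimal C# sketch, not something shipped with the Tesseract installer: it splits the PATH value and looks for the install directory. The directory C:\Program Files\Tesseract-OCR and the helper name PathCheck are assumptions made for illustration; substitute the location you copied in Step 2.

```csharp
using System;
using System.IO;
using System.Linq;

static class PathCheck
{
    // Strips whitespace, surrounding quotes, and trailing slashes so entries compare cleanly.
    static string Normalize(string entry)
    {
        return entry.Trim().Trim('"').TrimEnd('\\', '/');
    }

    // Returns true when `directory` is one of the entries of a PATH-style string.
    // The comparison ignores case, which matches how Windows treats paths.
    public static bool ContainsDirectory(string pathValue, string directory, char separator)
    {
        if (string.IsNullOrWhiteSpace(pathValue) || string.IsNullOrWhiteSpace(directory))
            return false;

        string wanted = Normalize(directory);
        return pathValue
            .Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Normalize)
            .Any(entry => string.Equals(entry, wanted, StringComparison.OrdinalIgnoreCase));
    }
}

static class Program
{
    static void Main()
    {
        // Assumed install location; replace it with the path copied in Step 2.
        string installDir = @"C:\Program Files\Tesseract-OCR";
        string path = Environment.GetEnvironmentVariable("PATH") ?? string.Empty;

        bool found = PathCheck.ContainsDirectory(path, installDir, Path.PathSeparator);
        Console.WriteLine(found
            ? "Tesseract directory is on PATH."
            : "Tesseract directory is NOT on PATH; reopen the terminal or recheck Step 3.");
    }
}
```

Environment variable changes are only visible to processes started after the change, so run the check from a newly opened terminal.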

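That's it! Now that we have run the .exe installer and added the Tesseract OCR for Windows install location to our environment variables, we can verify the installation by running Tesseract.

Step 4: Run Tesseract OCR for Windows on a test image

To test that Tesseract OCR was installed successfully on Windows, open a new Command Prompt on your machine and run the tesseract command. You should see a brief summary of Tesseract's usage options.

Checking for a successful Tesseract OCR for Windows installation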
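
The same check can be scripted. Below is a hedged C# sketch that starts tesseract with the --version flag and prints whatever it reports. The helper name TesseractCli is hypothetical, and the sketch assumes tesseract is reachable through PATH as configured in Step 3. Both output streams are captured because some builds may write the version banner to standard error.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

static class TesseractCli
{
    // Starts a program, waits for it to exit, and captures stdout and stderr.
    // Returns false when the executable cannot be started (for example, it is not on PATH).
    public static bool TryRun(string fileName, string arguments, out string output)
    {
        output = string.Empty;
        var startInfo = new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        try
        {
            using (Process process = Process.Start(startInfo))
            {
                if (process == null) return false;

                // Read stderr asynchronously so a full stderr buffer cannot block the stdout read.
                var stderrTask = process.StandardError.ReadToEndAsync();
                string stdout = process.StandardOutput.ReadToEnd();
                process.WaitForExit();

                output = stdout + stderrTask.Result;
                return true;
            }
        }
        catch (Win32Exception)
        {
            // Thrown when the executable is not found or cannot be launched.
            return false;
        }
    }
}

static class Program
{
    static void Main()
    {
        if (TesseractCli.TryRun("tesseract", "--version", out string output))
            Console.WriteLine(output);
        else
            Console.WriteLine("tesseract could not be started; check the PATH entry from Step 3.");
    }
}
```

To run OCR on an actual image, the same helper could be called with arguments such as test.png stdout -l eng, where test.png is a placeholder file name and stdout tells Tesseract to print the recognized text instead of writing a file.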

Congratulations! You have successfully installed Tesseract OCR for Windows on your machine.

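Advantages of using IronOCR for OCR work:

IronOCR provides Tesseract OCR on Mac, Windows, Linux, Azure, and Docker for:

.NET Framework 4.0 and above
.NET Standard 2.0 and above
.NET Core 2.0 and above
.NET 5
Mono for macOS and Linux
Xamarin for macOS

IronOCR uses the latest Tesseract 5 engine to read text, barcodes, and QR codes from all major image and PDF formats. The library can add OCR functionality to desktop, console, and web applications in minutes. It supports more than 125 international languages. Licenses start from $799.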

Step 1: Install the latest version of IronOCR

Install via DLL

Download the IronOcr DLL directly to your machine.

Install via NuGet

Alternatively, you can install it through NuGet with the following command:

Install-Package IronOcr

Step 2: Apply your license key

Set your IronOCR license key in code

Add this code to your application's startup, before using IronOCR.

// C#
IronOcr.Installation.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01";

' VB.NET
IronOcr.Installation.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01"
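
Hard-coding a key in source is convenient for a demo, but it is easy to commit by accident. As one possible alternative, the sketch below reads the key from an environment variable at startup. The variable name IRONOCR_LICENSE_KEY and the helper LicenseKeySource are assumptions made for this example, not names defined by IronOCR; only the IronOcr.Installation.LicenseKey property comes from the step above, and it is left commented out so the sketch compiles without the package.

```csharp
using System;

static class LicenseKeySource
{
    // Hypothetical variable name chosen for this sketch.
    public const string VariableName = "IRONOCR_LICENSE_KEY";

    // Reads the key from the environment; returns false when it is missing or blank.
    public static bool TryGetKey(out string key)
    {
        string raw = Environment.GetEnvironmentVariable(VariableName);
        key = string.IsNullOrWhiteSpace(raw) ? string.Empty : raw.Trim();
        return key.Length > 0;
    }
}

static class Program
{
    static void Main()
    {
        if (LicenseKeySource.TryGetKey(out string key))
        {
            // With the IronOcr package referenced, the key would be applied here:
            // IronOcr.Installation.LicenseKey = key;
            Console.WriteLine("License key loaded from " + LicenseKeySource.VariableName + ".");
        }
        else
        {
            Console.WriteLine("No license key found; set " + LicenseKeySource.VariableName + " first.");
        }
    }
}
```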

Step 3: Test your key

Check that your key was installed correctly.

// C#
bool isValidLicense = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01");

' VB.NET
Dim isValidLicense As Boolean = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01")

Getting started

// PM> Install-Package IronOcr
using System;
using IronOcr;

var Ocr = new IronTesseract();

// Set the recognition language to English
Ocr.Language = OcrLanguage.English;

using (var Input = new OcrInput())
{
    // Add an example image to the OCR input
    Input.Add(@"img\example.tiff");

    // Optional: Clean the image before processing
    // Input.DeNoise();
    // Input.Deskew();

    // Read the text from the image
    IronOcr.OcrResult result = Ocr.Read(Input);

    // Output the recognized text
    Console.WriteLine(result.Text);

    // Explore the OcrResult using IntelliSense
}

' VB.NET equivalent
' PM> Install-Package IronOcr
Imports System
Imports IronOcr

Dim Ocr = New IronTesseract()

' Set the recognition language to English
Ocr.Language = OcrLanguage.English

Using Input = New OcrInput()
	' Add an example image to the OCR input
	Input.Add("img\example.tiff")

	' Optional: Clean the image before processing
	' Input.DeNoise()
	' Input.Deskew()

	' Read the text from the image
	Dim result As IronOcr.OcrResult = Ocr.Read(Input)

	' Output the recognized text
	Console.WriteLine(result.Text)

	' Explore the OcrResult using IntelliSense
End Using

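How do I use Tesseract OCR in C# for .NET?

Install Google Tesseract and IronOCR for .NET into Visual Studio
Check for the latest builds in C#
Review accuracy and image compatibility
Test performance and API functionality
Consider multi-language support

.NET OCR usage code example: extracting text from an image in C#

Install the IronOCR NuGet package into your Visual Studio solution using the NuGet Package Manager.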

// PM> Install-Package IronOcr
using System;
using IronOcr;

var Ocr = new IronTesseract();

// Set the recognition language to English
Ocr.Language = OcrLanguage.English;

using (var Input = new OcrInput())
{
    // Add an example image to the OCR input
    Input.Add(@"img\example.tiff");

    // Optional: Clean the image before processing
    // Input.DeNoise();
    // Input.Deskew();

    // Read the text from the image
    IronOcr.OcrResult result = Ocr.Read(Input);

    // Output the recognized text
    Console.WriteLine(result.Text);

    // Explore the OcrResult using IntelliSense
}

' VB.NET equivalent
' PM> Install-Package IronOcr
Imports System
Imports IronOcr

Dim Ocr = New IronTesseract()

' Set the recognition language to English
Ocr.Language = OcrLanguage.English

Using Input = New OcrInput()
	' Add an example image to the OCR input
	Input.Add("img\example.tiff")

	' Optional: Clean the image before processing
	' Input.DeNoise()
	' Input.Deskew()

	' Read the text from the image
	Dim result As IronOcr.OcrResult = Ocr.Read(Input)

	' Output the recognized text
	Console.WriteLine(result.Text)

	' Explore the OcrResult using IntelliSense
End Using

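IronOCR Tesseract for C#

With IronOCR, the entire Tesseract installation is handled through the NuGet Package Manager.

Install-Package IronOcr

Tesseract 5 API in IronOCR Tesseract

To our knowledge at the time of writing, IronTesseract is the only Tesseract 5 implementation for .NET Framework or .NET Core.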

using System;
using IronOcr;

var Ocr = new IronTesseract(); // nothing to configure

using (var Input = new OcrInput(@"images\image.png"))
{
    var result = Ocr.Read(Input);

    // Output the recognized text
    Console.WriteLine(result.Text);
}

' VB.NET equivalent
Imports System
Imports IronOcr

Dim Ocr = New IronTesseract() ' nothing to configure

Using Input = New OcrInput("images\image.png")
	Dim result = Ocr.Read(Input)

	' Output the recognized text
	Console.WriteLine(result.Text)
End Using

Tesseract 4 API in IronOCR Tesseract

using System;
using IronOcr;

var Ocr = new IronTesseract();

// Specify the version of Tesseract
Ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract4;

using (var Input = new OcrInput(@"images\image.png"))
{
    var result = Ocr.Read(Input);

    // Output the recognized text
    Console.WriteLine(result.Text);
}

' VB.NET equivalent
Imports System
Imports IronOcr

Dim Ocr = New IronTesseract()

' Specify the version of Tesseract
Ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract4

Using Input = New OcrInput("images\image.png")
	Dim result = Ocr.Read(Input)

	' Output the recognized text
	Console.WriteLine(result.Text)
End Using

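Why IronOCR is better than Tesseract:

Accuracy

TESSERACT:

If Tesseract encounters an image that is rotated, skewed, low-DPI, scanned, or has background noise, it can extract almost no data from it. Tesseract also takes a long time to process such a document before returning meaningless output.

IRONOCR:

IronOCR addresses this problem. Users can typically achieve 99.8-100% accuracy with minimal configuration.

Image compatibility

TESSERACT:

Accepts only the Leptonica PIX image format, which appears in C# as an IntPtr to a native object. PIX objects live in unmanaged memory, so failing to release them carefully in C# causes memory leaks.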
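
To make the unmanaged-memory concern concrete, here is a small C# sketch of the usual defence: wrapping a native pointer in a SafeHandle so it is released deterministically. It does not call Leptonica. Marshal.AllocHGlobal stands in for the native image allocator, and the NativeBuffer class and its LiveCount counter are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Stand-in for a native image such as a PIX: owns a block of unmanaged memory.
sealed class NativeBuffer : SafeHandle
{
    static int liveCount;

    // Number of buffers allocated and not yet released (illustration only).
    public static int LiveCount
    {
        get { return Volatile.Read(ref liveCount); }
    }

    public NativeBuffer(int sizeInBytes) : base(IntPtr.Zero, true)
    {
        // AllocHGlobal replaces the real native allocator in this sketch.
        SetHandle(Marshal.AllocHGlobal(sizeInBytes));
        Interlocked.Increment(ref liveCount);
    }

    public override bool IsInvalid
    {
        get { return handle == IntPtr.Zero; }
    }

    // Called once by the runtime when the handle is disposed or finalized.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Interlocked.Decrement(ref liveCount);
        return true;
    }
}

static class Program
{
    static void Main()
    {
        using (var buffer = new NativeBuffer(1024))
        {
            Console.WriteLine("Live native buffers inside using: " + NativeBuffer.LiveCount);
        }

        // Leaving the using block disposes the handle and frees the native memory.
        Console.WriteLine("Live native buffers after using: " + NativeBuffer.LiveCount);
    }
}
```

Passing a bare IntPtr around gives the garbage collector nothing to release, which is why an interop wrapper that forgets the matching free call leaks memory.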

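IRONOCR:

Images are memory-managed. PDF and TIFF are supported. System.Drawing, Stream, and byte array inputs are accepted for every file format.

Broad image support:

PDF documents
PDF pages
Multi-frame TIFF files
JPEG and JPEG 2000
GIF
PNG
System.Drawing.Image
Binary image data (byte[])
And more...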

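Performance

TESSERACT:

Google Tesseract can produce fast, accurate results if it is tuned correctly and the input images are preprocessed with Photoshop or ImageMagick.

IRONOCR:

The IronOcr .NET Tesseract DLL is usually fast and accurate out of the box for most images. We have implemented multithreading to take advantage of the multi-core processors most machines now have. Even low-resolution images usually work in your program with high accuracy. No Photoshop required.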

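API

TESSERACT:

There are two free options:

Work through an interop layer: many of those found on GitHub are outdated and have unresolved tickets, memory leaks, and console warnings. They may not support .NET Core or .NET Standard.
Work with the command-line EXE: it is hard to deploy and is often interrupted by virus scanners and security policies.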
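
For reference, the command-line option might look like the following C# sketch, which builds a tesseract command line and prints the recognized text. The TesseractCommand helper is hypothetical, img\example.tiff is a placeholder file name, and the sketch assumes tesseract.exe is on PATH. The quoting is deliberately simple and does not handle paths that themselves contain quote characters.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;

static class TesseractCommand
{
    // Wraps a value in double quotes when it contains a space.
    public static string Quote(string value)
    {
        return value.Contains(" ") ? "\"" + value + "\"" : value;
    }

    // Builds "<image> <outputbase> [-l <language>]" for the tesseract CLI.
    public static string BuildArguments(string imagePath, string outputBase, string language)
    {
        var parts = new List<string> { Quote(imagePath), Quote(outputBase) };
        if (!string.IsNullOrEmpty(language))
        {
            parts.Add("-l");
            parts.Add(language);
        }
        return string.Join(" ", parts);
    }
}

static class Program
{
    static void Main()
    {
        // "stdout" as the output base asks tesseract to print the text instead of writing a file.
        string arguments = TesseractCommand.BuildArguments(@"img\example.tiff", "stdout", "eng");
        Console.WriteLine("Running: tesseract " + arguments);

        var startInfo = new ProcessStartInfo("tesseract", arguments)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };

        try
        {
            using (Process process = Process.Start(startInfo))
            {
                Console.WriteLine(process.StandardOutput.ReadToEnd());
                process.WaitForExit();
                Console.WriteLine("Exit code: " + process.ExitCode);
            }
        }
        catch (Win32Exception)
        {
            // The deployment problem described above: the EXE is missing or blocked.
            Console.WriteLine("tesseract could not be started on this machine.");
        }
    }
}
```

The catch branch is where the deployment issues mentioned above tend to surface, because the application depends on an external executable being present and allowed to run.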

IRONOCR:

A managed and tested .NET library for Tesseract called IronTesseract.

Fully documented, with IntelliSense support.

Languages

TESSERACT:

Supports about 100 languages.

IRONOCR:

Supports more than 125 languages.

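Conclusion

Tesseract is an excellent resource for C++ developers, but it is not a complete OCR library for .NET. Scanned or photographed images must be processed so that they are orthogonal, standardized, high-resolution, and free of digital noise before Tesseract can handle them accurately.

IronOCR, by contrast, does this and more in a single line of code. IronOCR does use Tesseract as its internal OCR engine, but it is a finely tuned Tesseract built for C#, with many performance improvements and features added as standard.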

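Kannapat Udonpant

Software Engineer

Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. Besides peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.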
