OCR Tools

How to Convert an Image to Text

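Updated: June 22, 2025

In today's digital era, turning image-based content into readable, editable, searchable text is essential, particularly when archiving paper documents, extracting key information from images, or digitizing printed material. Optical character recognition (OCR) automates this conversion. One efficient and reliable tool for the job is IronOCR, a powerful OCR library for .NET.

This article explains how to convert images to text with IronOCR and looks at how doing so saves time, reduces errors, and streamlines processes such as data extraction, archiving, and document processing.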

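How to Convert an Image to Text

Download a C# library for OCR
Create a new IronTesseract instance
Load your image with OcrImageInput
Read the image's contents with the Read method
Export the OCR result to a text file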

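Why Convert Images to Text?

There are many reasons to convert an image to text, including:

Data extraction: Extract text from scanned documents and images for archiving or data processing.
Editing scanned content: Edit or update text in previously scanned documents without retyping it by hand.
Accessibility: Convert printed material into digital text that screen readers and text-to-speech applications can use.
Automation: Automate data entry and processing by reading text from invoices, receipts, or business cards.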

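Getting Started with Image-to-Text Conversion

Before exploring how IronOCR extracts text from images, let's walk through the general steps using an online tool, Docsumo. Online OCR tools are a useful option for occasional or one-off OCR tasks because they require no manual setup. If you need to run OCR regularly, however, a dedicated OCR library such as IronOCR may suit you better.

Navigate to the online OCR tool
Upload your image and start the extraction process
Download the resulting data as a text document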

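Step 1: Navigate to the online OCR tool

To start extracting text from an image file with OCR, first navigate to the online image OCR tool you want to use.

Figure 1: Docsumo OCR tool

Step 2: Upload your image and start the extraction process

Next, click the "Upload File" button to upload the image you want to extract text from. The tool starts processing the image immediately.

Figure 2: Docsumo, file processing

Step 3: Download the resulting data as a text document

Once the image has been processed, you can download the extracted text as a new text document for further use or manipulation.

Figure 3: Docsumo, image processing complete

You can also review the file and highlight individual sections to see the text they contain, which is especially useful when you only need the text from certain parts. You can then still download the text as a text document, XLS, or JSON.

Figure 4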

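Getting Started with IronOCR

IronOCR is a versatile .NET library for running OCR on images. Its features include handling a variety of file formats (such as PNG, JPEG, TIFF, and PDF), applying image corrections, scanning specialized documents (passports, license plates, and so on), providing detailed information about scanned files, converting scanned documents, and highlighting text.

Installing the IronOCR library

Before you can read images with IronOCR, you need to install it in your project if you have not already done so. The easiest way in Visual Studio is through NuGet. Open the NuGet Package Manager Console and run the following command:

Install-Package IronOcr

Alternatively, search for IronOCR in the NuGet Package Manager for your solution and install it from there.

Figure 5

To use IronOCR in your code, make sure the appropriate import statement appears at the top of the file: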

using IronOcr;

Imports IronOcr

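Converting an Image to Text: A Basic Example

First, let's look at a basic image-to-text example with IronOCR. This is the core function of any OCR tool, and for this example we use the same PNG file we used in the online tool. We first instantiate the IronTesseract class and assign it to the variable ocr. We then use the OcrImageInput class to create a new OcrImageInput object from the provided image file. Finally, the Read method reads the text from the image and returns an OcrResult object, and we print the extracted text to the console via ocrResult.Text.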

using IronOcr;

IronTesseract ocr = new IronTesseract();

// Load the image from which to extract text
using OcrImageInput image = new OcrImageInput("example.png");

// Perform OCR to extract text
OcrResult ocrResult = ocr.Read(image);

// Output the extracted text to the console
Console.WriteLine(ocrResult.Text);

' VB.NET equivalent
Imports System
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        ' Load the image from which to extract text
        Using image As New OcrImageInput("example.png")
            ' Perform OCR to extract text
            Dim ocrResult As OcrResult = ocr.Read(image)

            ' Output the extracted text to the console
            Console.WriteLine(ocrResult.Text)
        End Using
    End Sub
End Module

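Output Image

Figure 6

Handling Different Image Formats

IronOCR supports several image formats, including PNG, JPEG, BMP, GIF, and TIFF. The process for reading text is the same for each format; you only need to load a file with the right extension.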

using IronOcr;

IronTesseract ocr = new IronTesseract();

// Load a BMP image
using OcrImageInput image = new OcrImageInput("example.bmp");

// Perform OCR to extract text
OcrResult ocrResult = ocr.Read(image);

// Output the extracted text to the console
Console.WriteLine(ocrResult.Text);

' VB.NET equivalent
Imports System
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        ' Load a BMP image
        Using image As New OcrImageInput("example.bmp")
            ' Perform OCR to extract text
            Dim ocrResult As OcrResult = ocr.Read(image)

            ' Output the extracted text to the console
            Console.WriteLine(ocrResult.Text)
        End Using
    End Sub
End Module
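
Because the loading call is the same for every supported format, a single loop can handle a folder of mixed image types. The following is a minimal sketch, not a definitive implementation: the folder name scans is hypothetical, the extension list simply mirrors the formats named in this section, and the only IronOCR calls used are the IronTesseract, OcrImageInput, and Read calls already shown in this article.

```csharp
using System;
using System.IO;
using System.Linq;
using IronOcr;

// Extensions corresponding to the formats listed in this section
string[] supported = { ".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff" };

// "scans" is a hypothetical folder name used for illustration
string folder = "scans";

IronTesseract ocr = new IronTesseract();

// Keep only files whose extension is in the supported list
var files = Directory.EnumerateFiles(folder)
    .Where(path => supported.Contains(Path.GetExtension(path).ToLowerInvariant()));

foreach (string path in files)
{
    // The loading step is identical regardless of the file format
    using OcrImageInput image = new OcrImageInput(path);
    OcrResult ocrResult = ocr.Read(image);

    Console.WriteLine($"--- {Path.GetFileName(path)} ---");
    Console.WriteLine(ocrResult.Text);
}
```

Reusing one IronTesseract instance across the loop avoids recreating the engine for each file, while each OcrImageInput is disposed at the end of its iteration.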

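Improving OCR Accuracy

OCR results can be improved by optimizing the image and tuning factors such as the language, the image resolution, and the amount of noise in the image. The following example applies the DeNoise() and Sharpen() methods to improve image quality and, in turn, text extraction accuracy: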

using IronOcr;

IronTesseract ocr = new IronTesseract();

// Load the image and apply image processing to improve accuracy
using OcrImageInput image = new OcrImageInput("example.png");
image.DeNoise();
image.Sharpen();

// Perform OCR to extract text
OcrResult ocrResult = ocr.Read(image);

// Output the extracted text to the console
Console.WriteLine(ocrResult.Text);

' VB.NET equivalent
Imports System
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        ' Load the image and apply image processing to improve accuracy
        Using image As New OcrImageInput("example.png")
            image.DeNoise()
            image.Sharpen()

            ' Perform OCR to extract text
            Dim ocrResult As OcrResult = ocr.Read(image)

            ' Output the extracted text to the console
            Console.WriteLine(ocrResult.Text)
        End Using
    End Sub
End Module
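
Filters do not help every image, so it can be worth checking whether they actually improve a given scan. The sketch below reads the same file with and without the filters and prints a confidence score for each run. It assumes that OcrResult exposes a numeric Confidence property; that property is not shown elsewhere in this article, so verify it against the IronOCR version you have installed.

```csharp
using System;
using IronOcr;

IronTesseract ocr = new IronTesseract();

// Baseline: read the image without any preprocessing
using OcrImageInput raw = new OcrImageInput("example.png");
OcrResult rawResult = ocr.Read(raw);

// Same image with the filters from this section applied
using OcrImageInput filtered = new OcrImageInput("example.png");
filtered.DeNoise();
filtered.Sharpen();
OcrResult filteredResult = ocr.Read(filtered);

// Assumption: Confidence is the engine's overall confidence score for the read
Console.WriteLine($"Without filters: {rawResult.Confidence:F1}");
Console.WriteLine($"With filters:    {filteredResult.Confidence:F1}");
```

If the filtered run scores lower, the image was probably clean already and the filters can be skipped for similar scans.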

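Exporting the Extracted Text

Now that we have covered the basics of the image-to-text process, let's look at how to export the resulting text for later use. For this example, we load and scan the image the same way as before. Then, File.WriteAllText("output.txt", ocrResult.Text) creates a new text file named output.txt and saves the extracted text to it.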

using IronOcr;
using System.IO;

IronTesseract ocr = new IronTesseract();

// Load the image
using OcrImageInput image = new OcrImageInput("example.png");

// Perform OCR to extract text
OcrResult ocrResult = ocr.Read(image);

// Save the extracted text to a file
File.WriteAllText("output.txt", ocrResult.Text);

' VB.NET equivalent
Imports System.IO
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        ' Load the image
        Using image As New OcrImageInput("example.png")
            ' Perform OCR to extract text
            Dim ocrResult As OcrResult = ocr.Read(image)

            ' Save the extracted text to a file
            File.WriteAllText("output.txt", ocrResult.Text)
        End Using
    End Sub
End Module
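
When exporting more than one image, a fixed output.txt name would be overwritten on each run. One possible approach, sketched below, derives the output name from the image name. ExportText is a hypothetical helper written for this illustration and is not part of IronOCR; the IronOCR calls it uses are the same ones shown in the export example.

```csharp
using System;
using System.IO;
using System.Text;
using IronOcr;

IronTesseract ocr = new IronTesseract();

string outputPath = ExportText(ocr, "example.png");
Console.WriteLine($"Saved text to {outputPath}");

// Hypothetical helper: runs OCR on one image and writes the text beside it
static string ExportText(IronTesseract ocr, string imagePath)
{
    using OcrImageInput image = new OcrImageInput(imagePath);
    OcrResult ocrResult = ocr.Read(image);

    // example.png -> example.txt, in the same folder as the image
    string textPath = Path.ChangeExtension(imagePath, ".txt");

    // Write as UTF-8 so non-ASCII characters survive the round trip
    File.WriteAllText(textPath, ocrResult.Text, Encoding.UTF8);
    return textPath;
}
```

Returning the path lets the caller log or post-process the exported file without recomputing its name.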

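Figure 7

Key Features of IronOCR

High accuracy: IronOCR uses advanced Tesseract OCR algorithms and includes built-in tools for handling difficult images, which helps it achieve high accuracy.
Multilingual support: Supports more than 125 languages, covering Latin, Cyrillic, Arabic, and Asian scripts. Note, however, that only English is installed with IronOCR; to use another language, you need to install the additional language pack for it.
PDF OCR: IronOCR can extract text from scanned PDFs, making it a valuable tool for document digitization.
Image cleanup: It provides preprocessing tools such as deskew, denoise, and invert to improve image quality and therefore OCR accuracy.
Easy integration: The API integrates into any .NET project, whether a console application, a web application, or desktop software.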
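
As a rough illustration of how the multilingual and image-cleanup features might combine, the sketch below reads a French scan after straightening and denoising it. Several details are assumptions rather than anything confirmed by this article: the language pack package name (IronOcr.Languages.French), the OcrLanguage.French member, the Language property, the Deskew() method, and the hypothetical file name facture.png. Check each against the IronOCR documentation for your version.

```csharp
using System;
using IronOcr;

IronTesseract ocr = new IronTesseract();

// Assumption: the French language pack (NuGet package IronOcr.Languages.French)
// is installed; only English ships with IronOCR itself
ocr.Language = OcrLanguage.French;

// "facture.png" is a hypothetical scanned French invoice
using OcrImageInput image = new OcrImageInput("facture.png");

// Preprocessing named in the feature list: straighten the page, then reduce noise
image.Deskew();
image.DeNoise();

OcrResult ocrResult = ocr.Read(image);
Console.WriteLine(ocrResult.Text);
```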

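Common Use Cases for Converting Images to Text

Automated data entry: Businesses can use OCR to extract data automatically from forms, receipts, or business cards.
Document archiving: Organizations can digitize physical files so that they are searchable and easier to store.
Accessibility: Convert printed material into text for use with screen readers or other assistive technologies.
Research and analysis: Quickly convert scanned research material into text for analysis or for use in other software tools.
Learning: Convert scanned study notes into editable text that can be saved as a Word document for further work in tools such as IronWord, Microsoft Word, or Google Docs.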

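Conclusion

Converting images to text with IronOCR is a fast, accurate, and efficient way to handle document tasks. Whether you are working with scanned documents, digital images, or PDF files, IronOCR simplifies the process with high accuracy, multilingual support, and powerful image-processing tools. It is well suited to businesses that want to streamline document management workflows, automate data extraction, or improve accessibility.

Try IronOCR for yourself today with the free trial. It takes only a few minutes to get it fully running in your workspace, so you can start on your OCR tasks right away.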

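Kannapat Udonpant

Software Engineer

Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, he was also a member of the Vehicle Robotics Laboratory, part of the Department of Bioproduction Engineering. In 2022, he put his C# skills to use by joining the engineering team at Iron Software, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the IronPDF code. Beyond learning from his peers, he also enjoys the social side of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.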