OCR Tools

Tesseract OCR on Windows (Code Example Tutorial)

Kannapat Udonpant
Updated: July 28, 2025

What is Tesseract OCR?

Tesseract is an optical character recognition engine that runs on a variety of operating systems. It is free software, released under the Apache License 2.0. In this guide, I will take you through the steps I followed to install Tesseract on my Windows 10 machine. Version 5 is the current stable major version; it began with release 5.0.0 on November 30, 2021.

How to Use Tesseract OCR in Windows

1. Install Tesseract OCR on Windows 10 using the .exe file
2. Configure the Tesseract installation
3. Add the installation path to the environment variables
4. Run Tesseract OCR for Windows on a test image
5. Use a C# library for more intuitive APIs and advanced methods on Windows

Step 1: Install Tesseract OCR in Windows 10 using .exe File

The first step to install Tesseract OCR for Windows is to download the .exe installer that corresponds to your machine's operating system.

Note for macOS users only: with MacPorts, language data is installed with the command below, and a list of langcodes is found on the MacPorts Tesseract page. This does not apply to the Windows installer, where language data is selected during setup.

    sudo port install tesseract-<langcode>

Step 2: Configure Installation

Next, we'll need to configure the Tesseract installation. If you're feeling confident and only want to run Tesseract OCR for Windows with the default language set to English, running through the installation screens with all of the default options selected should work.

Installer Language

This is just the language for the dialog boxes and help information.
If we want to, we can still run Tesseract OCR for Windows in multiple languages.

Figure: Installer language for Tesseract OCR for Windows.

Tesseract OCR Setup

The setup screen recommends that all other applications are closed before continuing with the installation.

Figure: The Tesseract OCR for Windows installation screen.

Choose Install Location

Next, we'll choose the installation location. Before proceeding to the next step, make sure to copy the install location to a .txt file. We will need to add the installation location to our machine's environment variables once the installation is complete.

Figure: Choose the installation location.

Choose Components

By default, the ScrollView, Training Tools, Shortcuts creation, and Language data are all selected. Unless you have a specific reason not to install these, keep all of them selected.

Figure: Default Tesseract OCR for Windows installation components.

If we scroll down and expand 'Additional script data', we will see that we have the option to download and install additional script data. This can help improve the accuracy of text extraction from certain scripted languages. It's up to you if you want to install these.

Figure: Optional script installation components.

Choose the Start Menu Folder

In the last step of the installation, we'll be asked to choose the start menu folder for Tesseract OCR for Windows shortcuts. I've left mine set to the default name: 'Tesseract-OCR'.

Figure: Choose the start menu folder for the Tesseract OCR for Windows shortcuts.

After we click Install, Tesseract OCR for Windows will begin installing. Our next step is to add the installation path to our machine's environment variables.

Step 3: Add Installation Path to Environment Variables

Control Panel

To add the installation location to our environment variables, go to the Start menu and search for 'environment variables'. You should see a result to edit the system environment variables.
If you don't, you can always use the following steps: Start menu > Control Panel > Edit the system environment variables.

Figure: Searching for 'environment variables'.

System Properties

When presented with the 'System Properties' dialog box, make sure the Advanced tab is selected, then click the Environment Variables button towards the bottom right of the screen.

Environment Variables

Under System variables, select the Path variable and click the Edit button. When presented with the "Edit environment variable" screen, click the New button and paste in the Tesseract OCR installation path that we copied earlier in Step 2. Once you've done this, click the 'OK' button.

Figure: Add Tesseract OCR for Windows Installation Directory to Environment Variables.

That's it! Now that we've run the .exe installer and added the Tesseract OCR for Windows install location to our environment variables, we can test that our installation is working by running Tesseract on a test image.

Step 4: Run Tesseract OCR for Windows on a Test Image

To test that Tesseract OCR for Windows was installed successfully, open a new command prompt on your machine (one started after the Path change, so that it picks up the new value), then run the tesseract command. You should see output with a quick explanation of Tesseract's usage options.

Figure: Checking successful installation of Tesseract OCR for Windows.

Congratulations! You've successfully installed Tesseract OCR for Windows on your machine.

Advantages of using IronOCR to do OCR Work

IronOCR provides Tesseract OCR on Mac, Windows, Linux, Azure and Docker for:

- .NET Framework 4.0 +
- .NET Standard 2.0 +
- .NET Core 2.0 +
- .NET 5
- Mono for macOS and Linux
- Xamarin for macOS

IronOCR reads text, barcodes, and QR codes from all major image and PDF formats using the latest Tesseract 5 engine. This library adds OCR functionality to Desktop, Console, and Web applications in minutes. It supports 125+ international languages. Licenses start from $799.

Step 1: Install the latest version of IronOCR

Install DLL

Download the IronOcr DLL directly to your machine.
Install NuGet

Alternatively, you can install it through NuGet with the following command:

    Install-Package IronOcr

Step 2: Apply Your License Key

Set your IronOCR license key using code. Add this code to the startup of your application, before IronOCR is used.

    IronOcr.Installation.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01";

Step 3: Test your Key

Test if your key has been installed correctly.

    bool isValidLicense = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01");

Get started with the project

    // PM > Install-Package IronOcr
    using System;
    using IronOcr;

    var Ocr = new IronTesseract();

    // Set the recognition language to English
    Ocr.Language = OcrLanguage.English;

    using (var Input = new OcrInput())
    {
        // Add an example image to the OCR input
        Input.Add(@"img\example.tiff");

        // Optional: Clean the image before processing
        // Input.DeNoise();
        // Input.Deskew();

        // Read the text from the image
        IronOcr.OcrResult result = Ocr.Read(Input);

        // Output the recognized text
        Console.WriteLine(result.Text);

        // Explore the OcrResult using IntelliSense
    }

How to Use Tesseract OCR in C# for .NET?

1. Install Google Tesseract and IronOCR for .NET into Visual Studio
2. Check the latest builds in C#
3. Review accuracy and image compatibility
4. Test performance and API function
5. Consider multi-language support

Code Example for .NET OCR Usage: Extract Text from Images in C#

Use NuGet Package Manager to install the IronOCR NuGet package into your Visual Studio solution. The code is the same as the example under "Get started with the project" above.

IronOCR Tesseract for C#

With IronOCR, all Tesseract installation happens entirely through the NuGet Package Manager.

    Install-Package IronOcr

Tesseract 5 API in IronOCR Tesseract

To date, IronTesseract is the only known implementation of Tesseract 5 for .NET Framework or Core.

    using System;
    using IronOcr;

    var Ocr = new IronTesseract(); // nothing to configure

    using (var Input = new OcrInput(@"images\image.png"))
    {
        var result = Ocr.Read(Input);

        // Output the recognized text
        Console.WriteLine(result.Text);
    }

Tesseract 4 API in IronOCR Tesseract

    using System;
    using IronOcr;

    var Ocr = new IronTesseract();

    // Specify the version of Tesseract
    Ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract4;

    using (var Input = new OcrInput(@"images\image.png"))
    {
        var result = Ocr.Read(Input);

        // Output the recognized text
        Console.WriteLine(result.Text);
    }

Why IronOCR Is Better Than Tesseract

ACCURACY

TESSERACT: If Tesseract encounters an image that is rotated, skewed, low in DPI, scanned, or has background noise, it becomes almost impossible for Tesseract to get data from that image. In addition, Tesseract will take a very long time to process that document before providing you with nonsensical information.

IRONOCR: IronOCR takes this headache away. Users often achieve 99.8-100% accuracy with minimal configuration.

IMAGE COMPATIBILITY

TESSERACT: Only accepts the Leptonica PIX image format, which appears in C# as an IntPtr to a C++ object. PIX objects are not managed memory, and failure to handle them with care in C# results in memory leaks.

IRONOCR: Images are memory managed. PDF and TIFF are supported. System.Drawing, Stream, and byte array inputs are included for every file format. Broad image support:

- PDF documents
- PDF pages
- Multi-frame TIFF files
- JPEG and JPEG2000
- GIF
- PNG
- System.Drawing.Image
- Binary image data (byte[])
- And many more...

PERFORMANCE

TESSERACT: Google Tesseract can produce fast and accurate results if it is properly tuned and the input images have been preprocessed using Photoshop or ImageMagick.

IRONOCR: The IronOcr .NET Tesseract DLL works accurately and at speed for most images out of the box. We have implemented multithreading to make use of the multi-core processors that most machines now use. Even low-resolution images generally work with a high degree of accuracy in your program. No Photoshop required.
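For comparison with the managed API, the command-line check from Step 4 can also be driven from C#. The following is only a minimal sketch, assuming the Tesseract install folder is on PATH as set up in Step 3 and that English language data is installed. The names TesseractCli, BuildArguments, and Run, and the default image test.png, are hypothetical and chosen for illustration; they are not part of Tesseract or IronOCR. The sketch relies on two standard Tesseract command-line features: passing stdout as the output base, and selecting a language with -l. It performs no image preprocessing.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper for illustration; not part of Tesseract or IronOCR.
public static class TesseractCli
{
    // Builds the arguments for: tesseract "<image>" stdout -l <language>
    // Using "stdout" as the output base makes Tesseract print the text
    // instead of writing a .txt file.
    public static string BuildArguments(string imagePath, string language = "eng")
    {
        return "\"" + imagePath + "\" stdout -l " + language;
    }

    // Runs the tesseract executable and returns the recognized text.
    // Assumes the install folder was added to PATH as in Step 3.
    public static string Run(string imagePath, string language = "eng")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "tesseract",
            Arguments = BuildArguments(imagePath, language),
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            UseShellExecute = false,   // required for output redirection
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            if (process == null)
                throw new InvalidOperationException("Could not start tesseract.");

            // Read all output before waiting, so the child cannot block on a full pipe
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    "tesseract exited with code " + process.ExitCode);

            return text;
        }
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // "test.png" is a placeholder; pass a real image path as the first argument
        string imagePath = args.Length > 0 ? args[0] : "test.png";
        Console.WriteLine(TesseractCli.Run(imagePath));
    }
}
```

If the executable is not on PATH, Process.Start throws, which is a quick way to confirm whether the Step 3 configuration took effect.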
API

TESSERACT: We have two free choices:

- Work with Interop layers. Many of those found on GitHub are out of date and have unresolved tickets, memory leaks, and console warnings. They may not support .NET Core or .NET Standard.
- Work with the command-line EXE. It is difficult to deploy and is constantly interrupted by virus scanners and security policies.

IRONOCR: A managed and tested .NET library for Tesseract called IronTesseract. Fully documented with IntelliSense support.

LANGUAGE

TESSERACT: Supports more than 100 languages.

IRONOCR: Supports 125+ languages.

Conclusion

Tesseract is an excellent resource for C++ developers, but it is not a complete OCR library for .NET. Scanned or photographed images need to be processed so that they are orthogonal, standardized, high-resolution, and free of digital noise before Tesseract can work with them accurately. In contrast, IronOCR can do this and more with just a single line of code. IronOCR does use Tesseract as its internal OCR engine, but it is a finely tuned Tesseract built for C#, with many performance improvements and features added as standard.

Kannapat Udonpant
Software Engineer

Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join the engineering team at Iron Software, where he focuses on IronPDF. Kannapat values his job because he can learn directly from the developer who writes most of the IronPDF code. Beyond peer learning, Kannapat also enjoys the social side of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.

Related Articles

- Power Automate OCR (Developer Tutorial), updated June 22, 2025. Optical character recognition technology is used in document digitization, automated PDF data extraction and entry, invoice processing, and making scanned PDFs searchable.
- Easyocr vs Tesseract (OCR Feature Comparison), updated June 22, 2025. Popular OCR tools and libraries such as EasyOCR, Tesseract OCR, Keras-OCR, and IronOCR are commonly used to integrate this functionality into modern applications.
- How to Convert an Image to Text, updated June 22, 2025. In the current digital age, converting image-based content into easy-to-read, editable, searchable text