How to Create Character Recognition Using OCR in C#

Kannapat Udonpant
Updated: July 28, 2025

Optical Character Recognition (OCR) converts text in images into digital formats that machines can read. When a document such as an invoice or receipt is scanned, your computer saves it as an image file, and the text inside that image cannot be edited, searched, or counted with a regular text editor. OCR processes the image, extracts the text, and produces a text format that computers can read. This makes it possible to extract text from many sources, including PDF files and other scanned images, and to convert major image formats and PDF documents into searchable OCR data.

In C#, developers can use OCR through various libraries, one of which is IronOCR from Iron Software. This tutorial covers the basics of OCR and demonstrates how to use IronOCR to perform character recognition efficiently in C#.

How to create Character Recognition in C#

1. Create a new C# project in Visual Studio and give it a name.
2. Install the IronOCR .NET library and include it in the project.
3. Use the IronOCR Tesseract to read text from images.
4. Use the IronOCR advanced features to read the text in images.
5. Tune the performance of the IronOCR read operation.

Getting Started with IronOCR

IronOCR, a C# library developed by Iron Software, provides advanced OCR capabilities.
It offers accurate text extraction from images, PDFs, and scanned documents. Before we dive into the code, make sure you have IronOCR installed in your project.

Key features of IronOCR from Iron Software

Improved Tesseract OCR Engine
IronOCR elevates the capabilities of the widely used Tesseract OCR engine by enhancing both accuracy and speed. It serves as a robust solution for extracting text from various sources, including images, PDFs, and diverse document formats.

Wide Language Coverage
With support for over 125 languages, IronOCR is adept at handling multilingual requirements, making it an ideal choice for applications demanding linguistic versatility.

Versatile Output Choices
Extracted text can be conveniently outputted as plain text or structured data for seamless integration into further processing pipelines. Additionally, IronOCR facilitates the creation of searchable PDFs directly from image inputs.

Cross-Platform Adaptability
Engineered for compatibility with C#, F#, and VB.NET, IronOCR seamlessly operates across various .NET environments including versions 8, 7, 6, Core, Standard, and Framework.

Leveraging Tesseract 5
IronOCR harnesses the power of Tesseract 5, finely tailored for optimal performance within the .NET ecosystem.

Zone-Based OCR Capability
With IronOCR, users can precisely define specific zones within documents, enabling targeted OCR processing. This feature enhances accuracy and efficiency by focusing processing power where it's needed most.

Image Preprocessing Tools
The library offers a suite of image preprocessing functionalities such as de-skewing and noise reduction. These tools ensure superior results even when dealing with imperfect source images, ultimately enhancing the overall OCR experience.

Now, we will develop a demo application that utilizes IronOCR to read text from images.

Prerequisites

Visual Studio: Ensure you have installed Visual Studio or any other C# development environment.
NuGet Package Manager: Ensure NuGet is available to manage packages in your project.

Step 1: Create a New C# Project in Visual Studio

To start, create a new console application in Visual Studio. Provide a project name and location, select the required .NET version for the project, and click the Create button to create the new project.

Step 2: Install the IronOCR library and integrate it into your project

IronOCR can be installed from the NuGet Package Manager Console with the command Install-Package IronOcr. Alternatively, use the Visual Studio NuGet Package Manager to search for IronOCR and install it into your project. Once installed, the application is ready to use IronOCR to read text from images.

Step 3: Utilize the IronOCR Tesseract to read text from images

IronOCR stands out as the exclusive .NET library offering Tesseract 5 OCR capabilities. At present, it holds the distinction of being the most sophisticated Tesseract 5 library across all programming languages. IronOCR integrates Tesseract 5 into various .NET environments, including Framework, Standard, Core, Xamarin, and Mono, ensuring comprehensive support across the ecosystem.

Consider the image file sample1.png as input. Now, let's see how to read the text in this image file.
C#:

    using System;
    using IronOcr;

    public class Program
    {
        public static void Main(string[] args)
        {
            // Create the OCR engine
            var ocrTesseract = new IronTesseract();

            // OcrInput is disposable, so release it when finished
            using var ocrInput = new OcrInput();
            ocrInput.LoadImage(@"sample1.png");

            // Run OCR and print the recognized text
            var ocrResult = ocrTesseract.Read(ocrInput);
            Console.WriteLine(ocrResult.Text);
        }
    }

VB.NET:

    Imports System
    Imports IronOcr

    Public Class Program
        Public Shared Sub Main(ByVal args() As String)
            Dim ocrTesseract = New IronTesseract()
            Using ocrInput As New OcrInput()
                ocrInput.LoadImage("sample1.png")
                Dim ocrResult = ocrTesseract.Read(ocrInput)
                Console.WriteLine(ocrResult.Text)
            End Using
        End Sub
    End Class

Code Explanation

IronTesseract Instance: We start by creating an instance of IronTesseract to perform OCR operations.
Loading Image: We load the sample image into the OcrInput object.
Reading Text: The text in the image is read, and the result is printed to the console.

Output: the recognized text is printed to the console.

Step 4: Utilize the IronOCR advanced features to read the text in images

The IronTesseract.Configuration object grants advanced users access to the underlying Tesseract API within C#/.NET, enabling detailed configuration for fine-tuning and optimization. Below are some of the advanced configurations possible.

Language Selection

You can specify the language for OCR using the Language property. For instance, to set the language to English, use:

C#:

    IronTesseract ocr = new IronTesseract();
    ocr.Language = OcrLanguage.English;

VB.NET:

    Dim ocr As New IronTesseract()
    ocr.Language = OcrLanguage.English

Page Segmentation Mode

The PageSegmentationMode determines how Tesseract segments the input image. Options include AutoOsd, SingleBlock, SingleLine, and more.
For example:

C#:

    ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;

VB.NET:

    ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd

Custom Tesseract Variables

You can fine-tune Tesseract by setting specific variables. For instance, to disable parallelization:

C#:

    ocr.Configuration.TesseractVariables["tessedit_parallelize"] = false;

VB.NET:

    ocr.Configuration.TesseractVariables("tessedit_parallelize") = False

Whitelisting and Blacklisting Characters

Use WhiteListCharacters and BlackListCharacters to control which characters Tesseract recognizes. For example:

C#:

    ocr.Configuration.WhiteListCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    ocr.Configuration.BlackListCharacters = "`ë|^";

VB.NET:

    ocr.Configuration.WhiteListCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ocr.Configuration.BlackListCharacters = "`ë|^"

Additional Configuration Variables

Explore other Tesseract configuration variables to customize behavior according to your needs. For instance:

C#:

    ocr.Configuration.TesseractVariables["classify_num_cp_levels"] = 3;
    ocr.Configuration.TesseractVariables["textord_debug_tabfind"] = 0;
    // ... (more variables)

VB.NET:

    ocr.Configuration.TesseractVariables("classify_num_cp_levels") = 3
    ocr.Configuration.TesseractVariables("textord_debug_tabfind") = 0
    ' ... (more variables)

Now let us try to decode the same image using the advanced settings.

C#:

    using System;
    using IronOcr;

    public class Program
    {
        public static void Main()
        {
            Console.WriteLine("Decoding using advanced features");

            var ocrTesseract = new IronTesseract() // Create instance
            {
                Language = OcrLanguage.EnglishBest, // Configure best English language
                Configuration = new TesseractConfiguration()
                {
                    ReadBarCodes = false, // Disable reading barcodes
                    BlackListCharacters = "`ë|^", // Blacklisted characters
                    WhiteListCharacters = null, // No whitelist, allow all
                    PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd,
                    TesseractVariables = null, // No custom variables used
                },
                MultiThreaded = false,
            };

            using var ocrInput = new OcrInput(); // Create a disposable OCR input object
            ocrInput.LoadImage(@"sample1.png"); // Load the sample image
            var ocrResult = ocrTesseract.Read(ocrInput); // Read the text from the image
            Console.WriteLine(ocrResult.Text); // Output the text
        }
    }

VB.NET:

    Imports System
    Imports IronOcr

    Public Class Program
        Public Shared Sub Main()
            Console.WriteLine("Decoding using advanced features")

            Dim ocrTesseract = New IronTesseract() With {
                .Language = OcrLanguage.EnglishBest,
                .Configuration = New TesseractConfiguration() With {
                    .ReadBarCodes = False,
                    .BlackListCharacters = "`ë|^",
                    .WhiteListCharacters = Nothing,
                    .PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd,
                    .TesseractVariables = Nothing
                },
                .MultiThreaded = False
            }

            Using ocrInput As New OcrInput() ' Create a disposable OCR input object
                ocrInput.LoadImage("sample1.png") ' Load the sample image
                Dim ocrResult = ocrTesseract.Read(ocrInput) ' Read the text from the image
                Console.WriteLine(ocrResult.Text) ' Output the text
            End Using
        End Sub
    End Class

Code Explanation

IronOCR Configuration: An instance of IronTesseract (the main IronOCR class) is created and assigned to the variable ocrTesseract. The following settings are applied to it:
Language: Specifies the language for OCR (in this case, EnglishBest).
Configuration: A TesseractConfiguration object that allows further customization:
    ReadBarCodes: Disables reading barcodes.
    BlackListCharacters: Specifies the characters that must not be recognized.
    WhiteListCharacters: No whitelist is specified, so all characters are allowed.
    PageSegmentationMode: Sets the page segmentation mode to AutoOsd.
    TesseractVariables: No custom variables are used.
MultiThreaded: Disables multithreading.

OCR Input and Image Loading: A using declaration creates a disposable ocrInput object of type OcrInput. The image file "sample1.png" is loaded into ocrInput.

Text Extraction: The Read method is called on ocrTesseract, passing in ocrInput. The result is stored in the ocrResult variable.

Output: The extracted text is printed to the console using Console.WriteLine(ocrResult.Text).

Step 5: Performance Tuning of IronOCR Read Operation

When working with IronOCR, you have access to various image filters that can preprocess images before performing OCR. These filters optimize the image quality, enhance visibility, and reduce noise or artifacts.
They help to improve the performance of the OCR operation.

Rotate: The Rotate filter allows you to rotate images by a specified number of degrees clockwise. For anti-clockwise rotation, use negative numbers.
Deskew: The Deskew filter corrects image skew, ensuring that the text is upright and orthogonal. This is particularly useful for OCR because Tesseract performs best with properly oriented scans.
Scale: The Scale filter proportionally scales OCR input pages.
Binarize: The Binarize filter converts every pixel to either black or white, with no middle ground. It can improve OCR performance in cases of very low contrast between text and background.
ToGrayScale: The ToGrayScale filter converts every pixel to a shade of grayscale. While unlikely to significantly improve OCR accuracy, it may enhance speed.
Invert: The Invert filter reverses colors: white becomes black, and black becomes white.
ReplaceColor: The ReplaceColor filter replaces a specific color within an image with another color, considering a certain threshold.
Contrast: The Contrast filter automatically increases contrast. It often improves OCR speed and accuracy in low-contrast scans.
Dilate and Erode: These advanced morphology filters manipulate object boundaries in an image. Dilate adds pixels to object boundaries. Erode removes pixels from object boundaries.
Sharpen: The Sharpen filter sharpens blurred OCR documents and flattens alpha channels to white.
DeNoise: The DeNoise filter removes digital noise. Use it where noise is expected.
DeepCleanBackgroundNoise: This heavy background noise removal filter should be used only when extreme document background noise is known. It may reduce OCR accuracy for clean documents and is CPU-intensive.
EnhanceResolution: The EnhanceResolution filter enhances the resolution of low-quality images. It's not often needed due to automatic resolution handling.
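Because several of these filters cost CPU time and can hurt clean scans, one possible approach is to read the image once without filters and retry with filters only when the engine reports low confidence. The following is a minimal sketch of that idea, not a recommended configuration: the class name FilterRetryDemo, the helper NeedsPreprocessing, and the threshold of 80 are hypothetical, and the code assumes that OcrResult exposes a Confidence value on a 0 to 100 scale, which should be checked against the IronOCR version in use.

```csharp
using System;
using IronOcr;

public static class FilterRetryDemo
{
    // Pure helper: true when the first pass is not trustworthy enough.
    // The 0-100 confidence scale is an assumption; verify it for your version.
    public static bool NeedsPreprocessing(double confidence, double threshold)
        => confidence < threshold;

    public static string ReadWithFallback(string imagePath, double threshold)
    {
        var ocr = new IronTesseract();

        // First pass: no filters, which is the cheapest option.
        using (var plain = new OcrInput())
        {
            plain.LoadImage(imagePath);
            var first = ocr.Read(plain);
            if (!NeedsPreprocessing(first.Confidence, threshold))
            {
                return first.Text;
            }
        }

        // Second pass: reload the image and apply a few of the filters listed above.
        using var filtered = new OcrInput();
        filtered.LoadImage(imagePath);
        filtered.Deskew();   // straighten rotated scans
        filtered.DeNoise();  // remove digital noise
        filtered.Contrast(); // boost low-contrast text
        return ocr.Read(filtered).Text;
    }

    public static void Main()
    {
        // "sample.png" and the threshold of 80 are placeholder values.
        Console.WriteLine(ReadWithFallback("sample.png", 80.0));
    }
}
```

The two-pass design keeps the common case fast, since clean documents never pay for filtering, at the price of a second read for poor scans.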
Here's an example of how to apply a filter using IronOCR in C#:

C#:

    var ocr = new IronTesseract();
    using var input = new OcrInput(); // Dispose the input when finished
    input.LoadImage("sample.png");
    input.Deskew(); // Correct skew before reading
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);

VB.NET:

    Dim ocr = New IronTesseract()
    Using input As New OcrInput()
        input.LoadImage("sample.png")
        input.Deskew()
        Dim result = ocr.Read(input)
        Console.WriteLine(result.Text)
    End Using

Common OCR Applications

Document Digitization: OCR is widely used to convert scanned paper documents, such as invoices, receipts, forms, and contracts, into digital formats. This digitization process streamlines document storage, retrieval, and management, reducing paper clutter and improving efficiency.
Data Extraction: OCR enables the extraction of text and data from scanned documents, images, and PDFs. This extracted data can be used for automated data entry, content analysis, indexing, and integration into databases or business systems.
Text Recognition in Images: OCR technology extracts text from printed documents and images for indexing and search purposes. This capability is used in various applications, including augmented reality, image-based search engines, and translation services.
Automatic License Plate Recognition (ALPR): ALPR systems use OCR to read license plate numbers from images or video streams captured by cameras installed for traffic surveillance, parking management, toll collection, and law enforcement.
Accessibility Solutions: OCR plays a crucial role in creating accessible content for individuals with visual impairments. By converting text from images or documents into speech or braille, OCR helps make information accessible to people with disabilities.
Identity Verification: OCR technology is employed in identity verification processes, such as scanning and processing passports, driver's licenses, and other IDs. It assists in verifying the authenticity of documents and extracting the relevant information.
Banking and Finance: OCR is used in banking and finance for tasks such as reading checks, processing invoices, converting existing PDF documents, extracting data from financial statements, and automating document-based workflows to improve accuracy and efficiency in financial operations.
Automated Translation: OCR technology is integrated into translation tools and language learning apps to convert printed text from one language to another. Users can capture text with their devices, and OCR assists in translating it into the desired language in real time.
Archival and Historical Document Preservation: OCR is used to digitize archival materials and historical documents, preserving valuable cultural heritage in digital formats for future access, research, and analysis.

License Requirements

IronOCR requires a license key. To request a trial key, provide your details and the key is delivered to your email address. Once the key is obtained, either by purchase or free trial, follow the steps below to use it.

Setting Your License Key: Set your IronOCR license key in code. Add the following line to your application startup (before using IronOCR):

C#:

    IronOcr.License.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01";

VB.NET:

    IronOcr.License.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01"

Global Application Key (Web.Config or App.Config): To apply a key globally across your application, use the configuration file (Web.Config or App.Config).
Add the following key to your appSettings:

    <configuration>
      <!-- Other settings -->
      <appSettings>
        <add key="IronOcr.LicenseKey" value="IRONOCR-MYLICENSE-KEY-1EF01"/>
      </appSettings>
    </configuration>

Using .NET Core appsettings.json: For .NET Core applications, create an appsettings.json file in your project's root directory. Replace the value of the "IronOcr.LicenseKey" key with your license key:

    {
      "IronOcr.LicenseKey": "IRONOCR-MYLICENSE-KEY-1EF01"
    }

Testing Your License Key: Verify that your key has been installed correctly by testing it:

C#:

    bool result = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01");

VB.NET:

    Dim result As Boolean = IronOcr.License.IsValidLicense("IRONOCR-MYLICENSE-KEY-1EF01")

Conclusion

In conclusion, IronOCR offers a robust solution for OCR, starting at $799. Embrace the power of OCR with IronOCR and unlock a world of possibilities in your C# projects.

Frequently Asked Questions

How can I perform character recognition in C#?
To perform character recognition in C#, you can use IronOCR. First create a new C# project in Visual Studio, then install the IronOCR .NET library through the NuGet Package Manager. Use the IronOCR classes and methods to extract text from images, PDFs, or scanned documents.

What are the advantages of using IronOCR for text extraction?
IronOCR enhances text extraction by improving the accuracy and speed of the Tesseract OCR engine. It supports more than 125 languages, offers zone-based OCR, and provides image preprocessing tools for optimizing OCR results.

How can I optimize OCR accuracy with IronOCR?
To optimize OCR accuracy with IronOCR, use image preprocessing tools such as rotation, deskewing, and contrast adjustment. You can also fine-tune results with language selection, page segmentation, and character whitelists or blacklists.

What are common uses of OCR technology?
OCR technology is commonly used for document digitization, data extraction, text recognition in images, automatic license plate recognition, and accessibility solutions. It also contributes to banking, identity verification, and the preservation of archival documents.

Which environments does IronOCR support?
IronOCR is compatible with various .NET environments, including C#, F#, and VB.NET. It supports .NET versions 8, 7, 6, Core, Standard, and Framework, covering many development setups.

How do I handle licensing for IronOCR?
To license IronOCR, apply the license key at application startup using the IronOcr.License.LicenseKey property. Alternatively, set it globally in the Web.Config or App.Config file, or in appsettings.json for .NET Core applications.

What advanced features does IronOCR offer?
IronOCR offers advanced features including language selection, page segmentation modes, custom Tesseract variables, and character whitelists or blacklists. These features enable detailed customization and optimization of OCR operations.

Does IronOCR support zone-based OCR?
Yes, IronOCR supports zone-based OCR, which lets you specify particular regions of an image or document for text extraction. This feature is useful for targeted data extraction from complex layouts.

How can I integrate OCR into my own C# project?
To integrate OCR into a C# project, install the IronOCR library from NuGet and reference it in your project. Use the library's methods to implement OCR in your application so that it can process images and scanned documents.

About the Author

Kannapat Udonpant, Software Engineer
Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University. While pursuing his degree, he became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join the engineering team at Iron Software, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to learning from colleagues, he enjoys the social side of working at Iron Software. When he is not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.

Related Articles

Published September 29, 2025: How to Create a .NET OCR SDK Using IronOCR. Build powerful OCR solutions with the IronOCR .NET SDK. Simple API, enterprise features, cross-platform support.
Published September 29, 2025: How to Integrate OCR into C# GitHub Projects Using IronOCR. OCR C# GitHub tutorial: implement text recognition in GitHub projects using IronOCR. Includes code samples and version control tips.
Updated September 4, 2025: How We Cut Document Processing Memory by 98%: The IronOCR Engineering Breakthrough. IronOCR 2025.9 adopts a streaming architecture that reduces TIFF processing memory by 98%, avoiding crashes and improving speed for enterprise workflows.