OCR from PDF (Free Online Tools)

Kannapat Udonpant
Updated: June 22, 2025

Optical Character Recognition, or OCR, is a technology used to recognize text in images. It was created to scan printed text or image files and turn them into text that computers can work with, which matters because so much of today's material, such as e-mails and books, is digital. OCR has since evolved into something more sophisticated, with specialized algorithms capable of recognizing text in many different fonts, even when it has been distorted by noise or by common artifacts such as JPEG compression. OCR can also read handwriting on paper, in favorable cases with up to 98% accuracy. Text that is scanned using OCR can then be edited, indexed, searched, printed, and archived.

OCR software is widely used in the healthcare, pharmaceutical, insurance, and legal industries. It helps convert paper documents into digital documents so they can be reused and shared more easily.

Let's see how you can perform OCR on PDF files using different tools.

Adobe Acrobat Pro

Adobe is the company that originally developed the PDF format. It offers a fast, efficient OCR engine that can edit any PDF document you throw at it. It is one of the most powerful OCR engines on the market, and if you have lots of PDFs to edit, Adobe Acrobat Pro DC is worth purchasing. The software is designed to convert any text-based document into PDF format with great accuracy. It also retains the font of the original document using its Custom Font generator.

Let's see how to perform PDF OCR using Adobe Acrobat:

Open the file in Adobe Acrobat Pro DC.
Click the "Edit PDF" option in the right pane. Acrobat converts the file to an editable PDF using its OCR capabilities.
You can now edit any text and replace images in the document.
Save the file by choosing "File > Save As" and giving the new PDF document a suitable name.

You can also perform OCR on multiple scanned PDF documents at a time.

Sejda

Sejda is OCR-enabled PDF editing software that can be used online in the cloud or downloaded as a desktop application for macOS, Windows, or Linux. Sejda allows users to compress, edit, digitally sign, merge, and fill out PDF files. Files in various formats, including JPEG and Excel, can be turned into PDF files, and PDFs can likewise be converted into other formats such as Word and PowerPoint documents.

Let's see how to perform OCR on PDF documents using Sejda OCR:

Open the Sejda OCR website.
Click the "Upload PDF file" button to upload files, or drag and drop files from your computer. After uploading, you'll see the uploaded file name.
Select the language of the document.
Choose the output format: "PDF" or "Text".
Click the "Recognize text on all pages" button to start extracting text.
When the process is complete, download the extracted text.

SodaPDF

SodaPDF OCR is free online OCR software that can extract text from images. It is a PDF OCR conversion tool that converts scanned documents, faxes, and other printouts into editable text, PDFs, and searchable PDFs; converting scanned documents or faxes into editable files is its most common use case. All uploaded documents are automatically deleted from the server after a specific time. It offers multiple features, such as converting PDF to Word so that the result can be opened in Microsoft Word.

Let's see how to perform OCR on a PDF using SodaPDF:

Open the SodaPDF website.
Click the "Choose File" button and select the PDF documents to upload.
After uploading, SodaPDF presents a user interface for editing the PDF text and images.
Download the file using the Download button.

IronOCR: .NET OCR Library

IronOCR is a robust library for OCR in the .NET Framework. It provides a powerful API for working with text and images, offering features such as real-time recognition, field detection, and optical character recognition for scanned PDF files. IronPDF, a separate Iron Software library, can also edit scanned documents.

IronOCR gives developers the power of text recognition in their applications. It can be used for various purposes, such as converting scanned documents into digital formats or recognizing captions on images.

The IronOCR .NET library provides an easy-to-use, low-level interface to the IronOCR SDK. On top of that, it includes an image processing pipeline that automatically handles low-DPI images and extracts text from PDF documents.

Let's see how to perform OCR on a PDF file using IronOCR:

OCR of a Complete PDF File

The following code can perform OCR on an entire PDF document.
C#:

    using System;
    using IronOcr;

    var ocr = new IronTesseract();
    using (var input = new OcrInput())
    {
        // Add the entire PDF document for OCR processing
        // (the second argument is the PDF's password, if it has one)
        input.AddPdf("example.pdf", "password");
        var result = ocr.Read(input);
        // Print the extracted text to the console
        Console.WriteLine(result.Text);
    }

VB.NET:

    Imports System
    Imports IronOcr

    Module Program
        Sub Main()
            Dim ocr = New IronTesseract()
            Using input = New OcrInput()
                ' Add the entire PDF document for OCR processing
                ' (the second argument is the PDF's password, if it has one)
                input.AddPdf("example.pdf", "password")
                Dim result = ocr.Read(input)
                ' Print the extracted text to the console
                Console.WriteLine(result.Text)
            End Using
        End Sub
    End Module

OCR of Selected Pages of a PDF

You can perform OCR on selected PDF pages by using the AddPdfPages function.

C#:

    using System;
    using IronOcr;

    var ocr = new IronTesseract();
    using (var input = new OcrInput())
    {
        // Add specific pages of the PDF document for OCR processing
        input.AddPdfPages("example.pdf", new[] { 1, 2, 3 }, "password");
        var result = ocr.Read(input);
        // Print the extracted text to the console
        Console.WriteLine(result.Text);
    }

VB.NET:

    Imports System
    Imports IronOcr

    Module Program
        Sub Main()
            Dim ocr = New IronTesseract()
            Using input = New OcrInput()
                ' Add specific pages of the PDF document for OCR processing
                input.AddPdfPages("example.pdf", { 1, 2, 3 }, "password")
                Dim result = ocr.Read(input)
                ' Print the extracted text to the console
                Console.WriteLine(result.Text)
            End Using
        End Sub
    End Module

Convert PDF to Searchable PDF

You can convert a PDF file to
a searchable PDF file with IronOCR by calling the SaveAsSearchablePdf function.

C#:

    using IronOcr;

    var ocr = new IronTesseract();
    using (var input = new OcrInput())
    {
        // Add the PDF for processing and specify the password, if any
        input.AddPdf("scan.pdf", "password");
        // Correct twisted or skewed pages
        input.Deskew();
        var result = ocr.Read(input);
        // Save the processed result as a searchable PDF
        result.SaveAsSearchablePdf("searchable.pdf");
    }

VB.NET:

    Imports IronOcr

    Module Program
        Sub Main()
            Dim ocr = New IronTesseract()
            Using input = New OcrInput()
                ' Add the PDF for processing and specify the password, if any
                input.AddPdf("scan.pdf", "password")
                ' Correct twisted or skewed pages
                input.Deskew()
                Dim result = ocr.Read(input)
                ' Save the processed result as a searchable PDF
                result.SaveAsSearchablePdf("searchable.pdf")
            End Using
        End Sub
    End Module

Conclusion

We have explored several software tools for performing optical character recognition. These tools let you recognize text and create searchable and editable PDFs, and IronOCR lets you do so programmatically. If you are developing for the .NET Framework, IronOCR is our recommendation. It makes OCR in .NET straightforward, and it is powerful enough to be used even when the original document has been damaged or distorted, for example by water damage. Another use case is converting old paper forms filled out by hand, such as invoices and sales receipts, into digital versions. This allows the documents to be processed automatically by accounting software, thereby increasing accuracy and efficiency.
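As a rough illustration of the last use case, the sketch below shows how text extracted from a scanned invoice might be turned into structured fields before it is handed to accounting software. It uses only the .NET standard library. The sample text, the InvoiceFields record, the InvoiceFieldExtractor class, and the two regular expressions are all hypothetical and are not part of IronOCR. In practice the input string would be something like the Result.Text value from the examples above, and real invoices would need patterns tuned to their layout.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical OCR output; in a real application this would be the text
// returned by the OCR engine (for example, Result.Text).
string ocrText = "ACME Supplies\nInvoice No: INV-1001\nDate: 2021-03-15\nTotal: $1,234.50\n";

InvoiceFields fields = InvoiceFieldExtractor.Extract(ocrText);

string numberText = fields.InvoiceNumber ?? "(not found)";
string totalText = fields.Total?.ToString("F2", CultureInfo.InvariantCulture) ?? "(not found)";
Console.WriteLine($"Invoice number: {numberText}"); // prints "Invoice number: INV-1001"
Console.WriteLine($"Total: {totalText}");           // prints "Total: 1234.50"

// Hypothetical container for the fields an accounting system might need.
public sealed record InvoiceFields(string? InvoiceNumber, decimal? Total);

// Hypothetical helper: pulls fields out of raw OCR text with regular expressions.
public static class InvoiceFieldExtractor
{
    // Matches labels such as "Invoice No:", "Invoice Number" or "Invoice #".
    private static readonly Regex InvoiceNumberPattern = new Regex(
        @"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)",
        RegexOptions.IgnoreCase);

    // Matches "Total" as a whole word (so "Subtotal" is skipped), then an amount.
    private static readonly Regex TotalPattern = new Regex(
        @"\bTotal\s*:?\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
        RegexOptions.IgnoreCase);

    public static InvoiceFields Extract(string text)
    {
        Match numberMatch = InvoiceNumberPattern.Match(text);
        string? number = numberMatch.Success ? numberMatch.Groups[1].Value : null;

        decimal? total = null;
        Match totalMatch = TotalPattern.Match(text);
        if (totalMatch.Success &&
            decimal.TryParse(
                totalMatch.Groups[1].Value,
                NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
                CultureInfo.InvariantCulture,
                out decimal parsed))
        {
            total = parsed;
        }

        // Fields that were not found stay null so the caller can flag the
        // document for manual review instead of guessing.
        return new InvoiceFields(number, total);
    }
}
```

Leaving missing fields as null rather than substituting defaults is a deliberate choice here: OCR output from damaged or handwritten documents can be incomplete, and a null makes it easy to route such documents to a person.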
Kannapat Udonpant
Software Engineer

Before becoming a software engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University. While pursuing his doctorate, he became a member of the Vehicle Robotics Laboratory, part of the Department of Bioproduction Engineering. In 2022, he used his C# skills to join the engineering team at Iron Software, where he focuses on IronPDF. Kannapat values this job because he learns directly from the developer who writes much of the IronPDF code. Besides learning from his colleagues, he enjoys the social side of working at Iron Software. When he is not writing code or documentation, Kannapat is often gaming on his PS5 or rewatching The Last of Us.