How to Read PDFs
Chaknith Bin
Updated: July 28, 2025

PDF stands for "Portable Document Format." It is a file format developed by Adobe that preserves the fonts, images, graphics, and layout of a source document, regardless of the application and platform used to create it. PDF files are typically used to share and view documents in a consistent format, whatever software or hardware is used to open them. IronOCR handles various versions of PDF documents.

How to Read PDFs
1. Download a C# library for reading PDFs
2. Prepare the PDF document for reading
3. Construct the OcrPdfInput object with the PDF file path
4. Use the Read method to perform OCR on the imported PDF
5. Read specific pages by providing a list of page indices

Read PDF Example
Begin by instantiating the IronTesseract class to perform OCR. Then use a 'using' statement to create an OcrPdfInput object, passing the PDF file path to it. Finally, perform OCR using the Read method.

C#:

    using IronOcr;

    // Instantiate IronTesseract
    IronTesseract ocrTesseract = new IronTesseract();

    // Add PDF
    using var pdfInput = new OcrPdfInput("Potter.pdf");

    // Perform OCR
    OcrResult ocrResult = ocrTesseract.Read(pdfInput);

VB.NET:

    Imports IronOcr

    ' Instantiate IronTesseract
    Dim ocrTesseract As New IronTesseract()

    ' Add PDF (the Using block disposes the input when OCR is done)
    Using pdfInput As New OcrPdfInput("Potter.pdf")
        ' Perform OCR
        Dim ocrResult As OcrResult = ocrTesseract.Read(pdfInput)
    End Using

In most cases, there is no need to specify the DPI property. However, providing a high DPI value when constructing the OcrPdfInput can improve reading accuracy.

Read PDF Pages Example
When reading specific pages from a PDF document, you can specify the indices of the pages to import.
To do this, pass the list of page indices to the PageIndices parameter when constructing the OcrPdfInput. Keep in mind that page indices are zero-based, so the list { 0, 2 } used in the example selects the first and third pages.

C#:

    using IronOcr;
    using System.Collections.Generic;

    // Instantiate IronTesseract
    IronTesseract ocrTesseract = new IronTesseract();

    // Create page indices list (zero-based: first and third pages)
    List<int> pageIndices = new List<int>() { 0, 2 };

    // Add PDF
    using var pdfInput = new OcrPdfInput("Potter.pdf", PageIndices: pageIndices);

    // Perform OCR
    OcrResult ocrResult = ocrTesseract.Read(pdfInput);

VB.NET:

    Imports IronOcr
    Imports System.Collections.Generic

    ' Instantiate IronTesseract
    Dim ocrTesseract As New IronTesseract()

    ' Create page indices list (zero-based: first and third pages)
    Dim pageIndices As New List(Of Integer)() From {0, 2}

    ' Add PDF (the Using block disposes the input when OCR is done)
    Using pdfInput As New OcrPdfInput("Potter.pdf", PageIndices:=pageIndices)
        ' Perform OCR
        Dim ocrResult As OcrResult = ocrTesseract.Read(pdfInput)
    End Using

Specify Scan Region
Narrowing down the area to be read can significantly improve reading efficiency. To do this, specify the exact region of the imported PDF that needs to be read. In the code example below, IronOCR is instructed to read only the region containing the chapter number and title.
C#:

    using IronOcr;
    using IronSoftware.Drawing;
    using System;

    // Instantiate IronTesseract
    IronTesseract ocrTesseract = new IronTesseract();

    // Specify crop regions
    Rectangle[] scanRegions = { new Rectangle(550, 100, 600, 300) };

    // Add PDF
    using (var pdfInput = new OcrPdfInput("Potter.pdf", ContentAreas: scanRegions))
    {
        // Perform OCR
        OcrResult ocrResult = ocrTesseract.Read(pdfInput);

        // Output the result to console
        Console.WriteLine(ocrResult.Text);
    }

VB.NET:

    Imports IronOcr
    Imports IronSoftware.Drawing
    Imports System

    ' Instantiate IronTesseract
    Dim ocrTesseract As New IronTesseract()

    ' Specify crop regions
    Dim scanRegions() As Rectangle = { New Rectangle(550, 100, 600, 300) }

    ' Add PDF
    Using pdfInput = New OcrPdfInput("Potter.pdf", ContentAreas:=scanRegions)
        ' Perform OCR
        Dim ocrResult As OcrResult = ocrTesseract.Read(pdfInput)

        ' Output the result to console
        Console.WriteLine(ocrResult.Text)
    End Using

OCR Result
[Figure: OCR result]

Frequently Asked Questions

How can I read a PDF file in C#?
You can read a PDF file in C# by using IronOCR. Start by instantiating the IronTesseract class, then use a 'using' statement to create an OcrPdfInput object with the file path. Finally, call the Read method to perform OCR on the document.

What steps are required to perform OCR on specific pages of a PDF?
To perform OCR on specific pages of a PDF using IronOCR, pass a list of page indices to the PageIndices parameter when constructing the OcrPdfInput. Page indices in IronOCR are zero-based, so the first page has index 0.

How can I improve the accuracy of OCR on PDFs?
You can improve the accuracy of OCR on PDFs in IronOCR by specifying a high DPI when constructing the OcrPdfInput. This is usually not necessary, but a higher DPI can improve reading precision.

Is it possible to select a specific region of a PDF for OCR processing?
Yes. With IronOCR, you can restrict OCR to a specific region of a PDF by passing an array of rectangles to the ContentAreas parameter when constructing the OcrPdfInput. This focuses extraction on a defined area, which improves efficiency.

What is the significance of zero-based numbering in reading PDF pages?
IronOCR uses zero-based numbering for page indices when reading PDF pages. The first page has index 0, so to read page n of the document you pass index n - 1.

Do I need to manage resources manually when performing OCR on PDFs?
When using IronOCR, it is recommended to create OcrPdfInput objects in a 'using' statement. This ensures that resources are disposed of properly once the OCR process is complete.

How can I get started with using IronOCR for PDF reading?
To get started with IronOCR for PDF reading, download the C# library from NuGet, prepare your PDF, construct an OcrPdfInput object with the file path, and use the Read method to perform OCR.

Chaknith Bin
Software Engineer
Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.

Reviewed by Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops and presentations and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.