How to Read PDFs

Chaknith Bin

October 25, 2023

Updated January 8, 2025

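PDF stands for "Portable Document Format". It is a file format developed by Adobe that preserves the fonts, images, graphics, and layout of a source document, regardless of the application and platform used to create it. PDF files are typically used to share and view documents in a consistent format, whatever software or hardware is used to open them. IronOCR handles the various versions of PDF documents with ease.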

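How to Read PDFs

1. Download the C# library for reading PDFs
2. Prepare the PDF document to be read
3. Construct an OcrPdfInput object with the PDF file path
4. Use the Read method to perform OCR on the imported PDF
5. Provide a list of page indices to read specific pages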

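Read PDF Example

Begin by instantiating the IronTesseract class to perform OCR. Then use a 'using' statement to create an OcrPdfInput object, passing it the PDF file path. Finally, perform OCR with the Read method.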

using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Add PDF
using var pdfInput = new OcrPdfInput("Potter.pdf");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(pdfInput);

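' VB.NET version of the example above
Imports IronOcr

Module Program
    Sub Main()
        ' Instantiate IronTesseract
        Dim ocrTesseract As New IronTesseract()

        ' Add PDF; the Using block disposes the input when done
        Using pdfInput As New OcrPdfInput("Potter.pdf")
            ' Perform OCR
            Dim ocrResult As OcrResult = ocrTesseract.Read(pdfInput)
        End Using
    End Sub
End Module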

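In most cases, there is no need to specify the DPI property. However, providing a higher DPI value when constructing the OcrPdfInput can improve reading accuracy.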
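
As a minimal sketch of supplying a DPI value, the snippet below assumes the constructor exposes an optional named parameter called `DPI`, and uses 300 purely as an illustrative value. Both are assumptions, so check the OcrPdfInput API reference for your installed version before relying on them.

```csharp
using System;
using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Assumption: the optional constructor parameter is named DPI.
// 300 is an illustrative value, not a documented recommendation.
using var pdfInput = new OcrPdfInput("Potter.pdf", DPI: 300);

// Perform OCR and print the recognized text
OcrResult ocrResult = ocrTesseract.Read(pdfInput);
Console.WriteLine(ocrResult.Text);
```

Rendering pages at a higher DPI produces larger images, so reads will tend to take longer. It is probably worth raising the value only when the default output is not accurate enough.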

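Read PDF Pages Example

When reading specific pages of a PDF document, you can specify the index numbers of the pages to import. To do so, pass a list of page indices to the PageIndices parameter when constructing the OcrPdfInput. Keep in mind that page indices are zero-based, so the list { 0, 2 } selects the first and third pages.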

using IronOcr;
using System.Collections.Generic;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Create page indices list
List<int> pageIndices = new List<int>() { 0, 2 };

// Add PDF
using var pdfInput = new OcrPdfInput("Potter.pdf", PageIndices: pageIndices);
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(pdfInput);

' VB.NET version of the example above
Imports IronOcr
Imports System.Collections.Generic

Module Program
    Sub Main()
        ' Instantiate IronTesseract
        Dim ocrTesseract As New IronTesseract()

        ' Create page indices list (zero-based: first and third pages)
        Dim pageIndices As New List(Of Integer) From {0, 2}

        ' Add PDF; the Using block disposes the input when done
        Using pdfInput As New OcrPdfInput("Potter.pdf", PageIndices:=pageIndices)
            ' Perform OCR
            Dim ocrResult As OcrResult = ocrTesseract.Read(pdfInput)
        End Using
    End Sub
End Module

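Because page indices are zero-based while PDF viewers usually display page numbers starting at 1, an off-by-one mistake is easy to make. The helper below is a hypothetical sketch (the `PageNumbers` class and `ToZeroBasedIndices` method are not part of IronOCR) that converts viewer-style page numbers into the indices that PageIndices expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronOCR: converts 1-based page numbers
// (as shown in a PDF viewer) into the zero-based indices PageIndices expects.
public static class PageNumbers
{
    public static List<int> ToZeroBasedIndices(IEnumerable<int> pageNumbers)
    {
        if (pageNumbers == null)
            throw new ArgumentNullException(nameof(pageNumbers));

        var indices = new List<int>();
        foreach (int pageNumber in pageNumbers)
        {
            // Page numbers start at 1, so anything lower has no valid index
            if (pageNumber < 1)
                throw new ArgumentOutOfRangeException(
                    nameof(pageNumbers), $"Page numbers start at 1, got {pageNumber}.");

            indices.Add(pageNumber - 1);
        }

        // Drop duplicates and sort so each page is requested once, in document order
        return indices.Distinct().OrderBy(i => i).ToList();
    }
}
```

For example, `PageNumbers.ToZeroBasedIndices(new[] { 1, 3 })` yields the list { 0, 2 }, which can then be passed as the PageIndices argument when constructing the OcrPdfInput.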

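Specify Scan Region

By narrowing down the area to be read, you can significantly improve reading efficiency. To achieve this, specify the exact region of the imported PDF that needs to be read by passing it to the ContentAreas parameter. In the code example below, I have instructed IronOCR to focus solely on extracting the chapter number and title.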

using IronOcr;
using IronSoftware.Drawing;
using System;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Specify crop regions
Rectangle[] scanRegions = { new Rectangle(550, 100, 600, 300) };

// Add PDF
using (var pdfInput = new OcrPdfInput("Potter.pdf", ContentAreas: scanRegions))
{
    // Perform OCR
    OcrResult ocrResult = ocrTesseract.Read(pdfInput);

    // Output the result to console
    Console.WriteLine(ocrResult.Text);
}

' VB.NET version of the example above
Imports IronOcr
Imports IronSoftware.Drawing
Imports System

Module Program
    Sub Main()
        ' Instantiate IronTesseract
        Dim ocrTesseract As New IronTesseract()

        ' Specify crop regions
        Dim scanRegions() As Rectangle = {New Rectangle(550, 100, 600, 300)}

        ' Add PDF
        Using pdfInput As New OcrPdfInput("Potter.pdf", ContentAreas:=scanRegions)
            ' Perform OCR
            Dim ocrResult As OcrResult = ocrTesseract.Read(pdfInput)

            ' Output the result to console
            Console.WriteLine(ocrResult.Text)
        End Using
    End Sub
End Module

OCR Result
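
As a minimal sketch of inspecting what Read returns, the snippet below prints the recognized text page by page. It assumes the `Confidence` and `Pages` members of `OcrResult`, and the `PageNumber` and `Text` members of each page, are available as described in the IronOCR API reference for your installed version.

```csharp
using System;
using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Add PDF and perform OCR
using var pdfInput = new OcrPdfInput("Potter.pdf");
OcrResult ocrResult = ocrTesseract.Read(pdfInput);

// Overall confidence reported for the whole read (assumed member)
Console.WriteLine($"Confidence: {ocrResult.Confidence}");

// Print the text recognized on each page that was read (assumed members)
foreach (var page in ocrResult.Pages)
{
    Console.WriteLine($"--- Page {page.PageNumber} ---");
    Console.WriteLine(page.Text);
}
```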

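Chaknith Bin

Software Engineer

Chaknith is the Sherlock Holmes of developers. He first realized he might have a future in software engineering when he was taking part in code challenges for fun. His focus is on IronXL and IronBarcode, but he takes pride in helping customers with every product. Chaknith draws on what he learns from talking directly with customers to further improve the products. His hands-on feedback goes beyond Jira tickets and supports product development, documentation, and marketing, improving the overall customer experience. When he is not at work, he is usually learning about machine learning, coding, or hiking.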