Using Crop Regions and Rectangles with IronOCR
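How do I set content areas in PDF files with IronOCR?
Content Areas and PDFs
Both the OcrInput.LoadPdf and LoadPdfPage methods offer the option to add a ContentArea.
Question: PDF files are not sized in pixels, yet content areas are usually measured in pixels. How do I know how large my content area is?
Option 1
The target DPI setting defaults to 225. This value determines the pixel dimensions of the image rendered from the PDF, and that rendered image is what IronOCR reads.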
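For Option 1, the pixel size of a rendered page can be estimated by hand. The sketch below assumes the page size is known in PDF points (72 points per inch, the default PDF unit) and that the page is rendered at the 225 DPI default mentioned above. PointsToPixels is a hypothetical helper written for this illustration, not part of IronOCR, and the renderer may round the result differently.

```csharp
using System;

public static class PdfPixelEstimate
{
    // PDF pages are measured in points by default; 72 points equal 1 inch.
    private const double PointsPerInch = 72.0;

    // Hypothetical helper (not an IronOCR API): converts a length in
    // PDF points to pixels at the given rendering DPI.
    public static double PointsToPixels(double points, double dpi)
    {
        return points / PointsPerInch * dpi;
    }

    public static void Main()
    {
        // Assumption: the page is rendered at the default DPI of 225.
        const double dpi = 225;

        // Example page: US Letter, 612 x 792 points (8.5 x 11 inches).
        double widthPx = PointsToPixels(612, dpi);
        double heightPx = PointsToPixels(792, dpi);

        Console.WriteLine($"Estimated page size: {widthPx} x {heightPx} px");
    }
}
```

For a US Letter page this gives roughly 1912 x 2475 pixels, which is the coordinate space a content area would have to fit inside. Treat the figure as an estimate only; measuring the exported bitmap, as in Option 2, is more reliable.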
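Option 2 (ideal use case)
- Load your PDF template with OcrInput.LoadPdf().
- Use OcrInput.GetPages() to get the width and height of the input.
- Use OcrInput.GetPages().First().ToBitmap() to get the exact image the OCR engine will read.
- Measure your ContentAreas in pixels on the exported image.
- Use the resulting coordinates to target a specific OCR region (see the final result).
To get this information: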
using System; // Needed for Console
using System.Linq; // Needed for First()
using IronOcr;
var ocr = new IronTesseract();
using (var input = new OcrInput())
{
// Load the PDF document
input.LoadPdf("example.pdf");
// Save the first page as a bitmap to measure it
input.GetPages().First().ToBitmap().SaveAs("measure-me.bmp");
// Get the dimensions of the first page
var width = input.GetPages().First().Width;
var height = input.GetPages().First().Height;
// Optionally, output the dimensions to understand the scale
Console.WriteLine($"Width: {width}px, Height: {height}px");
}
Imports System.Linq ' Needed for First()
Imports IronOcr
Dim ocr = New IronTesseract()
Using input = New OcrInput()
' Load the PDF document
input.LoadPdf("example.pdf")
' Save the first page as a bitmap to measure it
input.GetPages().First().ToBitmap().SaveAs("measure-me.bmp")
' Get the dimensions of the first page
Dim width = input.GetPages().First().Width
Dim height = input.GetPages().First().Height
' Optionally, output the dimensions to understand the scale
Console.WriteLine($"Width: {width}px, Height: {height}px")
End Using
Final result:
using System; // Needed for Console
using IronOcr;
using IronSoftware.Drawing; // Needed for Rectangle
var ocr = new IronTesseract();
using (var input = new OcrInput())
{
// Define the content area rectangle with specific pixel coordinates
var contentArea = new Rectangle
{
X = 215,
Y = 1250,
Height = 280,
Width = 1335
}; //<-- the area you want in px
// Load the specific content area of the PDF
input.LoadPdf("example.pdf", contentArea: contentArea);
// Perform OCR on the defined content area
var result = ocr.Read(input);
// Optionally, print the OCR result
Console.WriteLine(result.Text);
}
Imports IronOcr
Imports IronSoftware.Drawing ' Needed for Rectangle
Dim ocr = New IronTesseract()
Using input = New OcrInput()
' Define the content area rectangle with specific pixel coordinates
Dim contentArea = New Rectangle With {
.X = 215,
.Y = 1250,
.Height = 280,
.Width = 1335
}
' Load the specific content area of the PDF
input.LoadPdf("example.pdf", contentArea:= contentArea)
' Perform OCR on the defined content area
Dim result = ocr.Read(input)
' Optionally, print the OCR result
Console.WriteLine(result.Text)
End Using
API reference: OcrInput | OcrInput.Page

