Content Areas & Crop Regions with PDFs
How do I set content areas on PDFs with IronOCR?
ContentAreas and PDFs
The OcrInput.LoadPdf and LoadPdfPage methods both accept an optional ContentArea.
The question: how do you know how big your content area should be, given that PDFs are not sized in pixels but content areas are generally measured in pixels?
Option 1
OcrInput.TargetDPI (default 225) dictates the size, in pixels, of the image each PDF page is rendered to. IronOCR reads that rendered image.
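To make Option 1 concrete, here is a minimal sketch of the arithmetic, assuming the page size is known in PDF points (the PDF default unit, 72 per inch). The PdfUnits class and PointsToPixels method are hypothetical helpers written for this illustration and are not part of IronOCR; the result is an estimate, and the exported image described in Option 2 remains the authoritative measurement.

```csharp
using System;

// Hypothetical helper, not an IronOCR type.
public static class PdfUnits
{
    // PDF user space defaults to 72 units ("points") per inch.
    public const double PointsPerInch = 72.0;

    // Converts a length in PDF points to pixels at the given rendering DPI.
    public static double PointsToPixels(double points, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return points / PointsPerInch * dpi;
    }
}
```

For example, a US Letter page is 612 x 792 points (8.5 x 11 inches), so PdfUnits.PointsToPixels(612, 225) and PdfUnits.PointsToPixels(792, 225) give 1912.5 and 2475 pixels at the default DPI quoted above. The actual rendered size may differ by rounding.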
Option 2 (ideal use case)
- Use OcrInput.LoadPdf() with your PDF template
- Use OcrInput.GetPages() to get the input's Width and Height
- Use OcrInput.GetPages().First().ToBitmap() to get the exact image the OCR engine will read
- You can now measure ContentAreas in pixels from the exported image
- The measured coordinates can then be used to target a specific OCR region (see Final Result)
To get your info:
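using System.Linq; // provides First()
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    input.LoadPdf("example.pdf");
    // Export the exact image the OCR engine will read, so it can be measured in pixels
    input.GetPages().First().ToBitmap().SaveAs("measure-me.bmp");
    // Page dimensions in pixels
    var width = input.GetPages().First().Width;
    var height = input.GetPages().First().Height;
}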
Imports System.Linq
Imports IronOcr

Dim ocr = New IronTesseract()
Using input = New OcrInput()
    input.LoadPdf("example.pdf")
    ' Export the exact image the OCR engine will read, so it can be measured in pixels
    input.GetPages().First().ToBitmap().SaveAs("measure-me.bmp")
    ' Page dimensions in pixels
    Dim width = input.GetPages().First().Width
    Dim height = input.GetPages().First().Height
End Using
Final Result:
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // The area you want, in pixels, measured from the exported image
    var contentArea = new IronSoftware.Drawing.Rectangle()
        { X = 215, Y = 1250, Height = 280, Width = 1335 };
    input.LoadPdf("example.pdf", ContentArea: contentArea);
    var result = ocr.Read(input);
}
Imports IronOcr

Dim ocr = New IronTesseract()
Using input = New OcrInput()
    ' The area you want, in pixels, measured from the exported image
    Dim contentArea = New IronSoftware.Drawing.Rectangle() With {
        .X = 215,
        .Y = 1250,
        .Height = 280,
        .Width = 1335
    }
    input.LoadPdf("example.pdf", ContentArea:=contentArea)
    Dim result = ocr.Read(input)
End Using
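Because the pixel coordinates above are measured against an image rendered at a particular DPI, they presumably need rescaling if TargetDPI is later changed. The following is a minimal sketch of that rescaling, assuming coordinates scale linearly with DPI; PixelRect and ContentAreaScaling are hypothetical types written for this illustration and are not part of IronOCR. Re-measuring from a freshly exported image is the safer route when in doubt.

```csharp
using System;

// Hypothetical stand-in for a pixel rectangle; not an IronOCR type.
public readonly record struct PixelRect(int X, int Y, int Width, int Height);

public static class ContentAreaScaling
{
    // Rescales a rectangle measured at measuredDpi to the equivalent
    // rectangle at targetDpi, assuming linear scaling with DPI.
    public static PixelRect Rescale(PixelRect area, double measuredDpi, double targetDpi)
    {
        if (measuredDpi <= 0) throw new ArgumentOutOfRangeException(nameof(measuredDpi));
        if (targetDpi <= 0) throw new ArgumentOutOfRangeException(nameof(targetDpi));

        double scale = targetDpi / measuredDpi;
        return new PixelRect(
            (int)Math.Round(area.X * scale),
            (int)Math.Round(area.Y * scale),
            (int)Math.Round(area.Width * scale),
            (int)Math.Round(area.Height * scale));
    }
}
```

For example, rescaling the rectangle from Final Result (X = 215, Y = 1250, Width = 1335, Height = 280) from 225 DPI to 450 DPI doubles every value, giving X = 430, Y = 2500, Width = 2670, Height = 560.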
API Reference: OcrInput | OcrInput.Page