Content Areas & Crop Regions with PDFs

How do I set content areas on PDFs with IronOCR?

ContentAreas and PDFs

The OcrInput.LoadPdf and LoadPdfPage methods each accept an optional ContentArea.

The question: how do you know how large your content area is, given that PDFs are not sized in pixels but content areas are generally measured in pixels?

Option 1

OcrInput.TargetDPI (default: 225) dictates the size of the PDF image in pixels. IronOCR reads this value when it loads the PDF.
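
As a rough illustration of Option 1, the sketch below estimates the pixel size of a page from its size in PDF points (72 points per inch) and a DPI value. PointsToPixels is a hypothetical helper, not part of IronOCR, and the sketch assumes the page is scaled linearly by DPI / 72. The real rendered size may differ slightly through rounding or page boxes, so treat the result as an estimate and confirm it with Option 2.

```csharp
using System;

// Hypothetical helper (not an IronOCR API): PDF lengths are in points, 72 per inch.
// Assumes the rendered image is the page scaled linearly by dpi / 72.
static double PointsToPixels(double points, double dpi) => points / 72.0 * dpi;

const double targetDpi = 225;  // the default TargetDPI quoted above

// US Letter is 612 x 792 points (8.5 x 11 inches).
double widthPx = PointsToPixels(612, targetDpi);
double heightPx = PointsToPixels(792, targetDpi);

Console.WriteLine($"Estimated page size: {widthPx} x {heightPx} px");
```

For a US Letter page at 225 DPI this gives 1912.5 by 2475 pixels before any rounding. The same conversion can be applied to the corners of a region whose position is known in points or inches.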

Option 2 (ideal use case)

  1. Use OcrInput.LoadPdf() to load your PDF template
  2. Use OcrInput.GetPages() to get each page's Width and Height in pixels
  3. Use OcrInput.GetPages().First().ToBitmap() to get the exact image the OCR engine will read
  4. Measure your ContentAreas in pixels on the exported image
  5. Use the measured coordinates to define the specific OCR region (see Final Result)

To export the image and read the page dimensions:

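using System.Linq;  // needed for First()
using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    input.LoadPdf("example.pdf");
    var firstPage = input.GetPages().First();
    // Export the exact image the OCR engine will read, so it can be measured
    firstPage.ToBitmap().SaveAs("measure-me.bmp");
    // Page dimensions in pixels
    var width = firstPage.Width;
    var height = firstPage.Height;
}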
VB.NET equivalent:
Imports IronOcr
Dim ocr = New IronTesseract()
Using input = New OcrInput()
	input.LoadPdf("example.pdf")
	input.GetPages().First().ToBitmap().SaveAs("measure-me.bmp")
	Dim width = input.GetPages().First().Width
	Dim height = input.GetPages().First().Height
End Using

Final Result:

using IronOcr;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // The area you want, in pixels, as measured on the exported image
    var contentArea = new IronSoftware.Drawing.Rectangle()
    { X = 215, Y = 1250, Height = 280, Width = 1335 };
    // Pass the area when loading the PDF
    input.LoadPdf("example.pdf", ContentArea: contentArea);
    var result = ocr.Read(input);
}
VB.NET equivalent:
Imports IronOcr
Dim ocr = New IronTesseract()
Using input = New OcrInput()
	Dim contentArea = New IronSoftware.Drawing.Rectangle() With {
		.X = 215,
		.Y = 1250,
		.Height = 280,
		.Width = 1335
	}
	input.LoadPdf("example.pdf", ContentArea:= contentArea)
	Dim result = ocr.Read(input)
End Using

API Reference: OcrInput | OcrInput.Page