Content Areas & Crop Regions with PDFs

How do I set content areas on PDFs with IronOCR?

ContentAreas and PDFs

The OcrInput.AddPdf() and AddPdfPage() methods each accept an optional ContentArea.

The question: how do you know how large your content area is, given that PDF pages are not sized in pixels but content areas are generally measured in pixels?

Option 1

OcrInput.TargetDPI (default: 225) dictates the size, in pixels, of the image each PDF page is rendered to. IronOCR reads this rendered image.
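For a rough estimate under Option 1, the arithmetic can be sketched as follows. PDF page sizes are expressed in points (72 per inch by default), so a page rendered at a given DPI should come out at about points × DPI / 72 pixels. PointsToPixels is a hypothetical helper, not part of IronOCR, and the sketch assumes the page is rendered at exactly TargetDPI with no further scaling. How IronOCR rounds is not stated here, so treat the result as approximate and prefer Option 2 for exact measurements.

```csharp
using System;

// Hypothetical helper, not part of IronOCR: converts PDF points to pixels.
// PDF page sizes are expressed in points, 72 per inch by default.
static int PointsToPixels(double points, int dpi)
{
    // Rounding to the nearest pixel is an assumption of this sketch.
    return (int)Math.Round(points * dpi / 72.0, MidpointRounding.AwayFromZero);
}

// US Letter is 612 x 792 points; 225 is the TargetDPI default quoted above.
int widthPx = PointsToPixels(612, 225);
int heightPx = PointsToPixels(792, 225);
Console.WriteLine($"{widthPx} x {heightPx} px");   // prints "1913 x 2475 px"
```

If TargetDPI is changed, any content area measured at the old DPI has to be rescaled by the same ratio.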

Option 2 (ideal use case)

  1. Use OcrInput.AddPdf() with your PDF template
  2. Use OcrInput.Pages[0].Width and Height to get the page size in pixels
  3. Use OcrInput.Pages[0].ToBitmap() to get the exact image the OCR engine will read
  4. You can now measure ContentAreas in pixels

To get your info:

using IronOcr;
    var Ocr = new IronTesseract();
    using (var Input = new OcrInput())
    {
    Input.AddPdf("example.pdf");
    // Save the exact image the OCR engine will read, so it can be measured
    Input.Pages[0].ToBitmap().Save("measure-me.bmp");
    // Page size in pixels
    var width =  Input.Pages[0].Width;
    var height =  Input.Pages[0].Height;
    }
Imports IronOcr
	Dim Ocr = New IronTesseract()
	Using Input = New OcrInput()
	Input.AddPdf("example.pdf")
	' Save the exact image the OCR engine will read, so it can be measured
	Input.Pages(0).ToBitmap().Save("measure-me.bmp")
	' Page size in pixels
	Dim width = Input.Pages(0).Width
	Dim height = Input.Pages(0).Height
	End Using

Final Result:

using IronOcr;
    var Ocr = new IronTesseract();
    using (var Input = new OcrInput())
    {
    var ContentArea = new System.Drawing.Rectangle()
    { X = 215, Y = 1250, Height = 280, Width = 1335 };  //<-- the area you want in px
    Input.AddPdf("example.pdf", ContentArea);
    var Result = Ocr.Read(Input);
    }
Imports IronOcr
	Dim Ocr = New IronTesseract()
	Using Input = New OcrInput()
	' The area you want, in px
	Dim ContentArea = New System.Drawing.Rectangle() With {
		.X = 215,
		.Y = 1250,
		.Height = 280,
		.Width = 1335
	}
	Input.AddPdf("example.pdf", ContentArea)
	Dim Result = Ocr.Read(Input)
	End Using
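Before passing a content area to AddPdf(), it can help to check that the rectangle actually lies within the measured page. The sketch below shows one way to do that with System.Drawing.Rectangle alone. FitsOnPage is a hypothetical helper, not part of IronOCR, and the page size used is an illustrative value rather than one read from a real PDF.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of IronOCR: checks that a content area
// lies entirely inside a page of the given pixel size.
static bool FitsOnPage(Rectangle contentArea, int pageWidth, int pageHeight)
{
    var page = new Rectangle(0, 0, pageWidth, pageHeight);
    return contentArea.Width > 0 && contentArea.Height > 0 && page.Contains(contentArea);
}

// The page size is an illustrative value (US Letter at 225 DPI); in practice,
// use the Width and Height measured from the page.
// Note the constructor order: X, Y, Width, Height.
var area = new Rectangle(215, 1250, 1335, 280);
Console.WriteLine(FitsOnPage(area, 1913, 2475));   // prints "True"
```

A rectangle that extends past the page edge usually means it was measured at a different DPI from the one used for reading.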

Documentation:
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.html
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.Page.html