Class PageIterator
Inheritance
System.Object
PageIterator
Implements
System.IDisposable
Assembly: IronOcr.dll
Syntax
public class PageIterator : DisposableBase
PageIterator walks the layout of a recognized page, from blocks down to individual symbols. It moves a cursor over the regions Tesseract found and reports where each one sits, so code can map the structure of a page without decoding its text. When the text itself is also needed, ResultIterator derives from PageIterator and adds the recognized characters on top of the same traversal.
A PageIterator is obtained from a recognized page rather than constructed directly, and because it inherits DisposableBase it should be disposed (or wrapped in using) once the walk finishes. Begin resets the cursor to the first element, and the two Next overloads advance it, one stepping a single PageIteratorLevel and the other moving to the next element at one level while bounded by a parent level. IsAtBeginningOf and IsAtFinalOf test the cursor's position within the hierarchy so a loop knows where a block, paragraph, or line starts and ends.
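The boundary tests described above can be sketched as a single pass at the Symbol level that counts symbols per block. This is a minimal sketch rather than a definitive implementation. It assumes the first argument of IsAtFinalOf is the enclosing (parent) level and the second is the element level, which matches the parameter names in the declarations but is not spelled out on this page. The names BoundarySketch and CountSymbolsPerBlock are hypothetical and exist only for illustration.

```csharp
using System;
using DynamicTesseract;

static class BoundarySketch // hypothetical helper, not part of IronOcr
{
    public static void CountSymbolsPerBlock(PageIterator iterator)
    {
        int blockIndex = 0;
        int symbolCount = 0;

        iterator.Begin(); // reset the cursor to the first element
        do
        {
            // The cursor sits on the first symbol of a block: restart the counter.
            if (iterator.IsAtBeginningOf(PageIteratorLevel.Block))
                symbolCount = 0;

            symbolCount++;

            // Assumed argument order: (parent level, element level).
            if (iterator.IsAtFinalOf(PageIteratorLevel.Block, PageIteratorLevel.Symbol))
            {
                Console.WriteLine($"Block {blockIndex}: {symbolCount} symbol(s)");
                blockIndex++;
            }
        }
        while (iterator.Next(PageIteratorLevel.Symbol)); // step one symbol at a time
    }
}
```

The same grouping could presumably be written as two nested loops, with the inner loop calling Next(PageIteratorLevel.Block, PageIteratorLevel.Symbol) and the outer loop calling Next(PageIteratorLevel.Block). The flat form keeps all cursor movement in one place. In either form, the caller remains responsible for disposing the iterator, for example with a using statement around the call.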
At each stop, the iterator describes the current region. BlockType returns a PolyBlockType identifying the region's role (body text, heading, image, table, or separator). TryGetBoundingBox fills a Rect with the region's pixel coordinates at a requested PageIteratorLevel and returns false when none is available. TryGetBaseline reports the text baseline the same way, and GetProperties reads font and size details for the current element. Pass the PageIteratorLevel that matches the granularity a task needs, from Block down to Symbol.
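    using System;
    using DynamicTesseract;

    // Prints the type and bounding box of every block on the page.
    void WalkBlocks(PageIterator iterator)
    {
        iterator.Begin(); // reset the cursor to the first element
        do
        {
            // TryGetBoundingBox returns false when no box is available.
            if (iterator.TryGetBoundingBox(PageIteratorLevel.Block, out Rect box))
                Console.WriteLine($"{iterator.BlockType} at {box}");
        }
        while (iterator.Next(PageIteratorLevel.Block)); // advance to the next block
    }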
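The same pattern generalizes to any granularity by taking the PageIteratorLevel as a parameter. The following is a rough sketch under stated assumptions, not the library's own helper: RegionSketch, CollectBoxes, and PrintBaselines are hypothetical names, and Rect is assumed to live in the DynamicTesseract namespace as in the example above.

```csharp
using System;
using System.Collections.Generic;
using DynamicTesseract;

static class RegionSketch // hypothetical helper, not part of IronOcr
{
    // Collects the bounding box of every element at the requested level.
    public static List<Rect> CollectBoxes(PageIterator iterator, PageIteratorLevel level)
    {
        var boxes = new List<Rect>();
        iterator.Begin();
        do
        {
            // Skip elements for which no box is reported.
            if (iterator.TryGetBoundingBox(level, out Rect box))
                boxes.Add(box);
        }
        while (iterator.Next(level));
        return boxes;
    }

    // Prints the baseline of every element at the requested level, where one is reported.
    public static void PrintBaselines(PageIterator iterator, PageIteratorLevel level)
    {
        iterator.Begin();
        do
        {
            if (iterator.TryGetBaseline(level, out Rect baseline))
                Console.WriteLine($"Baseline: {baseline}");
        }
        while (iterator.Next(level));
    }
}
```

A call such as RegionSketch.CollectBoxes(iterator, PageIteratorLevel.Symbol) would then gather one Rect per recognized symbol. Both methods call Begin first, so they can be run one after the other on the same iterator.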
The OCR results how-to walks through reading recognized structure, the results objects example shows the result model, and the region of an image how-to covers working with page coordinates.
Fields
handle
Declaration
protected HandleRef handle
Field Value
| Type | Description |
| --- | --- |
| System.Runtime.InteropServices.HandleRef | |
page
Declaration
protected readonly Page page
Field Value
Properties
BlockType
Declaration
public PolyBlockType BlockType { get; }
Property Value
Methods
Begin()
Declaration
Dispose(Boolean)
Declaration
public override void Dispose(bool disposing)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.Boolean | disposing | |
Overrides
Finalize()
Declaration
protected override void Finalize()
Overrides
GetProperties()
Declaration
public ElementProperties GetProperties()
Returns
IsAtBeginningOf(PageIteratorLevel)
Declaration
public bool IsAtBeginningOf(PageIteratorLevel level)
Parameters
Returns
| Type | Description |
| --- | --- |
| System.Boolean | |
IsAtFinalOf(PageIteratorLevel, PageIteratorLevel)
Declaration
public bool IsAtFinalOf(PageIteratorLevel level, PageIteratorLevel element)
Parameters
Returns
| Type | Description |
| --- | --- |
| System.Boolean | |
Next(PageIteratorLevel)
Declaration
public bool Next(PageIteratorLevel level)
Parameters
Returns
| Type | Description |
| --- | --- |
| System.Boolean | |
Next(PageIteratorLevel, PageIteratorLevel)
Declaration
public bool Next(PageIteratorLevel level, PageIteratorLevel element)
Parameters
Returns
| Type | Description |
| --- | --- |
| System.Boolean | |
TryGetBaseline(PageIteratorLevel, out Rect)
Declaration
public bool TryGetBaseline(PageIteratorLevel level, out Rect bounds)
Parameters
Returns
| Type | Description |
| --- | --- |
| System.Boolean | |
TryGetBoundingBox(PageIteratorLevel, out Rect)
Declaration
public bool TryGetBoundingBox(PageIteratorLevel level, out Rect bounds)
Parameters
Returns
| Type | Description |
| --- | --- |
| System.Boolean | |