Read Table in Document
This code example demonstrates how to use the IronTesseract OCR engine to extract text and table data from a PDF document.
- An instance of the IronTesseract OCR engine is created.
- An OcrInput object is initialized, and a PDF file ("table.pdf") is loaded using the LoadPdf method.
- The OCR engine processes the document using the ReadDocumentAdvanced method, which returns a more detailed OcrResult object.
- The first table found in the document is accessed using result.Tables.First(), and the cell information for that table is extracted with CellInfos.
- The list of cell data (cellList) now contains the table's cells, including the text content and other details (e.g., cell position and size).
- This method is useful for extracting structured data such as tables from PDFs, since the text within each table cell can be accessed and processed programmatically.