Intelligent Table Transcriber: AI-driven Table Recognition Framework for Enhanced Information Retrieval
Date
2024-08
Authors
Kim, Dongyoun
Major Professor
Mitra, Simanta
Committee Member
Aduri, Pavan
Abstract
Information retrieval aims to extract useful information in response to user queries. Traditional information retrieval systems often struggle with diverse document formats, which hinders efficient data extraction and organization. Motivated by these limitations, this project, a collaboration with Soilserdem, aims to design a framework that retrieves agronomic information from a variety of data types (e.g., PDF, images, geospatial files) and converts it to a machine-readable format. In particular, the main function of this project is to recognize tables and extract data from digital images. One challenge, when directly applying an existing table structure recognition model, was to capture the textual content while preserving the original tabular layout.
In this creative component, we propose (1) a hybrid approach that captures text content and preserves the original tabular layout by combining a pre-trained table transformer with a rule-based approach that uses optical character recognition, and (2) an overall back-end framework that covers the pipeline from data loading to the main processes.
Our exploratory experiments demonstrate that the proposed hybrid approach captures textual content and preserves the original tabular layout better than either the rule-based or the data-driven approach used alone.
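The idea of combining a structure model with rule-based OCR placement can be illustrated with a minimal sketch. It is not the project's implementation: it assumes that a table structure model has already produced row and column bounding boxes and that an OCR engine has produced word boxes, and the names `Box` and `fill_grid` are hypothetical. The rule shown simply places each word in the row and column that contain the word's center point.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in image pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def fill_grid(rows, cols, words):
    """Assign OCR words to table cells.

    rows, cols: lists of Box, assumed to come from a table structure model.
    words: list of (text, Box) pairs, assumed to come from an OCR engine.
    Returns a list of rows, each a list of cell strings.
    """
    grid = [[[] for _ in cols] for _ in rows]
    # Visit words top-to-bottom, then left-to-right, so that text inside a
    # cell is joined in approximate reading order.
    for text, box in sorted(words, key=lambda w: (w[1].y0, w[1].x0)):
        cx, cy = box.center()
        r = next((i for i, rb in enumerate(rows) if rb.y0 <= cy <= rb.y1), None)
        c = next((j for j, cb in enumerate(cols) if cb.x0 <= cx <= cb.x1), None)
        if r is not None and c is not None:  # ignore words outside the table
            grid[r][c].append(text)
    return [[" ".join(cell) for cell in row] for row in grid]


# Toy input: a 2x2 table with made-up coordinates and values.
rows = [Box(0, 0, 100, 10), Box(0, 10, 100, 20)]
cols = [Box(0, 0, 50, 20), Box(50, 0, 100, 20)]
words = [
    ("Crop", Box(5, 2, 20, 8)),
    ("Yield", Box(55, 2, 80, 8)),
    ("Corn", Box(5, 12, 20, 18)),
    ("180", Box(55, 12, 70, 18)),
    ("bu", Box(72, 12, 80, 18)),
]
table = fill_grid(rows, cols, words)
print(table)  # [['Crop', 'Yield'], ['Corn', '180 bu']]
```

Using the word center rather than full containment is one possible design choice: it tolerates OCR boxes that slightly overlap a cell border, at the cost of misplacing words that straddle two cells.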
Type
creative component
Rights Statement
Attribution 3.0 United States
Copyright
2024