Quality Assurance for Document Understanding Systems
Keyword(s): quality assurance; document understanding; content remastering
Abstract: Document understanding is a field concerned with the semantic analysis of documents to extract human-understandable information and codify it into machine-readable form. Document understanding systems automatically extract meaningful information from a raster image of a document, creating information-rich content that is usable in many end-user applications such as search and retrieval. To process a large volume of data, such as the collection of books and journals produced by a publisher, such systems should run non-stop in an automated, unattended mode. Ensuring the quality of the output of such a system is challenging due to several factors, including the unattended nature of the system and the massive amount of data (in terabytes), which can give rise to a considerable number of exceptions. Automated quality assurance (QA) techniques are essential to the successful operation of a large-scale document understanding system. In this paper, we propose the QA techniques that a document understanding system needs and ways to automate them.