Technical Reports
HPL-2008-55
Automated Repurposing of Implicitly Structured Documents
Balinsky, Helen; Wiley, Anthony; Rhodes, Michael; Al-Fathiatul, Abdul-Rahman
HP Laboratories
Keyword(s): document repurposing, hierarchical metrics and structure, typography
Abstract: From the least to most prominent elements, documents are arranged in a tacit visual hierarchy. This is essential for document scanning and comprehension. This conceptual structure can be easily recognized by humans through visual cues, such as spatial intervals and positions, contrasts in font families, sizes and weights. At the same time, the document structure is often not available in a machine readable form due to the ways documents were originally created or later transformed. This paper addresses the challenge of automatic document repurposing: applying styling and formatting from one 'implicitly' structured document to another, whilst preserving the underlying visual hierarchy. Using visual perception analysis, the proportionality mapping is established, according to which the original document content is transformed into the new style without breaking the original hierarchical structure. Spatial relationships, location and frequency analysis are then used to fine-tune the transformation.
10 Pages
Additional Publication Information: Presented and published in ACM DocEng'08, ACM Symposium on Document Engineering, Sao Paulo, Brazil, Sept 2008.
External Posting Date: January 21, 2009 [Fulltext]. Approved for External Publication
Internal Posting Date: January 21, 2009 [Fulltext]