Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts
Bhaskarabhatla, Ajay S.; Madhvanath, Sriganesh
Keyword(s): handwriting recognition; Indic scripts; handwriting corpora
Abstract: In this paper, we describe initial efforts at Hewlett-Packard Labs, Bangalore, to create datasets of online handwriting in Indic scripts to support research in online handwriting recognition for these scripts. The term "online" here means that handwriting is captured as a stream of (x,y) points using an appropriate pen position sensor (often called a digitizer), rather than as a bitmap (image). The paper briefly describes the structure of Indic scripts and identifies different choices for segmenting characters into simpler shapes that can then be recognized using pattern recognition techniques. These issues are discussed in the context of the Tamil script. The remainder of the paper provides an overview of two distinct data collection efforts for the Tamil script: one at the isolated character level and the other for isolated words. In the context of these efforts, we briefly describe the data collection procedure, the tools for collection and subsequent annotation, user-interface issues, the annotation scheme, and the organization of the dataset. The paper concludes with the current status of the effort and future directions.
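The distinction between online and offline handwriting data drawn in the abstract can be illustrated with a minimal Python sketch. The names `Stroke`, `InkSample`, and `rasterize` are hypothetical and chosen only for illustration; they are not the data format or tools described in the paper. The sketch assumes integer digitizer coordinates and treats a stroke as one pen-down to pen-up trace.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption: a point is an integer (x, y) pair in digitizer units.
Point = Tuple[int, int]


@dataclass
class Stroke:
    """One pen-down to pen-up trace, with points kept in capture order."""
    points: List[Point] = field(default_factory=list)


@dataclass
class InkSample:
    """Hypothetical container for one handwritten character or word."""
    label: str
    strokes: List[Stroke] = field(default_factory=list)

    def num_points(self) -> int:
        # Total number of captured points across all strokes.
        return sum(len(s.points) for s in self.strokes)


def rasterize(sample: InkSample, width: int, height: int) -> List[List[int]]:
    """Convert online ink to an offline bitmap by marking visited pixels.

    Stroke order, stroke direction, and repeated visits to the same pixel
    are all lost in the result.
    """
    bitmap = [[0] * width for _ in range(height)]
    for stroke in sample.strokes:
        for x, y in stroke.points:
            if 0 <= x < width and 0 <= y < height:
                bitmap[y][x] = 1
    return bitmap


# Two crossing diagonal strokes forming an "X" on a 3x3 grid.
sample = InkSample(
    label="X",
    strokes=[
        Stroke([(0, 0), (1, 1), (2, 2)]),
        Stroke([(2, 0), (1, 1), (0, 2)]),
    ],
)
bitmap = rasterize(sample, width=3, height=3)
```

The online sample retains six ordered points in two strokes, while the bitmap holds only five set pixels because the shared crossing point is recorded once. This is the kind of temporal information that online recognition can use and an image cannot provide.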