Automatic Text and Data Stream Segmentation Using Weighted Feature Extraction
- Author(s): Dadachev, Boris; Balinsky, Alexander; Balinsky, Helen; Forman, George
- HP Laboratories
- Keyword(s): Helmholtz principle; text segmentation; feature extraction; data mining
Abstract: Automatic text and data stream segmentation is a fundamental problem in text data mining. Even a moderately long document may consist of several relatively independent topics and parts. Such heterogeneous parts can seriously degrade the performance of classification and mining algorithms. Applications ranging from screening of radio communication transcripts to document summarization, from automatic document classification to information visualization, and from automatic filtering to security policy enforcement all rely on automatic document segmentation. These are but a few examples of how data segmentation is finding its way into applications. In this article, a novel approach for automatic text and data stream segmentation is developed and analyzed. Depending on the need, a text document can be automatically partitioned into relatively small paragraphs or large sections.
- External Posting Date: November 6, 2013 [Abstract Only]. Approved for External Publication - External Copyright Consideration
- Internal Posting Date: November 6, 2013 [Fulltext]