Technical Reports

HPL-2012-161R1


Adventures in Feature Selection on an Industrial Dataset... and Ensuing General Discoveries

Forman, George
HP Laboratories

HPL-2012-161

Keyword(s): text feature selection; text classification; document categorization; lessons learned

Abstract: We relate the story of an interesting failure of text feature selection methods on an industrial dataset of technical documents. Our detailed dissection and ultimate understanding of the failure led to the creation of general solutions that not only solved the robustness problem we faced, but also improved classification accuracy on simpler, public datasets, which was crucial to making the work publishable.

5 Pages

Additional Publication Information: To be published in the proceedings of Silver 2012: The Silver Lining: Learning from Unexpected Results, an ECML/PKDD 2012 Workshop

External Posting Date: September 21, 2012. Approved for External Publication
Internal Posting Date: September 21, 2012
