Technical Reports
HPL-2011-90R1
Company Names Matching in the Large Patents Dataset
Medvedev, Timofey; Ulanov, Alexander
HP Laboratories
Keyword(s): Names matching; duplicate detection; clustering; patents
Abstract: This paper addresses the name matching (duplicate detection) problem in the US patent dataset, which contains more than 400K unique spellings of company names. To solve the matching problem, we choose an appropriate string similarity measure and clustering approach and estimate their parameters. Finally, we apply them to the whole dataset and estimate the positive and negative rates.
6 Pages
External Posting Date: July 21, 2011. Approved for External Publication
Internal Posting Date: July 21, 2011
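
The abstract outlines a pipeline of a string similarity measure, a clustering step, and tuned parameters, but this record does not say which measure or clustering method the authors chose. The following is only a minimal sketch of that general idea under stated assumptions: it uses Python's standard difflib ratio as a stand-in similarity measure, single-link clustering via union-find as a stand-in clustering approach, and a hypothetical threshold of 0.85. The function names are illustrative and do not come from the report.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and replace punctuation with spaces so trivial variants align."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity in [0, 1] from difflib; an assumed stand-in, not the report's measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_names(names, threshold=0.85):
    """Single-link clustering: merge any two names whose similarity >= threshold.

    The threshold value is a hypothetical parameter; the report estimates its own.
    """
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison, fine for a small illustration only.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    sample = [
        "Hewlett-Packard Company",
        "Hewlett Packard Company",
        "Hewlett-Packard Compnay",
        "International Business Machines",
    ]
    for group in cluster_names(sample):
        print(group)
```

The all-pairs loop is quadratic in the number of names, so at the scale mentioned in the abstract (more than 400K spellings) a practical implementation would likely need some form of candidate pre-grouping before comparing pairs; that step is omitted here.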