Technical Reports

HPL-2011-90R1

Company Names Matching in the Large Patents Dataset

Medvedev, Timofey; Ulanov, Alexander
HP Laboratories

Keyword(s): Names matching; duplicate detection; clustering; patents

Abstract: This paper addresses the name matching (duplicate detection) problem in the US patent dataset, which contains more than 400K unique spellings of company names. To solve the matching problem, we choose an appropriate string similarity measure and clustering approach and estimate their parameters. Finally, we apply them to the whole dataset and estimate the positive and negative rates.
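
The abstract does not say which similarity measure or clustering method the report uses, so the following is only a minimal illustrative sketch of the general idea: score pairs of name spellings with a string similarity measure, then group spellings whose score exceeds a threshold. The functions `normalize`, `similarity`, and `cluster_names`, the use of Python's `difflib.SequenceMatcher`, the single-link grouping, and the threshold 0.85 are all assumptions made for illustration, not the authors' choices.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase and replace punctuation with spaces (hypothetical preprocessing)."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in for the report's measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_names(names, threshold=0.85):
    """Single-link clustering: spellings joined by any pair scoring >= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; merge the two groups when the pair is similar enough.
    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


names = [
    "Hewlett-Packard Company",
    "Hewlett Packard Company",
    "HEWLETT-PACKARD COMPANY",
    "International Business Machines",
]
clusters = cluster_names(names)
```

Comparing all pairs is quadratic in the number of names, so a sketch like this would not scale directly to 400K spellings; some form of candidate pre-filtering would be needed, and the threshold would have to be estimated on labeled data.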

6 Pages

External Posting Date: July 21, 2011 [Fulltext]. Approved for External Publication
Internal Posting Date: July 21, 2011 [Fulltext]
