Data Mining and Machine Learning






Overview

The Data Mining and Machine Learning project aims to develop, extend, and apply technologies and tools for finding (and enabling people to take advantage of) patterns in large datasets and data streams. The discovery of these patterns draws upon such research fields as Machine Learning, Statistics, Databases, Information Retrieval, and Information Visualization, to name a few.

We are developing technologies to intelligently analyze structured and unstructured information, so as to develop new service capabilities for our partners and customers. We work with HP business units as well as with leading-edge external customers to identify exciting research problems and high-potential application opportunities.

Problems Addressed

There are numerous application areas in which data mining plays a promising role. We have an especially productive relationship with HP Customer Support, which has led to numerous innovations and new capabilities. Here are some examples of problems we have recently addressed:

  • Configuration analysis and semi-automated system assessments: how to focus proactive support resources on areas and systems that deviate from others in suspicious ways
  • Automated categorization of documents in a very large topic hierarchy: how to put hundreds of thousands of documents in the right place, while minimizing the need for training data and dealing with a constantly shifting portfolio
  • Enabling efficient storage, archiving, disk-based backup, and content services by taking advantage of commonalities in stored items
  • Characterization of storage-system throughput patterns based on workload features, so as to enable automated storage services
  • Automated correction and detection of similarities in extremely high-volume commercial transaction flows
  • Analysis of conversion patterns within a sales portal
  • Rapid detection of changes in hardware performance, response times, service levels, and various other variables of interest, without requiring extensive customization, calibration, or configuration of the change-point detection tools

HP Labs Work

To address these and other applications, we have recently developed new technologies and algorithms in the following areas, among others:

  • Clustering algorithms
    We have developed a new type of clustering, Conjunctive Clustering, which does not rely on mapping data items into a metric space.
    We have developed K-Harmonic Means, which has been shown to be more robust than the widely used K-Means algorithm, in particular with respect to the choice of initial centers.
    We have studied scalability issues, the parallelization of clustering algorithms, divide-and-conquer clustering in the data-stream model of computation, and the sample complexity of clustering.
  • Content management
    We have developed methods for super-efficient representation of very large data items and structured collections of such items, using work in intrinsic references and chunking.
  • Feature selection
    We have developed a new feature-selection algorithm, bi-normal separation, which outperforms previously known alternatives.
  • Categorization
    We created a set of tools to manage a topic hierarchy, provide example documents (training cases) for each topic, tag documents automatically, test the accuracy of the resulting tags, and enable the use of the resulting validated tags in browsing and searching. One of the design goals was to make this toolset easily reusable; other opportunities in content-management solutions might also benefit from our contributions.
  • Genetic programming
    We have developed GPLab, a robust, powerful, and, above all, flexible platform for genetic programming research, experimentation, and application. We are also conducting fundamental research aimed at expanding the power of the genetic programming paradigm.
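
As an illustration of the feature-selection item above, the following is a minimal sketch of a bi-normal separation score for a single feature in a two-class problem. This page does not give the formula. The sketch assumes the commonly published definition: the absolute difference between the inverse standard normal CDF of the feature's true-positive rate and that of its false-positive rate. The function name, its parameters, and the clipping value eps are illustrative assumptions, not part of any HP Labs toolset.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def bi_normal_separation(tp, fp, pos, neg, eps=0.0005):
    """Score one feature with bi-normal separation (illustrative sketch).

    tp  -- number of positive examples that contain the feature
    fp  -- number of negative examples that contain the feature
    pos -- total number of positive examples
    neg -- total number of negative examples
    eps -- assumed clipping bound that keeps both rates strictly
           between 0 and 1, where the inverse CDF is finite
    """
    if pos <= 0 or neg <= 0:
        raise ValueError("both classes need at least one example")
    # Clip the rates away from 0 and 1 before applying the inverse CDF.
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


if __name__ == "__main__":
    # A feature seen in 80 of 100 positives and 20 of 100 negatives
    # scores higher than one seen in 60 and 40 respectively.
    print(bi_normal_separation(80, 20, 100, 100))
    print(bi_normal_separation(60, 40, 100, 100))
```

Under this definition, a feature that occurs at the same rate in both classes scores zero, and the score grows as the two rates move apart in either direction. To select features, one would score every candidate this way and keep the highest-scoring ones.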

Contact: Jaap.Suermondt@hp.com



© 2009 Hewlett-Packard Development Company, L.P.