Technical Reports


HP Labs



Offline/Realtime Traffic Classification Using Semi-Supervised Learning

Erman, Jeffrey; Mahanti, Anirban; Arlitt, Martin; Cohen, Ira; Williamson, Carey


Keyword(s): traffic classification; semi-supervised learning; clustering

Abstract: Identifying and categorizing network traffic by application type is challenging because applications continue to evolve, especially those designed to avoid detection. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to classify traffic by exploiting distinctive flow characteristics of applications when they communicate on a network. In this paper, we explore this latter approach and propose a semi-supervised classification method that can accommodate both known and unknown applications. To the best of our knowledge, this is the first work to use semi-supervised learning techniques for the traffic classification problem. Our approach allows classifiers to be designed from training data that consists of only a few labeled and many unlabeled flows. We consider pragmatic classification issues such as longevity of classifiers and the need for retraining of classifiers. Our performance evaluation using empirical Internet traffic traces that span a 6-month period shows that: 1) high flow and byte classification accuracy (i.e., greater than 90%) can be achieved using training data that consists of a small number of labeled and a large number of unlabeled flows; 2) the presence of "mice" and "elephant" flows in the Internet complicates the design of classifiers, especially of those with high byte accuracy, and necessitates the use of weighted sampling techniques to obtain training flows; and 3) retraining of classifiers is necessary only when there are non-transient changes in the network usage characteristics. As a proof of concept, we implement prototype offline and realtime classification systems to demonstrate the feasibility of our approach.

Publication Info: Copyright Elsevier. Presented at Performance 2007, 2-5 October 2007, Cologne, Germany, and published in the Performance Evaluation journal.
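The idea in the abstract of training from a few labeled and many unlabeled flows can be illustrated with a minimal sketch. The abstract names clustering as a keyword but does not spell out the algorithm, so everything below is an assumption made for illustration: plain k-means as the clustering step, a majority vote of the labeled flows to name each cluster, and two hypothetical flow features (mean packet size in bytes, flow duration in seconds). Clusters that contain no labeled flow are reported as "unknown", which is one simple way to accommodate unknown applications.

```python
from collections import Counter
from math import dist


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means starting from the given centroids (assumed choice)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assign every flow to its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                # New centroid is the per-feature mean of the cluster members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        if updated == centroids:
            break
        centroids = updated
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def label_clusters(points, labels, centroids):
    """Name each cluster by majority vote of its labeled flows, else 'unknown'."""
    votes = {j: Counter() for j in range(len(centroids))}
    for point, label in zip(points, labels):
        if label is not None:  # unlabeled flows shape clusters but do not vote
            votes[nearest(point, centroids)][label] += 1
    return {j: (c.most_common(1)[0][0] if c else "unknown") for j, c in votes.items()}


def classify(flow, centroids, cluster_labels):
    """Classify a new flow by the label of its nearest cluster."""
    return cluster_labels[nearest(flow, centroids)]


# Hypothetical training flows: (mean packet size in bytes, duration in seconds).
flows = [
    (100.0, 1.0), (110.0, 1.2), (90.0, 0.8), (105.0, 1.1),   # web-like
    (1400.0, 60.0), (1350.0, 55.0), (1450.0, 65.0),          # p2p-like
    (700.0, 300.0), (720.0, 310.0), (690.0, 290.0),          # never labeled
]
# Only two of the ten flows carry a label; None marks an unlabeled flow.
labels = ["web", None, None, None, "p2p", None, None, None, None, None]

centroids = kmeans(flows, [flows[0], flows[4], flows[7]])
cluster_labels = label_clusters(flows, labels, centroids)

print(classify((95.0, 0.9), centroids, cluster_labels))
print(classify((710.0, 305.0), centroids, cluster_labels))
```

In this toy data the features are left unscaled because the three groups are far apart; with real traffic, features on different scales would normally be normalized before clustering, and the number of clusters would need to be chosen with care.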

15 Pages
