Web Server and Media Server Workload Analysis and Tools


Streaming media represents a new wave of rich Internet content. Recent advances in video creation, compression, bandwidth, caching, streaming, and other content delivery technologies have brought audio and video to the Internet as rich media. Some predict that rich media will add significantly to the user experience and will therefore be the Internet's next "killer app."

We are developing a tool, called MediaMetrics, that characterizes a media server's access profile and system resource usage both quantitatively and qualitatively. It extracts and reports information that service providers can use to evaluate current solutions and to improve and optimize relevant future components.

MediaMetrics bases its analysis entirely on media server access logs, which can come from a single server or from multiple servers in a cluster. The tool is written in Perl and processes the most common media server log formats: those of Windows Media Server and RealNetworks Media Server.

Related Papers and Reports


Understanding the nature of traffic to a web site is crucial to properly designing the site's supporting infrastructure, especially for large, busy sites. What new access patterns are specific to the current Web? How can the dynamics, or evolution, of a web site be characterized, and how can its rate of change be measured?

Our goal is to develop a web server log analysis tool, called WebMetrix, that reports a web site's profile and system resource usage in a form useful to service providers.

One of the questions we address with WebMetrix is how to characterize the dynamics and evolution of web sites. Most web sites add new content and remove some of the old. However, these content changes only partially characterize what we call the dynamics, or evolution, of the site: they contribute to the dynamics of the site only when the new content is actually accessed.
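
One way to make this notion concrete is to measure, for each day, the fraction of requests that go to files not requested on any earlier day. The sketch below is illustrative only and is not taken from WebMetrix: the function name `new_content_share`, the `(day, path)` input shape, and the definition of "new" as "first seen in the log" are all assumptions.

```python
from collections import defaultdict

def new_content_share(requests):
    """Return {day: fraction of that day's requests that hit new files}.

    `requests` is an iterable of (day, path) pairs; days must be sortable
    (for example ISO date strings). Assumption: a file counts as "new" on
    the day it is first requested. The first day is skipped, because every
    file looks new when there is no earlier history to compare against.
    """
    by_day = defaultdict(list)
    for day, path in requests:
        by_day[day].append(path)

    seen = set()   # every path requested on an earlier day
    share = {}
    for i, day in enumerate(sorted(by_day)):
        paths = by_day[day]
        if i > 0:
            new_hits = sum(1 for p in paths if p not in seen)
            share[day] = new_hits / len(paths)
        seen.update(paths)
    return share
```

A consistently high share would suggest a very dynamic site, while a share near zero would suggest that clients keep returning to the same set of documents.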

The dynamics of the site can be taken into account, in addition to load information, when choosing among load balancing solutions, caching, or content distribution systems. For example, if the site is very dynamic, i.e. a large portion of the daily client requests access new content (news sites being a prime example), then the Akamai approach might be a good choice for handling the load: "hot" documents are replicated closer to clients on Akamai servers, which improves the quality of service users see. However, if the site's traffic pattern consistently shows that clients access a slowly changing subset of documents, then existing Internet caches might be a useful solution, at no cost to the service provider, for propagating this content and, to a certain degree, improving server-side performance.

Another set of data that WebMetrix provides relates to quality of service for web servers. Aborted connections often reflect an unsatisfactory level of service, typically due to high response time, but they are not easy to recognize. From the information in web server logs, we identify the requests that are most likely due to aborted connections. This profiling technique can serve as a first warning sign to system administrators of poor quality of service on their sites.
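
The text does not say how such requests are identified. As a purely illustrative heuristic, and not necessarily the one WebMetrix uses, one could flag successful responses whose logged byte count is smaller than the largest byte count ever logged for the same file. This assumes the server logs the bytes actually sent; the function name and the `(path, status, bytes_sent)` record shape are hypothetical.

```python
def likely_aborted(records):
    """Return the records that look like aborted transfers.

    `records` is a list of (path, status, bytes_sent) tuples.
    Assumed heuristic: a 200 response that sent fewer bytes than the
    largest 200 response seen for the same path was probably cut short.
    """
    # First pass: estimate each file's full size from the log itself.
    full_size = {}
    for path, status, sent in records:
        if status == 200:
            full_size[path] = max(full_size.get(path, 0), sent)

    # Second pass: keep successful responses that fell short of that size.
    return [
        (path, status, sent)
        for path, status, sent in records
        if status == 200 and sent < full_size[path]
    ]
```

A file that grows during the logging period would produce false positives under this rule, so the result is better read as a list of candidates than as a definitive count.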

The tool is written in Perl and processes logs in the Common Log Format, the most popular default format for web server access logs.
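
For readers unfamiliar with the format, a Common Log Format line has the fields `host ident authuser [date] "request" status bytes`. The following is a minimal parsing sketch in Python rather than Perl; the function name `parse_clf` and the returned dictionary keys are assumptions for illustration, not part of WebMetrix.

```python
import re
from datetime import datetime

# One Common Log Format line: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    # The request field is normally "METHOD path PROTOCOL".
    parts = rec["request"].split()
    rec["path"] = parts[1] if len(parts) >= 2 else None
    return rec
```

The sketch does not handle request strings that contain escaped double quotes; such lines simply fail to match and are returned as `None`.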

Related Papers and Reports


The shared Web hosting market targets small and medium-sized businesses. The most common purpose of a shared hosting web site is marketing, which means that most of its documents are static. In this setting, many different sites are hosted on the same hardware: a shared Web hosting service creates a set of virtual servers on the same physical server, and each virtual server is set up to write its own access log. Such a setup, however, splits the "whole picture" of web server usage into multiple independent pieces, making it difficult for the service provider to understand and analyze the "aggregate" traffic characteristics. The situation becomes even more complex when the Web hosting infrastructure is based on a web server farm or cluster, used to create a scalable and highly available solution.
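
Recovering the aggregate view starts with recombining the per-virtual-server logs into one time-ordered stream. A minimal sketch of that step, assuming each site's log has already been parsed into `(timestamp, path, size)` tuples sorted by timestamp (the function name and record shape are hypothetical):

```python
import heapq

def merge_site_logs(logs_by_site):
    """Merge per-site logs into one time-ordered list.

    `logs_by_site` maps a site name to a list of (timestamp, path, size)
    tuples already sorted by timestamp. Each output record is tagged with
    the site it came from: (timestamp, site, path, size).
    """
    streams = (
        [(ts, site, path, size) for ts, path, size in records]
        for site, records in logs_by_site.items()
    )
    # heapq.merge combines already-sorted inputs without re-sorting them.
    return list(heapq.merge(*streams))
```

The same approach would apply to logs from several machines in a server farm, provided their clocks are reasonably synchronized.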

Several web log analysis tools are available (Analog, Webalizer, and WebTrends, to name just a few). They provide detailed analysis that helps business sites understand their customers and their customers' interests. However, these tools lack the information of interest to system administrators: information that provides insight into the system's resource requirements and traffic access patterns.

The Shared Web Hosting Analysis Tool (WHAT) aims to provide a Web hosting service profile and to characterize the specifics and trends of the system's usage:

  • service characterization - a service profile, a comparative analysis of system resource usage by hosted web sites;
  • traffic characterization - a comprehensive analysis of overall workload with extraction of a few main parameters to characterize it;
  • system requirements characterization - a related system resource usage analysis, especially memory requirements.

These characteristics provide insight into the system's resource requirements and traffic access patterns, information of special interest to system administrators and service providers.
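
As a rough illustration of what a comparative service profile might contain, the sketch below computes each hosted site's share of requests and bytes, plus a simple working-set estimate (the combined size of the distinct files requested) as a stand-in for memory requirements. The function name, the `(site, path, size)` record shape, and the choice of working-set size as the memory measure are assumptions, not a description of WHAT itself.

```python
from collections import defaultdict

def hosting_profile(records):
    """Summarize per-site usage from (site, path, size) request records.

    Returns {site: {"request_share", "byte_share", "working_set_bytes"}}.
    """
    requests = defaultdict(int)
    bytes_sent = defaultdict(int)
    file_sizes = defaultdict(dict)   # site -> {path: largest size seen}

    for site, path, size in records:
        requests[site] += 1
        bytes_sent[site] += size
        sizes = file_sizes[site]
        sizes[path] = max(sizes.get(path, 0), size)

    total_requests = sum(requests.values())
    total_bytes = sum(bytes_sent.values())

    profile = {}
    for site in requests:
        profile[site] = {
            "request_share": requests[site] / total_requests,
            "byte_share": bytes_sent[site] / total_bytes if total_bytes else 0.0,
            # Assumed memory proxy: bytes needed to hold every distinct file once.
            "working_set_bytes": sum(file_sizes[site].values()),
        }
    return profile
```

Comparing `request_share` with `working_set_bytes` across sites gives a first indication of which sites dominate traffic and which dominate memory demand.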

Related Papers and Reports

© 1994-2001 Hewlett-Packard Company