The presence of temporal locality in web request traces has long been
recognized, and has been incorporated in synthetic web trace generators.
However, the close proximity of requests for the same file in a trace
can be attributed to two orthogonal reasons: long-term popularity and
short-term correlation.
The former reflects the fact that requests for a popular document
simply appear ``very frequently'', so they are likely to be
``close'' in an absolute sense.
The latter instead reflects the fact that
requests for a given document may concentrate
around particular points in the trace for a variety of reasons,
such as deadlines or swings in user interest; hence it concerns
``relative'' closeness.
In this work, we introduce a new measure of temporal locality,
the scaled stack distance, which is insensitive to
popularity and captures instead the impact of short-term correlation.
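The abstract does not spell out how the scaled stack distance is computed. The sketch below computes the ordinary LRU stack distance (the number of distinct other files requested since the previous request for the same file, 0 meaning the file was already on top of the stack). It then applies one hypothetical normalization: each distance is divided by that file's mean stack distance, so a popular file with uniformly short distances and an unpopular file with uniformly long ones both score around 1, while bursty re-references stand out. The function names and the normalization are illustrative assumptions, not the authors' definition.

```python
from collections import defaultdict


def stack_distances(trace):
    """Return (file, stack_distance) for every re-reference in `trace`.

    Stack distance = number of distinct other files requested since the
    previous request for the same file (LRU stack depth, 0 = top).
    """
    stack = []   # LRU stack; most recently used file is at the end
    result = []
    for f in trace:
        if f in stack:
            pos = stack.index(f)
            # Files above f in the stack were requested after f's last request.
            result.append((f, len(stack) - 1 - pos))
            stack.pop(pos)
        stack.append(f)
    return result


def scaled_stack_distances(trace):
    """Hypothetical normalization: divide each distance by the file's mean."""
    raw = stack_distances(trace)
    per_file = defaultdict(list)
    for f, d in raw:
        per_file[f].append(d)
    mean = {f: sum(ds) / len(ds) for f, ds in per_file.items()}
    # A mean of 0 means every re-reference was back-to-back; treat as typical.
    return [(f, d / mean[f] if mean[f] > 0 else 1.0) for f, d in raw]


print(stack_distances(list("abacba")))  # [('a', 1), ('b', 2), ('a', 2)]
```

A real trace would need a faster structure than a Python list (for example a balanced tree or Fenwick tree over request times), since `list.index` makes this sketch quadratic in the number of distinct files.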
We then use the scaled stack distance observed in the original trace
to parametrize a synthetic trace generator.
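The abstract does not describe the generator itself. For orientation only, the sketch below shows the classic LRU stack model, a generic way to turn a distribution of stack distances into a synthetic trace: at each step a distance is sampled, the file at that depth is requested, and it moves to the top of the stack. This is a swapped-in textbook technique, not the authors' generator; in particular it draws on raw distances and does not model long-term popularity separately, which the scaled stack distance is designed to do. The name `generate_trace` and its parameters are hypothetical.

```python
import random


def generate_trace(distances, n_files, length, seed=None):
    """Generic LRU stack-model trace generator (illustrative only).

    distances: empirical stack distances to sample from (0 = top of stack)
    n_files:   number of distinct files, labelled 0..n_files-1
    length:    number of requests to generate
    """
    rng = random.Random(seed)
    stack = list(range(n_files))  # index 0 is the most recently used file
    rng.shuffle(stack)            # arbitrary initial recency order
    trace = []
    for _ in range(length):
        d = min(rng.choice(distances), n_files - 1)  # clamp to stack depth
        f = stack.pop(d)
        stack.insert(0, f)        # the requested file becomes most recent
        trace.append(f)
    return trace


print(generate_trace([1], n_files=3, length=6, seed=0))  # alternates two files
```

Feeding it the distances measured on a real log reproduces that log's recency profile, but because popularity is folded into the same distances, it cannot separate the two effects the abstract distinguishes.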
Finally, we validate the appropriateness of using this quantity by comparing
the file and byte miss ratios obtained with the original trace
against those obtained with the synthetic traces.
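The file miss ratio is the fraction of requests that miss in the cache, and the byte miss ratio is the fraction of requested bytes that miss. A minimal sketch of computing both for one trace follows, assuming an LRU replacement policy and a cache capacity measured in bytes; the abstract names neither the policy nor the cache sizes, and `miss_ratios` is a hypothetical helper. Running it on the original and on a synthetic trace at several capacities gives the curves one would compare.

```python
from collections import OrderedDict


def miss_ratios(trace, sizes, capacity):
    """Simulate an LRU cache of `capacity` bytes (assumed policy).

    Returns (file_miss_ratio, byte_miss_ratio) for `trace`, where `sizes`
    maps each file to its size in bytes.
    """
    if not trace:
        return 0.0, 0.0
    cache = OrderedDict()  # file -> size, least recently used first
    used = misses = miss_bytes = total_bytes = 0
    for f in trace:
        size = sizes[f]
        total_bytes += size
        if f in cache:
            cache.move_to_end(f)  # hit: mark as most recently used
            continue
        misses += 1
        miss_bytes += size
        if size > capacity:
            continue  # larger than the whole cache: never stored
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict LRU file
            used -= evicted_size
        cache[f] = size
        used += size
    byte_ratio = miss_bytes / total_bytes if total_bytes else 0.0
    return misses / len(trace), byte_ratio


sizes = {"a": 10, "b": 20, "c": 30}
print(miss_ratios(list("abacab"), sizes, capacity=40))  # (0.666..., 0.8)
```

Reporting both ratios matters because a cache that keeps many small, popular files can show a low file miss ratio while still missing most of the bytes.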
Our case study is based on server access logs from the HP Web Hosting site, the HP.com site,
the HPLabs.com site, and the OpenView.com site.
Our next step is to design capacity planning methods that take
``locality'' characterization into account to predict and evaluate the
potential performance benefits of caching and content-aware load balancing.