[httperf] a sniffer to create the session file for httperf

Martin Arlitt arlitt at hpl.hp.com
Thu Mar 31 07:51:30 PST 2005


Anoop,

I haven't tried this, so I don't know that it will work any better than
what you are already considering.

A slight improvement on the time-threshold approach (in case you aren't
already considering this) might be to treat the parallel connections that
most browsers open as a single "pipeline": as long as any responses are
still in progress, you can hypothesize that any new requests are still
subrequests.  Once the "pipeline" is empty, you could start a timeout.
Obviously this still isn't perfect, since a user can always click on a
link while the current page is still downloading.  If you look at some
example sessions, you may be able to identify characteristics that help
you recognize when this happens.
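
In case it helps, here is a rough, untested sketch of that grouping logic
in Python.  It assumes you have already parsed the capture (e.g. the
tcpflow output) into a time-ordered list of request/response events per
client; the Event structure, the group() function, and the 30-second idle
threshold are all just placeholders for whatever your post-processing
actually produces, not anything httperf-specific.

    from collections import namedtuple

    # one event per HTTP message, in time order, for a single client;
    # kind is "request" (uri set) or "response" (reply fully received)
    Event = namedtuple("Event", ["time", "kind", "uri"])

    SESSION_IDLE = 30.0   # idle seconds that end a session; just a guess

    def group(events):
        """Return a list of sessions.  Each session is a list of bursts;
        each burst is [top_level_uri, think_time, [subrequest_uris]],
        where think_time is the idle gap observed before the burst's
        first request."""
        sessions = []
        bursts = []
        burst = None          # the burst currently being filled
        outstanding = 0       # responses still in flight (the "pipeline")
        idle_since = None     # when the pipeline last drained

        for ev in events:
            if ev.kind == "response":
                outstanding = max(outstanding - 1, 0)
                if outstanding == 0:
                    idle_since = ev.time
                continue

            # a request
            if outstanding > 0 and burst is not None:
                # pipeline still busy: hypothesize a subrequest
                burst[2].append(ev.uri)
            else:
                think = (ev.time - idle_since) if idle_since is not None else 0.0
                if think > SESSION_IDLE and bursts:
                    sessions.append(bursts)   # long silence: new session
                    bursts, think = [], 0.0
                burst = [ev.uri, think, []]
                bursts.append(burst)
            outstanding += 1

        if bursts:
            sessions.append(bursts)
        return sessions

The idea is simply that a request arriving while any response is still
outstanding gets attached to the current burst, and the first request
after the pipeline drains starts a new burst (or a new session, if the
gap is long enough).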

I'd be interested in hearing what you eventually decide on, and how well
you think it works.

thanks

Martin

On Tue, 22 Mar 2005, anoop aryal wrote:

> Hi,
> A while ago, I posted a link on this mailing list to a script that used
> tcpflow to create the session file for httperf by sniffing the wire while
> real people used the website, which gave real think times etc. It had
> several shortcomings, but it worked fine for me at the time. I'm now
> working on an improved version of it, and there is one issue that I'm
> having trouble with.
>
> I'd like to create sub-requests, as supported by httperf. (The script does
> this too, but the way it does it is not foolproof.) How can I distinguish a
> 'sub-request' from a real request? I have thought about using:
>
> 1) The referrer. But all subsequent clicks would also carry a referrer and
> would therefore end up as sub-requests; i.e. I don't think I can tell an
> image fetch due to an <img> tag apart from a click on an image link this way.
>
> 2) A time threshold, i.e. if the requests all arrive within x amount of
> time, they are subrequests; otherwise, they are new requests. The problems
> are obvious: what is the right amount of time for the threshold? It's too
> hackish.
>
> 3) Parsing the response to see whether the request string of subsequent
> requests matches some part of the previous response. This would work but
> will be non-trivial to get right. (This is what I did with the original
> script, and it still got certain things wrong.) It would also probably
> classify all subsequent clicks as subrequests, because each click is on a
> link in that response and, without careful HTML parsing, would therefore
> match.
>
> 4) Treating all HTML as initial requests and all non-HTML as sub-requests.
> Again, if the link points to an image, this fails. It also fails if the
> HTML is part of an iframe or the like.
>
>
> Ideas would be appreciated.
>
> anoop
> aaryal at foresightint.com
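
For reference (and from memory, so double-check it against the httperf man
page), the session-file entries you would emit for each grouped session
look roughly like this:

    /index.html think=2.0
        /style.css
        /logo.gif
    /next.html think=5.5
        /photo.jpg

where the indented lines form the burst of subrequests fetched after the
first reply, think= (as I recall) sets the pause before the next burst is
issued, and a blank line separates one session from the next.  The URIs
and times above are made up purely for illustration.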

