[httperf] connection failed with unexpected error 99
rick.jones2 at hp.com
Tue Mar 31 09:12:35 PST 2009
Arlitt, Martin wrote:
> How many connections are in TIME_WAIT when httperf fails?
>>From: Kyle Campos [mailto:kyle.campos at gmail.com]
>>Sent: Tuesday, March 31, 2009 9:19 AM
>>To: Arlitt, Martin
>>Cc: httperf at linux.hpl.hp.com
>>Subject: Re: [httperf] connection failed with unexpected error 99
>>I've received FD errors before and they looked different. If on the
>>driver machine httperf would record in the Errors: fd-unavail section.
>>If on the SUT then I'd get some system errors on that side, but I'm
>>not seeing either of those.
Expanding a bit on Martin's question - TCP connections are "named" by the
four-tuple of local/remote IP address and local/remote port number. At the end
of a connection's "life" one or the other of the TCP endpoints is supposed to
remain in TIME_WAIT for at least 2*MSL (Maximum Segment Lifetime). There will be
no file descriptor associated with this TCP endpoint as the endpoint will not
transition to TIME_WAIT state until close() is called, and after close() there is
no longer an associated file descriptor...
2*MSL will generally be at least 60 seconds, and might be as long as 240 seconds,
depending on the TCP stack.
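One way to answer Martin's question directly on Linux is to count the endpoints
sitting in TIME_WAIT by parsing /proc/net/tcp, where the "st" column holds the
connection state and 06 means TIME_WAIT. A minimal sketch, Linux-specific and
not part of httperf:

```python
# Count TCP endpoints currently in TIME_WAIT on Linux by parsing
# /proc/net/tcp.  The fourth whitespace-separated field ("st") is the
# connection state in hex; 06 is TIME_WAIT.  Linux-specific sketch.
def count_time_wait(path="/proc/net/tcp"):
    count = 0
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            # fields: sl, local_address, rem_address, st, ...
            if len(fields) > 3 and fields[3] == "06":
                count += 1
    return count
```

`netstat -an | grep TIME_WAIT` (or, on newer systems, `ss state time-wait`)
reports the same information.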
If an application "churns" through (establishes and tears-down) TCP connections
fast enough to cause the TCP connection names to "wrap" before that four-tuple
exits TIME_WAIT, a bind() or connect() call may fail with an EADDRINUSE or
similar error. While on its own the four-tuple would have 96 bits worth of
values, the fixed IP address and port of the web server takes away 48 of those
bits, and the IP address of the client takes away another 32 bits, leaving only
the 16 bits of local port number space. When an application does not make
explicit port number selections in a bind() call, the anonymous or ephemeral port
space will be used, which often will limit the size of the port space used to 14
bits - 16384 or so entries, often port numbers 49152 to 65535.
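On Linux the anonymous/ephemeral range the kernel will hand out is visible in
procfs; the sketch below reads it and reports how many local ports one client
IP can cycle through. The path is Linux-specific, and note that modern kernels
often default to 32768-60999 rather than 49152-65535:

```python
# Read the Linux ephemeral ("anonymous") port range from procfs and
# report the size of the local port space a single client IP gets.
def ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    with open(path) as f:
        lo, hi = map(int, f.read().split())
    return lo, hi, hi - lo + 1  # (low port, high port, size of space)
```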
That does not take a particularly large connection churn rate to exhaust. With a
60 second TIME_WAIT that would be roughly 273 connections per second; with a 240
second TIME_WAIT, roughly 68 per second. The general formula for the maximum
churn rate would be something like sizeof(portspace)/lengthof(TIME_WAIT).
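Plugging the numbers from above into that formula - a back-of-the-envelope
sketch assuming the 16384-entry ephemeral space:

```python
# Maximum sustainable churn rate = sizeof(portspace) / lengthof(TIME_WAIT)
ephemeral_ports = 65535 - 49152 + 1        # 16384 ports

for time_wait in (60, 240):                # plausible 2*MSL values, seconds
    rate = ephemeral_ports / time_wait
    print(f"{time_wait}s TIME_WAIT -> ~{rate:.0f} connections/sec")
# -> ~273 connections/sec at 60s, ~68 connections/sec at 240s
```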
There are things which complicate the calculation - is the server the one with
TIME_WAIT (TIME_WAIT goes to the side which thinks it has sent the first FINished
segment)? Is there code in the stack which tries to safely "restart" the
connection with that four-tuple "name?" etc etc etc.
The fix? Some combination of:
*) use longer-lived connections - persistent or pipelined
*) make explicit calls to bind() in the client to use the entire non-privileged
port space from 1024 to 65535
*) use more than one client
*) configure more than one IP on each client and modify the client code to make
explicit bind() calls to those additional IP addresses
*) configure more than one IP on the server
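The second item - an explicit bind() before connect() - might look like the
following sketch. This is Python rather than httperf's C, and the helper name
is illustrative:

```python
import socket

def connect_from(local_port, host, port, local_ip="0.0.0.0"):
    """Connect to (host, port) from an explicitly chosen local port, so
    the caller can walk the whole non-privileged 1024-65535 space instead
    of relying on the kernel's ephemeral range."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR lets the bind succeed while an old endpoint with this
    # local port is still draining; it does not make reusing a connection
    # "name" safe - see the TIME_WAIT caveat.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((local_ip, local_port))  # choose the local half of the name
    s.connect((host, port))
    return s
```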
However, the fix is NOT to attempt to circumvent TIME_WAIT. TIME_WAIT is there
for a very specific purpose - to make certain that a new connection by the same
"name" (four-tuple) does not mistakenly accept segments from an old connection by
that name. To do so would result in silent data corruption.
Of course, error 99 could mean something else entirely :)