[httperf] Ideas on error "Address already in use"?
mcortesi at gmail.com
Wed Apr 1 05:52:24 PST 2009
Sorry, I wasn't paying attention to the other thread, and I didn't
associate its subject with my issue.
Sorry about that!
On Tue, Mar 31, 2009 at 7:34 PM, Rick Jones <rick.jones2 at hp.com> wrote:
> For all incense and porpoises, that was covered earlier today, per the
> ---------- Forwarded message ----------
> From: Rick Jones <rick.jones2 at hp.com>
> To: "Arlitt, Martin" <martin.arlitt at hp.com>
> Date: Tue, 31 Mar 2009 10:12:35 -0700
> Subject: Re: [httperf] connection failed with unexpected error 99
> Arlitt, Martin wrote:
>> How many connections are in TIME_WAIT when httperf fails?
>>> -----Original Message-----
>>> From: Kyle Campos [mailto:kyle.campos at gmail.com]
>>> Sent: Tuesday, March 31, 2009 9:19 AM
>>> To: Arlitt, Martin
>>> Cc: httperf at linux.hpl.hp.com
>>> Subject: Re: [httperf] connection failed with unexpected error 99
>>> I've received FD errors before and they looked different. If they
>>> occurred on the driver machine, httperf would record them in the
>>> fd-unavail field of the Errors: section. If they occurred on the SUT,
>>> I'd get some system errors on that side, but I'm not seeing either
>>> of those.
> Expanding a bit on Martin's question - TCP connections are "named" by the
> four-tuple of local/remote IP address and local/remote port number. At the
> end of a connection's "life" one or the other of the TCP endpoints is
> supposed to remain in TIME_WAIT for at least 2*MSL (Maximum Segment
> Lifetime). There will be no file descriptor associated with this TCP
> endpoint as the endpoint will not transition to TIME_WAIT state until
> close() is called, and after close() there is no longer an associated file
> descriptor.
> 2*MSL will generally be at least 60 seconds, and might be as long as 240
> seconds, depending on the TCP stack.
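One way to answer Martin's question on a Linux driver machine is to count the entries in /proc/net/tcp whose state column is 06, the code the Linux kernel uses for TIME_WAIT. The sketch below is only an illustration: the function name is made up, it assumes the Linux /proc/net/tcp layout, and it covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6).

```python
# Sketch: count TIME_WAIT sockets from text in Linux /proc/net/tcp format.
# Assumption: the fourth whitespace-separated column is the hex TCP state.
TIME_WAIT_STATE = "06"  # hex state code Linux uses for TIME_WAIT


def count_time_wait(proc_net_tcp_text):
    """Return the number of socket entries whose state is TIME_WAIT."""
    count = 0
    lines = proc_net_tcp_text.splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT_STATE:
            count += 1
    return count
```

On the driver machine this could be called as count_time_wait(open("/proc/net/tcp").read()) while httperf is running.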
> If an application "churns" through (establishes and tears-down) TCP
> connections fast enough to cause the TCP connection names to "wrap" before
> that four-tuple exits TIME_WAIT, a bind() or connect() call may fail with
> an EADDRINUSE or similar error. While on its own the four-tuple would have
> 96 bits worth of values, the fixed IP address and port of the web server
> takes away 48 of those bits, and the IP address of the client takes away
> another 32 bits, leaving only the 16 bits of local port number space. When
> an application does not make explicit port number selections in a bind()
> call, the anonymous or ephemeral port space will be used, which often will
> limit the size of the port space used to 14 bits - 16384 or so entries,
> often port numbers 49152 to 65535.
> It does not take a particularly large connection churn rate to exhaust
> that. With a 60 second TIME_WAIT it would be about 273 connections per
> second. With a 240 second TIME_WAIT it would be about 68 connections per
> second. The general formula for the maximum churn rate would be something
> like
> (size of the client port space) / (length of TIME_WAIT)
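The churn-rate arithmetic in the paragraph above amounts to dividing the number of usable local ports by the TIME_WAIT length. A minimal sketch, with a hypothetical function name and the port ranges taken from the quoted text:

```python
# Sketch: upper bound on sustainable connections per second from one client
# IP to one server IP:port, assuming every closed connection leaves its
# local port in TIME_WAIT for the full duration.
def max_churn_rate(port_count, time_wait_seconds):
    """Return the maximum new connections per second before ports wrap."""
    return port_count / time_wait_seconds


EPHEMERAL_PORTS = 65535 - 49152 + 1      # 16384 ports (14 bits)
NON_PRIVILEGED_PORTS = 65535 - 1024 + 1  # 64512 ports

print(max_churn_rate(EPHEMERAL_PORTS, 60))       # roughly 273 per second
print(max_churn_rate(EPHEMERAL_PORTS, 240))      # roughly 68 per second
print(max_churn_rate(NON_PRIVILEGED_PORTS, 60))  # roughly 1075 per second
```

The last line shows why binding explicitly across the whole non-privileged range helps: the same 60 second TIME_WAIT then allows about four times the churn.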
> There are things which complicate the calculation - is the server the one
> with TIME_WAIT (TIME_WAIT goes to the side which thinks it has sent the
> first FINished segment)? Is there code in the stack which tries to safely
> "restart" the connection with that four-tuple "name?" etc etc etc.
> The fix? Some combination of:
> *) use longer-lived connections - persistent or pipelined
> *) make explicit calls to bind() in the client to use the entire
> non-privileged port space from 1024 to 65535
> *) use more than one client
> *) configure more than one IP on each client and modify the client code to
> make explicit bind() calls to those additional IP addresses
> *) configure more than one IP on the server
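The explicit bind() suggestions in the list above could look something like the following. This is a Python sketch rather than httperf's own C code, and the helper names, the default port range, and the timeout are assumptions made for illustration.

```python
import socket


def local_endpoints(local_ips, first_port=1024, last_port=65535):
    """Yield (ip, port) pairs forever, walking the port range across all IPs."""
    while True:
        for port in range(first_port, last_port + 1):
            for ip in local_ips:
                yield (ip, port)


def connect_from(local_endpoint, remote_endpoint, timeout=5.0):
    """Open a TCP connection whose local address and port are chosen explicitly."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.bind(local_endpoint)      # explicit local IP and port
        sock.connect(remote_endpoint)  # may raise OSError, e.g. EADDRINUSE
    except OSError:
        sock.close()
        raise
    return sock
```

A caller would take the next pair from local_endpoints() for each new connection and move on to the following pair if connect_from() raises OSError, since some ports in the range may already be held by other local services.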
> However, the fix is NOT to attempt to circumvent TIME_WAIT. TIME_WAIT is
> there for a very specific purpose - to make certain that a new connection by
> the same "name" (four-tuple) does not mistakenly accept segments from an old
> connection by that name. To do so would result in silent data corruption.
> Of course, error 99 could mean something else entirely :)
> rick jones