[httperf] fd-unavail non-null

Sylvain Geneves sylvain.geneves at inrialpes.fr
Wed Sep 1 02:03:48 PDT 2010


Hi Martin

Thanks for pointing me to that thread I missed; it's very helpful for my 
understanding!

Here's a quote from the end of a relevant mail:
"
The fix? Some combination of:

*) use longer-lived connections - persistent or pipelined

*) make explicit calls to bind() in the client to use the entire 
non-priviledged
port space from 1024 to 65535

*) use more than one client

*) configure more than one IP on each client and modify the client code 
to make
explicit bind() calls to those addtional IP addresses

*) configure more than one IP on the server

However, the fix is NOT to attempt to circumvent TIME_WAIT.  TIME_WAIT 
is there
for a very specific purpose - to make certain that a new connection by 
the same
"name" (four-tuple) does not mistakenly accept segments from an old 
connection by
that name.  To do so would result in silent data corruption.
"

But I still have questions about these:
*) I can't change the pipelining in my workload (that's part of what I 
want to test), so the first fix isn't relevant in my case.

*) I thought that was exactly what the --hog option would do?

*) I'm trying that, but I'll need many more machines than expected 
to overload my server...

*) I'll try that, but I'm not sure how the OS will behave when assigning 
multiple IPs to a single NIC.

*) I fear that assigning more than one IP to a single NIC on the server 
will stress the OS's network load-balancing algorithm too much.

Actually, I tuned the TCP parameters for my tests, and after checking, 
it turns out that TIME_WAIT is dramatically reduced to 1 sec in my 
configuration, so I'm surprised this problem arises this fast (it seems 
to happen at any rate above 1000 sess/sec).

Thanks for your help
Sylvain


On 08/31/2010 11:33 PM, Arlitt, Martin wrote:
> Hi Sylvain
>
> Using more client machines with less load on each of them is a good approach.
>
> Some other suggestions are available in the mailing list archive. In particular, you may want to read through the thread "httperf: connection failed with unexpected error 99" that appeared in February and March 2009. Included in that thread is a detailed explanation of what is happening (from Rick Jones).
>
> The archive is available at
> http://www.hpl.hp.com/hosted/linux/mail-archives/httperf/
>
> and the specific months are
> http://www.hpl.hp.com/hosted/linux/mail-archives/httperf/2009-February/thread.html
>
> and
>
> http://www.hpl.hp.com/hosted/linux/mail-archives/httperf/2009-March/thread.html
>
> thanks
> Martin
>
>
> -----Original Message-----
> From: Sylvain GENEVES [mailto:Sylvain.Geneves at inrialpes.fr]
> Sent: Tuesday, August 31, 2010 2:18 PM
> To: Arlitt, Martin
> Cc: 'Sylvain GENEVES'; httperf at linux.hpl.hp.com
> Subject: RE: [httperf] fd-unavail non-null
>
> Hi Martin
>
> You got it!! There are indeed connections in the TIME_WAIT state.
> Could you tell me what is the right thing to do to avoid that? My guess
> would be to use more client machines with less load on each one.
>
> Thanks
> Sylvain
>
>> Hi Sylvain
>>
>> Have you checked if it is an issue with too many TCP connections on the
>> client ending up in the TIME_WAIT state? If you haven't, run
>> $ netstat -an | grep TIME_WAIT | wc -l
>>
>> Thanks
>> Martin
>>
>>
>> -----Original Message-----
>> From: httperf-bounces at linux.hpl.hp.com
>> [mailto:httperf-bounces at linux.hpl.hp.com] On Behalf Of Sylvain GENEVES
>> Sent: Tuesday, August 31, 2010 1:39 PM
>> To: httperf at linux.hpl.hp.com
>> Subject: [httperf] fd-unavail non-null
>>
>> Hi,
>>
>>
>> I run httperf on Linux using --wsesslog=45371,0.000,/my_sessions_path
>> --rate=2000 and --timeout=3 for a big overload test.
>> What I don't understand is that it reports a non-null fd-unavail count
>> (tens of thousands, actually), even though I've dramatically increased the
>> max open files limit to 786762. I've tried increasing it to 5000000, but
>> the results are the same: the fd-unavail errors are still there...
>>
>> I had a doubt, but it appears that the per-process max open files limit
>> on Linux is the one shown by "ulimit -n", so I first modified the
>> system-wide settings (through sysctl), then PAM
>> (/etc/security/limits.conf), then ulimit.
>>
>> I've modified the code to count EINVAL errors separately, so that
>> fd-unavail only counts EMFILE errors; no change...
>>
>> Does someone know what's going on?
>> Also, I don't know if it's related, but I see some strange behaviour when
>> I launch multiple instances of httperf:
>>   - they don't finish at the same time (some instances finish tens of
>> minutes later than the first one)
>>   - in the end the remaining instances should be doing 2000 sessions/sec,
>> but instead they do ~2 requests/sec: the server is so idle I can check it
>> live with tcpdump...
>>
>>
>> Regards,
>> Sylvain
>>
>> _______________________________________________
>> httperf mailing list
>> httperf at linux.hpl.hp.com
>> http://www.hpl.hp.com/hosted/linux/mail-archives/httperf/
>>
>
>


