[httperf] SIGSEGV when multiple instances are running?

Jim Whitehead II jnwhiteh at gmail.com
Fri Feb 4 05:00:59 PST 2011


On Fri, Feb 4, 2011 at 10:28 AM, Gattegno, Victor (GCC - Software)
<victor_gattegno at hp.com> wrote:
>
> Also in this article:
> http://www.cs.uwaterloo.ca/%7Ebrecht/servers/openfiles.html

Indeed, you need to change the ulimit on your system as well as ensure
FD_SETSIZE is set sufficiently high when you compile httperf. Luckily
there is good documentation, as you've pointed out!
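Below is a minimal Go sketch of checking both limits before launching httperf, assuming a Linux host. It reads the soft RLIMIT_NOFILE with syscall.Getrlimit, which is the value `ulimit -n` reports. httperf's core loop multiplexes connections with select(), so descriptors must also stay below the FD_SETSIZE that httperf was compiled with (commonly 1024 with glibc). The fdSetSize value is an assumption you supply, because it cannot be read back from the httperf binary. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// softFDLimit returns the current soft limit on open file descriptors
// for this process (what `ulimit -n` reports).
func softFDLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

// usableFDs is the effective ceiling for a select()-based client: the
// smaller of the rlimit and the compile-time FD_SETSIZE.
func usableFDs(rlimit, fdSetSize uint64) uint64 {
	if rlimit < fdSetSize {
		return rlimit
	}
	return fdSetSize
}

func main() {
	cur, err := softFDLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	// 1024 is an assumed FD_SETSIZE; substitute the value httperf was built with.
	fmt.Printf("soft limit %d, usable by httperf at most %d\n", cur, usableFDs(cur, 1024))
}
```

Raising only one of the two limits does not help. A high `ulimit -n` is still capped by a small FD_SETSIZE, and the reverse is also true.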

>
> My 2 cents.
> Victor
>
> -----Original Message-----
> From: Gattegno, Victor (GCC - Software)
> Sent: Friday, February 4, 2011 10:04
> To: httperf at linux.hpl.hp.com
> Subject: RE: [httperf] SIGSEGV when multiple instances are running?
>
> Hi,
>
> There is a solution for the fd-unavail errors in that blog article:
> http://gom-jabbar.org/articles/2009/02/04/httperf-and-file-descriptors
>
> Rgds,
> Victor
>
> -----Original Message-----
> From: httperf-bounces at linux.hpl.hp.com [mailto:httperf-bounces at linux.hpl.hp.com] On Behalf Of Jim Whitehead II
> Sent: Thursday, February 3, 2011 20:02
> To: Arlitt, Martin
> Cc: httperf at linux.hpl.hp.com
> Subject: Re: [httperf] SIGSEGV when multiple instances are running?
>
> On Thu, Feb 3, 2011 at 6:50 PM, Arlitt, Martin <martin.arlitt at hp.com> wrote:
>> Hi Jim
>>
>> Sounds good. No worries about the Exitf issue, I just wanted to disclose the changes I made.
>>
>> I was just looking through the client output; clearly, using too many httperf instances on a single machine creates some issues. In my 20-instance test, there are a lot of FDUnavail errors (I did not attempt to tune the client before running these tests). That seems like something that needs further investigation before people try to run one instance per core on systems that have dozens of cores.
>
> Ideally, you would only run one instance per machine, regardless of
> how many processors it has. In my experience I've been running into
> the following bottlenecks:
>
> 1. the file descriptor limit, which is not always easy to change (especially
> if you don't have root)
> 2. socket timeout limits, described in the httperf man page, which can be
> controlled with the --timeout option
> 3. stream limits that cause httperf to encounter error 98 (EADDRINUSE on Linux)
>
> This is part of why I chose to distribute the load over multiple
> clients so I can skirt some of these limits when I don't have full
> control over my machines.
>
> In short, I'll likely warn when a duplicate worker host is specified
> and the user can make a decision accordingly. Thank you for the
> feedback!
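The proposed warning could be as simple as counting worker addresses before dispatching any work. This is a hypothetical sketch, not code from autohttperf. It compares the host:port strings literally, so aliases such as localhost:1717 and 127.0.0.1:1717 would not be caught unless the addresses are resolved first.

```go
package main

import "log"

// duplicateWorkers returns each worker address that appears more than
// once, in the order in which its second occurrence is seen.
func duplicateWorkers(workers []string) []string {
	seen := make(map[string]int)
	var dups []string
	for _, w := range workers {
		seen[w]++
		if seen[w] == 2 {
			dups = append(dups, w)
		}
	}
	return dups
}

func main() {
	// Hypothetical worker list mirroring the failing command line.
	workers := []string{"localhost:1717", "localhost:1717"}
	for _, w := range duplicateWorkers(workers) {
		log.Printf("warning: worker %s is listed more than once; its httperf instances will share one machine's descriptors and ports", w)
	}
}
```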
>
>>
>> Thanks
>> Martin
>>
>>> -----Original Message-----
>>> From: Jim Whitehead II [mailto:jnwhiteh at gmail.com]
>>> Sent: Thursday, February 03, 2011 10:44 AM
>>> To: Arlitt, Martin
>>> Cc: httperf at linux.hpl.hp.com
>>> Subject: Re: [httperf] SIGSEGV when multiple instances are running?
>>>
>>> On Thu, Feb 3, 2011 at 6:28 PM, Arlitt, Martin <martin.arlitt at hp.com> wrote:
>>> > Hi Jim
>>> >
>>> > I tried autohttperf on a testbed with two machines, each with two dual-core
>>> > CPUs, and Red Hat Enterprise Linux Server 5.6 as the OS. On one of
>>> > these machines I ran Apache 2.2.17 as the Web server. On the other machine
>>> > I installed autohttperf along with httperf-0.9.0.
>>> >
>>> > Prior to today, I did not have Go installed. I installed it as per the
>>> > instructions on the golang.org site. I only mention this because the
>>> > autohttperf code would not compile with the version of Go I installed.
>>> > Specifically, it did not like "log.Exit" in server.go, and "log.Exitf" in client.go
>>> > and utils.go. I changed all of these instances to "log.Printf" so that the
>>> > programs would compile and I could run autohttperf (I recognize this is not
>>> > the identical functionality, but for this test it should not affect the results).
>>>
>>> Apologies for this; there was a Go release yesterday that renamed log.Exitf
>>> to log.Fatalf and I haven't updated the code to reflect this. The changes you
>>> made would not affect the results at all, so sorry for the difficulties.
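For anyone hitting the same compile error: according to the thread, log.Exitf became log.Fatalf. This differs from Martin's workaround in one way. log.Fatalf logs the message and then calls os.Exit(1), whereas log.Printf logs the message and lets execution continue. The sketch below is minimal, and readConfig and the file name are hypothetical.

```go
package main

import (
	"log"
	"os"
)

// readConfig is a hypothetical stand-in for any call whose failure should be fatal.
func readConfig(path string) ([]byte, error) {
	return os.ReadFile(path)
}

func main() {
	data, err := readConfig("autohttperf.conf")
	if err != nil {
		// Previously log.Exitf: logs the message, then exits with status 1.
		log.Fatalf("cannot read config: %v", err)
	}
	log.Printf("read %d bytes of config", len(data))
}
```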
>>>
>>> > I ran autohttperf as you stated, and it appears to work fine on my testbed.
>>> > I tried to entice httperf to crash by repeating the test with up to 20 instances
>>> > on the (single) client machine. However, it still seems to have succeeded. I'll
>>> > send you the debug output in a separate message, so that you can either
>>> > confirm or refute my interpretation (rather than clog up the mailboxes of others
>>> > on the mailing list by attaching the output to this message). In short, I am not
>>> > able to replicate the problem you reported.
>>>
>>> Indeed, from the results you've sent me, everything seems above board.
>>> I guess at this point I'll just have to chalk it up to something bizarre in my
>>> development environment that is causing this to happen.
>>>
>>> > If you discover any additional information that may point to the cause,
>>> > please let me know.
>>>
>>> If I come across anything further, I will definitely let you know.
>>> Thank you and sorry for the noise!
>>>
>>> > Thanks
>>> > Martin
>>> >
>>> >
>>> >> -----Original Message-----
>>> >> From: Jim Whitehead II [mailto:jnwhiteh at gmail.com]
>>> >> Sent: Wednesday, February 02, 2011 3:10 PM
>>> >> To: Arlitt, Martin
>>> >> Cc: httperf at linux.hpl.hp.com
>>> >> Subject: Re: [httperf] SIGSEGV when multiple instances are running?
>>> >>
>>> >> On Wed, Feb 2, 2011 at 10:06 PM, Arlitt, Martin
>>> >> <martin.arlitt at hp.com>
>>> >> wrote:
>>> >> > Hi Jim
>>> >> >
>>> >> > A student I work with at the University of Calgary ran a test with
>>> >> > two httperf instances on one client, but she could not replicate the problem.
>>> >> > Could you please describe the client you are using, and provide a/the
>>> >> > command line that results in the crash?
>>> >>
>>> >> The project I've written is currently called autohttperf and the code
>>> >> can be found on github:
>>> >>
>>> >> https://github.com/jnwhiteh/autohttperf
>>> >>
>>> >> It consists of two Go programs, autohttperf_daemon and autohttperf.
>>> >> The former is the worker daemon, whereas the latter is the main client
>>> >> command that is used to request new benchmarks. The premise is very
>>> >> simple: any benchmark you request is split over multiple different
>>> >> clients, which distributes the load while still generating enough of it
>>> >> to saturate the server.
>>> >>
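The splitting step might look like the following. This is a hypothetical sketch of an even split. The thread does not say how autohttperf actually divides connections among workers.

```go
package main

import "fmt"

// splitConns divides total connections across n workers as evenly as
// possible; the first total%n workers each receive one extra connection.
func splitConns(total, n int) []int {
	if n <= 0 {
		return nil
	}
	shares := make([]int, n)
	for i := range shares {
		shares[i] = total / n
		if i < total%n {
			shares[i]++
		}
	}
	return shares
}

func main() {
	fmt.Println(splitConns(5000, 3)) // [1667 1667 1666]
}
```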
>>> >> The only prerequisites for using the package are httperf on any client
>>> >> machines and the Go programming language.
>>> >>
>>> >> Once Go is installed and you've cloned the repository, you should be
>>> >> able to run 'gomake' in both the client and server directories.
>>> >>
>>> >> 1. Spin up a web server (the host and port can be specified).
>>> >> 2. Spin up an instance of autohttperf_daemon.
>>> >>
>>> >> Run the autohttperf client with the following commandline switches:
>>> >>
>>> >> ./autohttperf --server <host or ip> --port <port if not 80> --manual
>>> >> --numconns 5000 <host:port of the machine running the daemon>
>>> >>
>>> >> So if you're doing this all on localhost with default ports:
>>> >>
>>> >> ./autohttperf --server localhost --port 80 --manual --numconns 5000
>>> >> localhost:1717
>>> >>
>>> >> This should succeed with no issues, and you will receive debug
>>> >> messages on both client and server. Now run the following:
>>> >>
>>> >> ./autohttperf --server localhost --port 80 --manual --numconns 5000
>>> >> localhost:1717 localhost:1717
>>> >>
>>> >> This tells the client to connect to the daemon on localhost twice and
>>> >> distribute the load between those two workers. It is in these cases that
>>> >> I see the crash and can reliably reproduce it. Thinking it was
>>> >> initially a problem with my use of exec.Run in Go, I changed the
>>> >> command to use bash -c to execute the program, and fell back on
>>> >> /usr/bin/env to see if that made a difference.
>>> >> Neither of these fixed the issue, which pointed towards httperf.
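exec.Command passes an argument vector directly to the new process, so no shell parses the command. The sketch below builds such a command for one worker's share without starting it. It assumes httperf is on PATH. httperfCmd is hypothetical. --server, --port, --num-conns and --timeout are standard httperf options.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// httperfCmd builds, without starting, an httperf invocation for one
// worker's share of the connections. No shell is involved.
func httperfCmd(server string, port, numConns int, timeout float64) *exec.Cmd {
	return exec.Command("httperf",
		"--server="+server,
		"--port="+strconv.Itoa(port),
		"--num-conns="+strconv.Itoa(numConns),
		"--timeout="+strconv.FormatFloat(timeout, 'f', -1, 64),
	)
}

func main() {
	cmd := httperfCmd("localhost", 80, 2500, 5)
	fmt.Println(cmd.Args)
}
```

Because the arguments never pass through bash, quoting problems are ruled out, which matches the observation that bash -c and env made no difference.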
>>> >>
>>> >> For me, the issue happens mostly when I have two of the same client
>>> >> listed as the final arguments, but I've just had to add a third in
>>> >> order to trigger the problem, if that helps.
>>> >>
>>> >> You can see the error occurs by the following client messages:
>>> >>
>>> >> 2011/02/02 23:07:33 [localhost:1717:0] Got results
>>> >> 2011/02/02 23:07:33 [localhost:1717:0] Error state reported: Command
>>> >> did not properly exit: signal 11
>>> >>
>>> >> On the plus side, I've been using this all day to do some stress
>>> >> testing of servers, and it's working fine as long as the workers are on
>>> >> different machines.
>>> >>
>>> >> I know it's a bit of a twisted issue, but this is the only way I've
>>> >> been able to reliably reproduce this crash. If I can be of assistance,
>>> >> let me know.
>>> >>
>>> >> --
>>> >> Jim Whitehead
>>> >> Oxford University Computing Laboratory
>>> >>
>>> >>
>>> >> >
>>> >> > Thanks
>>> >> > Martin
>>> >> >
>>> >> >
>>> >> >> -----Original Message-----
>>> >> >> From: httperf-bounces at linux.hpl.hp.com [mailto:httperf-bounces at linux.hpl.hp.com] On Behalf Of Jim Whitehead II
>>> >> >> Sent: Wednesday, February 02, 2011 7:44 AM
>>> >> >> To: httperf at linux.hpl.hp.com
>>> >> >> Subject: [httperf] SIGSEGV when multiple instances are running?
>>> >> >>
>>> >> >> I'm currently writing a tool that helps to automate benchmarking
>>> >> >> using httperf, distributing the load over a number of worker
>>> >> >> machines to generate the appropriate load. Everything works fine
>>> >> >> as long as I only issue one request to each worker machine, but as
>>> >> >> soon as I spawn two processes on the same machine to perform the
>>> >> >> httperf benchmark, one or both processes will frequently crash
>>> >> >> with a SIGSEGV. The core dump shows that this is happening in the
>>> >> >> conn_inc_ref(conn) macro call on core.c:1172. This error happens
>>> >> >> on both SVN trunk as well as
>>> >> >> 0.9.0 as downloaded from the HP ftp server.
>>> >> >>
>>> >> >> The following shows the backtrace and the core dump itself (shown
>>> >> >> last)
>>> >> >> #0  0x000000010000796c in core_loop () at core.c:1221
>>> >> >> #1  0x00000001000044b2 in main (argc=14, argv=0x7fff5fbff5d8) at
>>> >> >> httperf.c:971
>>> >> >>
>>> >> >> #0  0x000000010000796c in core_loop () at core.c:1221
>>> >> >> 1221                        conn_inc_ref (conn);
>>> >> >>
>>> >> >> The problem is I have difficulty getting httperf to reproduce this
>>> >> >> problem on its own; it seems that it only occurs when I am
>>> >> >> spawning new (multiple) instances from my RPC server that is handling
>>> >> >> requests.
>>> >> >>
>>> >> >> The RPC server is written in Go, so in an attempt to better
>>> >> >> isolate the problem, I tried running httperf using both bash -c
>>> >> >> and env. In each of these cases, the problem persisted.
>>> >> >>
>>> >> >> My question is: Are there any known issues with running concurrent
>>> >> >> instances of httperf on the same machine that might be causing
>>> >> >> this problem? I'd rather have a nice working tool than one with a
>>> >> >> dirty caveat of "httperf will crash if you do this, so don't".
>>> >> >>
>>> >> >> Beyond this snag, httperf has been instrumental in my
>>> >> >> benchmarking, so thank you!
>>> >> >>
>>> >> >> - Jim
>>> >> >> _______________________________________________
>>> >> >> httperf mailing list
>>> >> >> httperf at linux.hpl.hp.com
>>> >> >> http://www.hpl.hp.com/hosted/linux/mail-archives/httperf/
>>> >> >
>>> >
>>
>


