Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP

hp.com home


Persistent starvation

» 

HP Labs

» Research
» News and events
» Technical reports
» About HP Labs
» Careers @ HP Labs
» People
» Worldwide sites
» Downloads
Content starts here


Author: David Mosberger, HP
Created: 20 May 2003
Last update:14 Oct 2003

Summary

The O(1) scheduler can get itself into a scenario where two tasks (a) share a single CPU and (b) always are runnable, yet one task completely starves out execution in the other task.

Description

The problem is most easily illustrated on a single-CPU machine with the test program below. The program creates two tasks, with the child task constantly sending SIGUSR1 to the parent and the parent simply counting until it receives 10,000,000 signals. With the original Linux scheduler, the two tasks share the CPU relatively equitably (about 60% vs 40%). With the O(1) scheduler, the two start out working just fine but after a few seconds, the child takes over and consumes all available CPU time (over 99%).

Status

Fixed: This problem has been fixed in recent versions of the 2.5/2.6 kernel. Finer-grained accounting seems to have been needed for this issue to go away. The problem does not occur, e.g., in kernel v2.6.0-test7.

  • Analysis: Likely a bug in the dynamic priority adjustment. The starvation starts to happen when the dynamic priority of the two tasks is equal. The dynamic priority of both tasks then stays stuck at that value. What ought to happen is that the parent should eventually get a higher priority because it relinquishes the CPU via sleep() after handling a signal.

  • Scope: believed to affect all O(1)-based kernels. Problem has been confirmed to exist on:
    • single-CPU 600MHz Pentium 3, kernel v2.4.20-8, Red Hat 9
    • single-CPU 900MHz Itanium 2, kernel v2.5.69, Debian 3.0

  • Workaround: Renicing the child by just one priority level makes the problem go away.

Credits

Thanks to Stéphane Eranian for reporting this problem and for providing a test-case.

Test program

click here to download the test program

#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/wait.h>

volatile unsigned long loop = 10000000;

void
handler (int n)
{
  if (loop > 0)
    --loop;
}

static int
child (void)
{
  pid_t ppid = getppid ();

  sleep (1);
  while (1)
    kill (ppid, SIGUSR1);
  return 0;
}

int
main (int argc, char **argv)
{
  pid_t child_pid;
  int r;

  loop = argc > 1 ? strtoul (argv[1], NULL, 10) : 10000000;
  printf ("expecting to receive %lu signals\n", loop);

  if ((child_pid = fork ()) == 0)
    exit (child ());

  signal (SIGUSR1, handler);
  while (loop)
    sleep (1);
  r = kill (child_pid, SIGTERM);
  waitpid (child_pid, NULL, 0);
  return 0;
}
Printable version
Privacy statement Using this site means you accept its terms Feedback to HP Labs
© 2008 Hewlett-Packard Development Company, L.P.