Maintaining Network QoS Across NIC Device Driver Failures Using Virtualization

Le, Michael; Gallagher, Andrew; Tamir, Yuval; Turner, Yoshio
HP Laboratories


Keyword(s): device driver, recovery, virtual machine, fault tolerance, QoS, network, dependability, resiliency

Abstract: Device driver failures have been shown to be a major cause of system failures. Network services stress NIC device drivers, increasing the probability of NIC driver bugs being manifested as server failures. System virtualization is increasingly used for server consolidation and management. The isolated driver domain (IDD) architecture used by several virtual machine monitors, such as Xen, forms a natural foundation for making systems resilient to NIC driver failures. In order to realize this potential, recovery must be fast enough to maintain QoS for network services across NIC driver failures. We show that the standard Xen configuration, enhanced with simple detection and recovery mechanisms, cannot provide such QoS. However, with NIC drivers isolated in two virtual machines, in a primary/warm-spare configuration, the system can recover from an overwhelming majority of NIC driver failures in under 10 ms.

Additional Publication Information: To be published in and presented at the IEEE International Symposium on Network Computing and Applications, July 9-11, 2009

External Posting Date: May 21, 2009. Approved for External Publication
Internal Posting Date: May 21, 2009

