New Availability Features of Local Area VAXcluster Systems

By Lee Leahy

Digital Technical Journal Vol. 3 No. 3 Summer 1991

Abstract

This paper describes the availability features added to local area VAXcluster (LAVc) support in VMS version 5.4-3. These features support multiple local area network (LAN) adapters, reduce the time required to detect network path (channel) failures, and provide additional support for network troubleshooting. Combined, these changes significantly increase the availability of LAN-based VAXcluster configurations by allowing the VAXcluster system to tolerate and work around local area network failures.

VMS version 5.4-3 increases the availability of local area VAXcluster (LAVc) configurations by allowing the use of multiple local area network (LAN) adapters in the VAXcluster system. Availability is increased by enabling fail-over between LAN adapters, reducing channel failure detection time, and providing better network troubleshooting. (Table 1 presents definitions for terms used throughout the paper.)

Table 1 LAVc Terminology
___________________________________________________________________

Channel: A data structure in PEDRIVER that represents a network path (see Network Path below). Each channel is associated with a single virtual circuit (VC).

Datagram: A message that is requested to be sent by the client of the LAN driver. A datagram does not have guaranteed delivery to the remote system. The datagram may never be sent, or could be lost during transmission and never received.

LAN Adapter: An Ethernet or fiber distributed data interface (FDDI) adapter. Each type of LAN adapter has a unique set of attributes, such as the receive ring size.

LAN Address: The network address used to reference a specific LAN adapter connected to the Ethernet or FDDI.
This address is displayed as six hexadecimal bytes separated by dashes, e.g., 08-00-2B-12-34-56.

LAN Segment: An Ethernet segment or FDDI ring. Each type of LAN has a unique set of attributes, e.g., maximum packet size. LAN segments can be connected together with bridges to form a single extended LAN. However, in such a LAN, the LAN segments can have different characteristics (e.g., different packet sizes for an FDDI ring bridged to an Ethernet).

Network Path: The pieces of the physical network traversed when a datagram is sent from one LAN address to another LAN address. The network path is represented by a pair of LAN addresses, one for the local system and one for the remote system. Each network path has a specific set of attributes, which are a combination of the attributes of the local LAN adapter, the remote LAN adapter, and each of the LAN segments and LAN devices on the path between them.

PEDRIVER: The VMS port driver that provides reliable cluster communication utilizing the Ethernet.

Virtual Circuit: A data structure in PEDRIVER that represents the data path between the local system and the remote system. This data path provides guaranteed delivery for the messages sent. PEDRIVER's datagram service, along with an error recovery mechanism, ensures that a message is delivered to the remote system or is returned to the client with an error. A virtual circuit (VC) has one channel for each network path to the remote system.
___________________________________________________________________

We begin the paper with an overview of the added LAVc availability features of VMS version 5.4-3. We then present the multiple-adapter support features of the new release, with comparisons to the previous single-adapter implementation. The detection of network delays is discussed, along with how the system selects alternate paths around these delays after detection. Finally, we discuss the analysis of network failures and the physical descriptions needed to achieve the proper level of failure reporting.

Added Availability Features

VMS version 5.4-3 supports LAVc use of up to four LAN adapters for each VAX system. Availability and performance are increased by connecting each LAN adapter to a different LAN segment. Maximum availability is achieved by redundantly bridging these LAN segments together to form a single extended LAN. This configuration maximizes availability and reduces single points of failure by increasing the number of possible network paths between the different systems within the VAXcluster system.

Availability has also been increased at the application level by reducing the time required to detect channel failures. The LAVc protocol (NISCA) sends sequenced datagrams to the remote system. If not acknowledged within 2 seconds, a datagram is retransmitted. Retransmission continues until the connection between the two systems is declared broken. However, applications can be stalled during this error recovery process. Therefore, reducing the time for detecting channel failures and retransmitting datagrams reduces the amount of application delay introduced by network problems.

VMS version 5.4-3 also increases availability by reducing the delays introduced by network congestion. This latest release measures the network delays on a channel basis. The channel with the lowest computed network delay value is used to communicate with the remote system.

LAVc network failure analysis is a new feature in VMS version 5.4-3. This feature provides an analysis of failing channels by isolating the common network components responsible for the channel failures. LAVc network failure analysis increases availability by reducing the downtime caused by failing network components. To enable this feature, the system or network manager must provide an accurate physical description of the network used for LAVc communications.

Multiple-Adapter Support

This section describes the availability features added with the multiple-adapter LAVc support in VMS version 5.4-3. Some limitations of the single-adapter implementation are presented for comparison.

Single Points of Failure

In single-adapter LAVc satellites, the Ethernet adapter remains a single point of failure. This failure "point" actually extends through the network components common to all of the network paths in use for cluster communication. The combination of VMS version 5.4-3 with multiple LAN adapters removes the LAN adapter as a single point of failure in the local system. Additionally, the use of multiple LAN adapters connected to an extended LAN creates multiple network paths to remote systems. This configuration results in a higher tolerance for network component failures and higher cluster availability.

Adapter Selection

The single-adapter implementation is configuration-dependent and does not allow the system manager a choice of adapters. The multiple-adapter support in VMS version 5.4-3 configures the system for maximum availability by starting the LAVc protocol on all LAN adapters in the system. Support is also provided to start and stop the LAVc protocol on the LAN adapters. This support allows the system manager to select which LAN adapters will run the LAVc protocol.

The means of locating the LAN devices in the system has also changed. The system now maintains a list of LAN devices. As each LAN device driver is loaded into the system, an entry is appended to this list. A new support routine steps through this list and returns a pointer to the next LAN device in the system. The single-adapter implementation requires code changes in PEDRIVER to add a new LAN device; the new implementation no longer requires these changes.

Channel Control Handshake

The channel control handshake is a three-way message exchange. The exchange starts when a HELLO message is received from a remote system and the channel is in the closed state, or any time a CCSTART message is received. Upon receiving a HELLO message on a closed channel, the system responds with a CCSTART message. Upon receiving a CCSTART message, the system closes the channel if the PATH bit was set. In all cases, if the cluster password is correct, the system responds with a VERF message. Upon receiving the VERF message, the remote system verifies the cluster password. If the password is correct, the remote system sends an acknowledgment (VACK) message and marks the channel as usable by setting the PATH bit. The local system, upon receiving the VACK message, also marks the channel as usable by setting the PATH bit.

The channel control handshake now verifies the network path used by this channel, instead of verifying the virtual circuit (VC) as in the single-adapter implementation. Additionally, the handshake is used to negotiate some parameters between the local and remote systems on a channel basis (instead of assuming that the parameters are common for all channels connected to the VC). Packet size and pipe quota are two characteristics that are now arbitrated between the two systems. These parameters are negotiated on a channel-by-channel basis to allow different channels to fully utilize the capabilities of the specific network path.

With the introduction of FDDI, larger packet sizes are now supported. The channel handshake between two nodes negotiates a packet size that is supported by the entire network path. Any path that utilizes an Ethernet must use a packet size of 1498 bytes or smaller. An FDDI-to-FDDI path on the same extended ring must use a packet size of 4468 bytes or smaller. An increased packet size reduces the number of messages required when large blocks of data are sent. This increase in packet size results in fewer messages, less handshaking, and thus better network efficiency.

The PIPE_QUOTA value is used to limit the number of messages sent to the remote system before waiting for an acknowledgment. PIPE_QUOTA was implemented to help prevent receiver overrun on the remote system. Instead of using a fixed value, the new implementation uses a value specified by the LAN driver. This value factors in the LAN device's receive ring size and is typically larger than the fixed value of eight messages used previously. Increasing the PIPE_QUOTA value allows more data to be sent between the nodes before an acknowledgment message is required, thus increasing the protocol's efficiency and reducing the network traffic.

These new features in VMS version 5.4-3 have reduced the amount of handshaking required to move data and the number of messages required to move large amounts of data. The result is greater application availability through fewer network-based delays.

Use of Hello Messages

The single-adapter implementation uses a HELLO message to maintain the VC and not the channels. Also, the handshake to verify connectivity is performed by the VC, which forces all channels to use the same characteristics. In comparison, the multiple-adapter implementation uses HELLO messages to trigger the channel handshake, test the network path and maintain the channel in the open state, and continuously verify the network topology.

To maintain the channel and test the network path, each system multicasts a HELLO message through each of its LAN adapters every 3 seconds. Upon receipt of a HELLO message (if the channel is not open), a channel handshake begins. If the channel is open, the network delay is computed and the channel packet size is verified. When an open channel does not receive a HELLO message within 8 seconds, it declares a listen time-out and the channel is closed.

Additional topology change detection is required because FDDI-to-FDDI communications use large packets. If two systems using FDDI adapters exchange channel control messages, then both can agree on a large packet size. However, if the network is configured in the dumbbell configuration, then only the small packet size can be used. (The dumbbell configuration consists of two FDDI rings separated by an Ethernet segment.)

Detection of the dumbbell configuration is performed using the priority field in the frame control byte of the FDDI message header. This field does not exist in Ethernet messages and must be created when forwarding an Ethernet message to an FDDI ring. Ethernet-to-FDDI LAN bridges set this field's value to zero. All LAVc messages transmitted by the FDDI adapters use a non-zero value for the priority field. When a channel control message is received, the value of this field is checked. If the value is non-zero, then large messages can be used because the message did not traverse an Ethernet segment.

The priority field is also verified every time a HELLO message is received and the channel is open. A topology change is detected when a change in the priority value is received. If the priority value goes from zero to non-zero, the packet size is renegotiated and a larger packet size may be used. If the priority value goes from non-zero to zero, the channel packet size must be reduced. If this is the only channel with a larger packet size, then the VC closes and forces the two systems to reassign the message sequence numbers.

Listen Time-out

VMS version 5.4-3 now consistently times out channels in 8 to 9 seconds, whereas the single-adapter implementation detects the failure in 8 to 15 seconds. Reducing this time reduces the delays experienced by applications when a LAVc node is removed from the cluster. The result is an increase in application availability.

The single-adapter implementation traverses the VC list and scans each of the receive channels (RCH structures embedded in the VC) to check for time-out. Because this scan is CPU-intensive, the algorithm was designed to scan the VC list only once every 8 seconds. Reducing this scan time required the design of a new algorithm that reduces the CPU utilization required to locate the channels that have timed out.

The VMS version 5.4-3 implementation places each open channel into a ring of time-out queues. The time-out routine maintains a pointer into the ring of queues corresponding to the 8-second time-out. Each second, the time-out routine executes, removes any channels pointed to by the time-out pointer, and calls the listen time-out routine for the channel. Next, the time-out pointer and the 8-second time-out pointer are updated to point to a new set of queue headers in the ring. Active channels and channels receiving HELLO messages are inserted into the queues pointed to by the current time pointer, which prevents them from timing out. This implementation reduces CPU utilization during the time-out scan by looking at only the channels that have timed out.

Changes to Virtual Circuit Maintenance

The single-adapter implementation closes the VC and performs a channel control handshake every time a new channel is established. This implementation also forces each channel to use the same characteristics, specifically packet size, thereby reducing the characteristics to the lowest common denominator.

VMS version 5.4-3 does not close the VC each time a new channel is established. The channel handshake affects only the channel and is used to negotiate the channel characteristics, including packet size. The VC remains open as long as a channel with the corresponding packet size is open. This maintenance increases application availability by allowing channels to fail and reestablish transparently without disrupting service at the VC and systems communication services (SCS) layers.

One Channel with Matching Characteristics Required. The VC can be opened as soon as the first channel to the remote system is opened. When the VC opens, its packet size is set to the packet size of the channel being used. The VC can remain open as long as at least one channel with a compatible packet size is open. The packet size is compatible if the channel packet size is greater than or equal to the packet size currently in use by the VC. Transfers restricted to an FDDI ring can use a larger packet size than those that traverse an Ethernet LAN segment. PEDRIVER now supports variable packet sizes up to the size supported for the FDDI ring. Each time the VC switches channels, the new channel characteristics are copied into the VC. The result is that as soon as the VC switches to using the FDDI-to-FDDI channel, it also switches to using the larger packet size.

Receive Message Caching. VMS version 5.4-3 introduces a receive message cache to prevent any performance degradation when messages are received out of order. Because of transmission and network delays, messages are typically received out of order at approximately the time a channel switch occurs. Also, most of the channel selections are invoked after locating a channel with a lower network delay value, thus increasing the probability that messages will be received out of order.

Channel Failure Not Displayed. The multiple-adapter implementation does not display any messages when a channel fails. This choice was made to maintain compatibility with the previous implementation. We also wished to reduce the number of console messages and still provide enough data to isolate the problem. However, without some channel failure notification, all but one channel could fail without notice, thus negating all the availability that was introduced by using multiple adapters.

The LAVc network failure analysis allows the system or network manager to select one of the following levels of channel failure notification: no notification, if not enabled; channel failure notification, when minimally enabled; or isolation of the failing network component, when fully enabled. When this feature is fully enabled, a failing network component typically generates only a single console message instead of displaying tens or hundreds of channel failure messages.

Channel Selection

VMS version 5.4-3 bases its selection of a single transmit channel for a remote system first on the packet size and second on the network delay value. The channel selection algorithm searches for an open channel with a compatible packet size so that the VC does not have to be broken. If more than one channel has a compatible packet size, the network delays are compared and the channel with the lowest network delay value is chosen. The selected channel is used until it fails, encounters an error, or a channel with a lower network delay value is found.

Channel selection is performed independently for each remote system. This implementation means that a two-node cluster increases its availability through the use of more LAN adapters, but does not achieve a performance benefit by increasing the number of LAN adapters above two. Larger clusters, however, can take advantage of the additional LAN adapters and thus achieve better cluster performance. Multiple LAN adapters can also increase the bandwidth available for use by the LAVc protocol. However, the actual performance is very configuration- and application-dependent.

Channel selection is limited to the transmit channel, but all channels are used to receive data. The receive cache helps prevent retransmission by the remote system by placing messages received out of order into the receive cache until the previous messages are received. This receive algorithm is compatible with any transmit channel selection algorithm, e.g., in PEDRIVER or in any component implementing NISCA.

Multiple-adapter Availability Summary

The multiple-adapter LAVc support added to VMS version 5.4-3 increases the availability of applications and of the overall cluster. Availability is increased by removing the LAN adapter as a single point of failure. Cluster availability is enhanced through continuous testing of the network paths and correction for network topology changes.

This implementation also increases network utilization and cluster performance by taking full advantage of a channel's characteristics. Larger receive ring sizes reduce the protocol handshaking overhead. Moreover, larger packet sizes reduce the number of messages that must be sent for large transfers.

The next section discusses how PEDRIVER detects network delays and selects alternate paths.

Network Delay Detection

VMS version 5.4-3 increases application availability by detecting significant network delays and selecting alternate paths. As the network gets busy, it becomes more difficult for a LAVc node to send cluster messages. These delays in network communications cause delays in cluster traffic and translate into delays in the applications. Thus, through delay detection and the use of alternate paths, VMS version 5.4-3 reduces the delays for applications and increases overall cluster performance.

Assumptions and Delay Calculations

PEDRIVER computes network delays through a series of assumptions. The primary assumptions are that the transmit and receive delays for a path are equal, and that there are small internal delays associated with the LAN device. Although these assumptions are occasionally invalid, PEDRIVER uses them because there are no round-trip messages available in the NISCA protocol to compute the delay.

As the first step in the delay calculation for each channel between nodes, each node time-stamps the HELLO message just prior to transmission. When the HELLO message is received, the time stamp is subtracted from the local system time. This resulting value equals the sum of the transmit queue delay, the network delay, the receive queue delay, and the difference in the two system times. Applying the assumptions reduces this value to the sum of the network delay and the difference in the two system times.

The second step of the delay calculation is to compare the delay times between different channels to the same remote system. This comparison is a subtraction of the values computed above for each channel. The computation removes the common factor (the difference in the two system times) and results in the comparison of the two network delays. When multiple channels exist, PEDRIVER attempts to use the channel with the lowest network delay value.

Problems and Benefits Associated with the Assumptions

The assumptions in the network delay calculation do not always hold true. The arbitration delay to transmit a message on the Ethernet, between a pair of systems, is not always equal in both directions. Over the long term, this assumption would be valid if the systems were sending the same number of messages in each direction; however, this is not typically the case. When this assumption does not hold true, i.e., if the transmit delay is longer than the receive delay, then additional delay is introduced when transmitting messages using this channel.

The assumption that internal delays are small depends upon the network traffic and the transmit traffic generated for an adapter by the other LAN clients. If another LAN client is a heavy user of a particular LAN adapter, then transmissions from this adapter experience additional queue delays while waiting for the adapter. If the network is busy, messages in the transmit queue have an additional wait.

Finally, the network delay computed is the delay from the remote system to the local system. Since the delay is not always symmetric, it does not always represent the delay in the other direction, i.e., transmitting messages to the remote system. Yet, because the NISCA protocol does not have any round-trip messages, this is the best possible delay value.

Even with these problems in the assumptions, the network delay calculations increase the availability of the cluster by detecting large network delays. With this data, PEDRIVER is usually able to select alternate paths around the network delays when multiple channels exist, providing better cluster performance and availability.

Figure 1 represents an example of network delay detection. If LAN segment A is very busy, then PEDRIVER can detect an additional network delay for channels A1-B1, A1-B2, and A2-B1. PEDRIVER can then select an alternate path, that is, transmit packets only on channel A2-B2. Use of channels A1-B1, A1-B2, and A2-B1 can resume when the network traffic level on LAN segment A is reduced to about the level of LAN segment B, or if channel A2-B2 fails.
A2-B1 can resume when the Even with these problems network traffic level on in the assumptions, the LAN segment A is reduced network delay calculations to about the level of LAN increase the availability segment B, or if channel of the cluster by detecting A2-B2 fails. LAVc Network Failure Analysis helping to locate the cause VMS version 5.4-3 uses of the failure. Also, as multiple LAN adapters to the cluster configuration increase availability by gets larger, or the number working around network of LAN adapters increases, delays and failures. channel failure messages Channels fail as network increase (depending on what failures occur, reducing component failed) beyond the availability provided the point where they are by these extra channels. helpful. Yet to maintain However, the VC remains cluster availability, the open, allowing cluster system or network manager communication as long as needs to be told of the a single channel remains channel failures that are open. reducing the availability. To maintain compatibility The LAVc network failure with previous VMS versions, analysis, introduced with only VC failures are VMS version 5.4-3, is used displayed on the local to analyze the network console. Displaying failures and display the messages about channel OPCOM messages that call failures would only out the failing network indicate a problem without component. This support Digital Technical Journal Vol. 3 No. 3 Summer 1991 11 New Availability Features of Local Area VAXcluster Systems requires a description of the physical network used for LAVc communications. Depending upon the description supplied, the system or network manager can select the level of failure reporting. This level may range from channel failure reporting to calling out the actual component that failed. Display of Channel Failures There is a significant difference between displaying the channel failures and performing LAVc failure analysis. 
The difference is illustrated by Figure 2, which represents a multiple-adapter LAVc configuration.

Looking from system VAX A, the following channels exist: A1-A2, A2-A1, A1-B1, A1-B2, A2-B1, A2-B2, A1-C1, A1-C2, A2-C1, A2-C2, A1-D1, A1-D2, A2-D1, and A2-D2. Let us assume that DELNI B fails, causing the following channel failures: A1-C1, A2-C1, A1-D1, and A2-D1. A display of channel failures would show that some interesting event had just occurred but would leave it up to the system or network manager to isolate the actual failure. Also, since other channels are still open to VAX C and VAX D (A1-C2, A2-C2, A1-D2, and A2-D2), these nodes still remain in the cluster. However, the number of channels to these nodes has been halved, reducing cluster availability.

LAVc network failure analysis uses the physical network description to analyze channel failures. The working channel A1-C2 indicates that VAX A, A1, DELNI A, LAN segment A, the Ethernet-to-Ethernet LAN bridge, LAN segment B, DELNI D, C2, and VAX C function. The working channel A2-D2 indicates that A2, DELNI C, D2, and VAX D also function. The remaining components are DELNI B, C1, and D1. By reviewing the failing channels for common failures, we see that two channels use component C1, two channels use component D1, and all four channels use component DELNI B. Therefore, DELNI B has the highest probability of causing the failure and is the only network component displayed on the console. In this small cluster configuration, LAVc network failure analysis has reduced the messages displayed, i.e., from four channel failure messages to one component failure message. This simpler display provides timely notification and better isolation of network component failures, allowing the system or network manager to repair the network earlier and restore the full availability of the cluster.

Physical Network Description

LAVc network failure analysis requires a description of the physical network. This description lists the components used by the LAVc and the network paths that correspond to the LAVc channels.

The network component description consists of several pieces of data, including a component type and text description provided by the system or network manager. Some component types will require additional data. There are several types of network components: NODE, ADAPTER, COMPONENT, and CLOUD. Each NODE component requires a unique node name associated with it that matches the SCSNODE SYSGEN parameter. The ADAPTER component has at least one and sometimes two LAN addresses associated with it. One LAN address is the hardware address and the other, when specified, is the DECnet LAN address. COMPONENTs are used to describe all pieces of the network, both working and nonworking. CLOUDs describe portions of the network that are working only if all paths are working. Any path failure implies that the CLOUD component may not be working.

Component descriptions can range from actual devices and cables to internal CPU bus adapters. When the component is defined, an ID value is returned for use in the network path description. The choice of the components described is left to the system or network manager and allows the manager to select the desired level of network analysis. Each network component has a reference count and a working count. The reference count is incremented when a network path is defined that utilizes the network component. The working count is incremented each time a LAVc channel whose network path uses the component is opened, and decremented each time such an open channel is closed.

The network path description consists of a directed list of component identifier (ID) values. For proper analysis, this list must start with the ID value for the local node. Each successive ID value in the list must be associated with the next network component through which a message would travel when using this path. The final component ID value is that of the remote node. Each network path description must contain two node ID values and two adapter ID values. To be useful for analysis, the path description must contain the node ID value for the node running the analysis. Without this node ID value, the path cannot be matched with any of the LAVc channels on that node.

Channel Mapping and Processing

The network path descriptions are matched with the LAVc channels by using the LAN addresses. If possible, only the LAN hardware address is used for the mapping function. This mapping provides the best analysis because it remains constant with respect to any LAN adapter. In clusters running mixed VMS versions, the LAN hardware address is not available for systems running a version prior to VMS version 5.4-3. In prior versions, the DECnet LAN address is used for the mapping function.

Each time a LAVc channel is opened, the network path database is searched to locate a matching network path description. If found, this description is connected to the channel and a scan of all the components in the path is performed. For each component in the path, the working count is incremented. If the component switches from not working to working, then a WORKING message is displayed.

When a LAVc channel fails, the corresponding network path is placed on a failure list. The network path is then scanned and the working count for each component is decremented.

Failure Analysis

Related channel failures are collected by delaying 10 seconds following the channel failure. Each channel failure extends the time delay to the full 10 seconds. Once the 10-second delay has elapsed following the last channel failure, the full list of failing network paths is processed.

Computing the failure probabilities begins by reviewing each of the components in each failing network path. If a component cannot be proven to work, then it is placed on the suspect list and the component's suspect count is incremented. A component is working if the working count is non-zero; a CLOUD component is working if the working count equals the reference count. This step ends with a list of suspect components, each with a suspect count that represents the number of times this component could have caused the failure.

Suspects are selected by comparing the suspect counts for each of the components in a network path. Each network path is reviewed independently and a primary suspect is selected. The primary suspect is the first component with the highest suspect count in the network path. Secondary suspects are the other components in the network path with the same suspect count value. The primary and secondary suspects are displayed after all the network paths have been reviewed. The other components in the suspect list are removed from the list, and are not displayed because the failure analysis judged them to be unrelated to any of the channel failures.

There are several limitations to the failure analysis. The failure analysis requires an accurate description of the physical network. The failure analysis also looks for a common network component failure. Therefore, an incorrect analysis can result from an inaccurate network description, multiple related failures, or too much detail.

The key to a valid network failure analysis is the correct description of the physical network. In Figure 2, if the network path A1-B1 incorrectly listed DELNI B, then the failure analysis would find that DELNI B is working and remove it from the suspect list. The final analysis would list both C1 and D1 as the failing components. Validation of the network description can be performed by network fault insertion and by reviewing the network failure analysis. If the description is accurate, then the failure analysis should display the expected messages. If an inaccurate network description exists, unexpected messages may be displayed. In such cases, the network description should be reviewed.

Multiple related failures may also cause an incorrect failure analysis. Referring again to Figure 2, assume a correct network description. Instead of a DELNI B failure, assume that both C1 and D1 have failed. The failure analysis reviews the network description and locates the single component DELNI B because it is common to all of the failures. In this case, the failure analysis does correctly locate the area of the network (something connected to DELNI B).

reduce the components to a single failure. Instead, a primary suspect and several secondary suspects are usually displayed. Too much detail also requires more CPU cycles and memory for analysis, and in general is a bad trade-off.

In Figure 2, if the Ethernet adapter C1 fails, and the transceiver cables are listed in the network description, then the failure analysis displays two messages. The primary suspect is listed as the transceiver cable because it is the first component that matches the failure in the path from A to C. The Ethernet adapter C1 is listed as a secondary suspect, because its suspect count matches the suspect count of the primary suspect. In this example, there are no network paths described that use Ethernet
adapter C1 without using However, further review the transceiver cable is required to identify connected between C1 and that DELNI B itself has not DELNI B. With the network failed, but rather both C1 description provided, and D1. there is no way to The choice of the network distinguish between these description, the number of two components. Therefore, components defined, and the both are displayed when path descriptions, is left either is a primary or to the system or network secondary suspect. manager. This choice allows Benefits the manager to select the The LAVc network failure level of failure reporting analysis, combined with needed to troubleshoot an accurate description the network. However, of the physical network, when the physical network enables the system or description includes network manager to maintain too much detail (e.g., the increased availability transceiver cables), it gained with the use of becomes difficult for multiple LAN adapters. the failure analysis to 16 Digital Technical Journal Vol. 3 No. 3 Summer 1991 New Availability Features of Local Area VAXcluster Systems Timely analysis and o Detect problems earlier reporting of network and report them more component failures accurately, with network significantly reduces data that helps isolate troubleshooting times the failing network and increases the overall components cluster availability. In addition to meeting these goals, the features in VMS version 5.4-3 increase the cluster Summary communication bandwidth. VMS version 5.4-3 increases Acknowledgements the availability of I want to thank Kathy Perko Local Area VAXcluster and Steve Mayhew for their configurations by providing help with the design of the the following features: multiple-adapter version of o Faster detection of PEDRIVER. Kathy reviewed channel failures the code during the implementation and provided o Support for the use of valuable input for both the multiple adapters code and this paper. 
Thanks o Support for the use of to Scott H. Davis, Sandy additional network paths Snaman, and Dave Thiel o Detection of network for their contributions to congestion the new PEDRIVER design. Thanks also to the LAN o Analysis of network Group (Linda Duffell, failures Dave Gagne, Rod Gamache, The goals of these features Bill Salkewicz, and Dick are to Stockdale) for the VAX communication interface o Provide higher cluster to the LAN drivers, which availability simplified the design of o Work around network the new PEDRIVER. I also congestion and network wish to acknowledge the LAN component failures while Group for their help during keeping the cluster the debug phase of this running implementation. Digital Technical Journal Vol. 3 No. 3 Summer 1991 17 ============================================================================= Copyright 1991 Digital Equipment Corporation. Forwarding and copying of this article is permitted for personal and educational purposes without fee provided that Digital Equipment Corporation's copyright is retained with the article and that the content is not modified. This article is not to be distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. All rights reserved. =============================================================================
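The reference/working-count bookkeeping and the suspect selection described under Channel Mapping and Processing and Failure Analysis can be illustrated with a small Python model. This is a hypothetical sketch, not PEDRIVER code: the names Component, NetworkModel, channel_opened, channel_failed, analyze, and build_example are invented for illustration; each channel is keyed by a name such as "A1-C1" standing in for its pair of LAN addresses; and the topology in build_example (including the cable XCVR_C1) is an assumed arrangement loosely patterned on the DELNI examples in the text, not a copy of Figure 2.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Component:
    """One entry in a hypothetical model of the network description."""
    name: str
    kind: str = "COMPONENT"   # NODE, ADAPTER, COMPONENT, or CLOUD
    reference_count: int = 0  # defined paths that use this component
    working_count: int = 0    # open channels whose path uses this component

    def is_working(self) -> bool:
        if self.kind == "CLOUD":
            # A CLOUD counts as working only while every path through it works.
            return self.working_count == self.reference_count
        # Any other component is proven to work by one open channel.
        return self.working_count > 0


class NetworkModel:
    def __init__(self) -> None:
        self.components: dict[str, Component] = {}
        self.paths: dict[str, list[Component]] = {}  # keyed by channel name
        self.failure_list: list[list[Component]] = []
        self.messages: list[str] = []

    def define(self, name: str, kind: str = "COMPONENT") -> None:
        self.components[name] = Component(name, kind)

    def define_path(self, channel: str, names: list[str]) -> None:
        # Directed list: local node, local adapter, ..., remote adapter, remote node.
        path = [self.components[n] for n in names]
        for comp in path:
            comp.reference_count += 1
        self.paths[channel] = path

    def channel_opened(self, channel: str) -> None:
        for comp in self.paths.get(channel, []):
            was_working = comp.is_working()
            comp.working_count += 1
            if not was_working and comp.is_working():
                self.messages.append(f"WORKING {comp.name}")

    def channel_failed(self, channel: str) -> None:
        path = self.paths.get(channel)
        if path is None:
            return  # no matching path description: nothing to analyze
        self.failure_list.append(path)
        for comp in path:
            comp.working_count -= 1

    def analyze(self) -> tuple[set[str], set[str]]:
        """Process the failure list and return (primary, secondary) suspects."""
        # Each component that cannot be proven to work gets one suspect
        # count per failing path that it appears in.
        counts: Counter[str] = Counter()
        for path in self.failure_list:
            for comp in path:
                if not comp.is_working():
                    counts[comp.name] += 1
        primary: set[str] = set()
        secondary: set[str] = set()
        for path in self.failure_list:  # each failing path reviewed independently
            suspects = [c.name for c in path if counts[c.name] > 0]
            if not suspects:
                continue
            best = max(counts[n] for n in suspects)
            ties = [n for n in suspects if counts[n] == best]
            primary.add(ties[0])        # first component in path order
            secondary.update(ties[1:])  # same count elsewhere in the path
        self.failure_list.clear()
        return primary, secondary


def build_example(with_transceiver: bool = False) -> NetworkModel:
    """Hypothetical topology with every channel open; not the paper's Figure 2."""
    m = NetworkModel()
    for node in "ABCD":
        m.define(node, "NODE")
    for adapter in ["A1", "A2", "B1", "C1", "C2", "D1", "D2"]:
        m.define(adapter, "ADAPTER")
    for part in ["DELNI_A", "DELNI_B", "DELNI_C", "XCVR_C1"]:
        m.define(part)
    to_c1 = ["DELNI_B", "XCVR_C1", "C1"] if with_transceiver else ["DELNI_B", "C1"]
    m.define_path("A1-B1", ["A", "A1", "DELNI_A", "B1", "B"])
    m.define_path("A1-C1", ["A", "A1", "DELNI_A", *to_c1, "C"])
    m.define_path("A2-C1", ["A", "A2", "DELNI_C", *to_c1, "C"])
    m.define_path("A1-D1", ["A", "A1", "DELNI_A", "DELNI_B", "D1", "D"])
    m.define_path("A2-D1", ["A", "A2", "DELNI_C", "DELNI_B", "D1", "D"])
    m.define_path("A2-C2", ["A", "A2", "DELNI_C", "C2", "C"])
    m.define_path("A2-D2", ["A", "A2", "DELNI_C", "D2", "D"])
    for channel in m.paths:
        m.channel_opened(channel)
    return m
```

In this model, failing channels A1-C1, A2-C1, A1-D1, and A2-D1 and then calling analyze() yields DELNI_B as the only primary suspect with no secondary suspects, since it is the one component common to all four failing paths. In the variant built with with_transceiver=True, failing only the two C1 channels yields the hypothetical cable XCVR_C1 as the primary suspect and C1 as a secondary suspect, because no described path uses C1 without the cable.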
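The 10-second collection window described under Failure Analysis behaves like a timer that each new channel failure restarts. The following hypothetical helper, collect_failures, replays timestamped failures and reports when each batch would be handed to the analysis. The function name, the (time, channel) input format, and the choice that a failure arriving exactly at the deadline starts a new batch are all assumptions of this sketch.

```python
from __future__ import annotations


def collect_failures(
    failures: list[tuple[float, str]], delay: float = 10.0
) -> list[tuple[float, list[str]]]:
    """Group (time, channel) failures as a restartable delay timer would.

    Returns (processing_time, channels) batches, where processing_time is
    `delay` seconds after the last failure in the batch.
    """
    batches: list[tuple[float, list[str]]] = []
    current: list[str] = []
    deadline: float | None = None
    for t, channel in sorted(failures):
        if deadline is not None and t >= deadline:
            # The timer expired before this failure: process the batch so far.
            batches.append((deadline, current))
            current = []
        current.append(channel)
        deadline = t + delay  # each failure extends the delay to the full period
    if current:
        batches.append((deadline, current))
    return batches
```

For example, failures at 0, 3, and 12.5 seconds form a single batch that is processed at 22.5 seconds, while a failure at 30 seconds starts a new batch that is processed at 40 seconds.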