Performance Analysis of a High-speed FDDI Adapter

By Ramsesh S. Kalkunte

Digital Technical Journal Vol. 3 No. 3 Summer 1991

Abstract

The DEC FDDIcontroller 400 host-to-FDDI network adapter implements real-time processing functionality in hardware, unlike conventional microprocessor-based designs. To develop this high-performance product with the available technological resources and at minimal cost, we optimized the adapter design by creating a simulation model. This model, apart from predicting performance, enabled engineers to analyze the functional correctness and the performance impact of potential designs. As a result, our implementation delivers close to ultimate performance for an FDDI adapter and surpasses the initial project expectations.

As high-performance systems become available and the use of distributed computing proliferates, the need for high-performance networks increases. Faster interconnects are required to achieve such performance goals. Consequently, network adapters must be able to function at higher speeds. In adopting fiber distributed data interface (FDDI) local area network (LAN) technology as a follow-on to Ethernet, Digital recognized the need to build an industry-leading network adapter to service its high-performance platforms. As a result, we designed and developed the DEC FDDIcontroller 400 product. To track the adapter performance through the design and development stages, we created a simulation model; our objective was to ensure that the device met our performance goals. This paper begins with a description of the DEC FDDIcontroller 400, followed by a brief historical perspective and statement of the performance objectives of the adapter project. We then discuss in detail the modeling methodology and the results achieved. In addition, we present validation of these results in the form of measurements taken on prototype hardware.

The DEC FDDIcontroller 400

The DEC FDDIcontroller 400, also known as the DEMFA, is a high-speed FDDI network adapter. Attached to a host machine running under either the VMS or the ULTRIX operating system, the DEMFA enables the host to communicate with other network entities through the FDDI ring. The DEMFA adapter implements Digital's proprietary XMI bus protocol and can be used with any system that has an XMI backplane.[1] Laboratory measured performance data presented later in the paper shows that the adapter hardware can sustain a practically infinite stream of frames at the full FDDI data bandwidth of 100 megabits per second (Mb/s) for frame sizes 69 bytes or larger on the receive stream and 51 bytes or larger on the transmit stream. Even the smallest, i.e., 20-byte dataless, FDDI frames can be received at 36 Mb/s and transmitted at 47 Mb/s.

The DEMFA is an FDDI Class-B single attachment station (SAS) that interfaces to the FDDI token ring network through the DECconcentrator 500. A port driver resident in the host controls the DEMFA port. The port, the port driver, and the adapter hardware implement the American National Standards Institute (ANSI) data link and physical layer functionality for FDDI LANs. This foundation supports user protocols such as the Open Systems Interconnection (OSI), DECnet, the transmission control protocol with the internet protocol (TCP/IP), and local area transport (LAT).[2] Figure 1 shows a typical network configuration using the DEC FDDIcontroller 400 adapter with other Digital FDDI products.

The XMI bus is capable of transferring data at rates up to 800 Mb/s and can serve as either a CPU-to-memory interconnect, e.g., in the VAX 6000 platform, or an I/O bus, e.g., in the VAX 9000 platform.[3,4] Also, Digital plans to include the XMI bus in future systems.

FDDI is a timed-token, fiber-optic ring that provides a network data bandwidth of 100 Mb/s.[5] In addition to this high data rate, the advantages of low signal attenuation, low noise susceptibility, high security, and low cost (as the technology matures) will make FDDI a popular interconnect of the 1990s.[6]

Historical Perspective and Performance Objectives of the DEMFA

With the advent of high-performance systems and distributed computing strategies, the need for high-performance networking options has increased. Traditionally, I/O adapters have been built to serve the current performance needs. As a consequence, such adapters offer little or no network performance scalability to accommodate future increases in demand. Scalability is important to ensure that the adapter does not become a bottleneck when such demands exist. Nonscalable adapters become obsolete, and the resulting frequent hardware upgrades increase system cost.

The first Ethernet adapters, which complied with the IEEE 802.3 standard, were built in the early 1980s. Only recently do adapters exist that can process frames at the maximum Ethernet throughput rate of 10 Mb/s.[7] As mentioned earlier, FDDI has the capability of supporting speeds an order of magnitude higher than Ethernet. Since the header in an FDDI frame is three times smaller than that for Ethernet, FDDI frame arrival rates can be as much as 30 times the Ethernet arrival rate. Considering the various constraints, Digital set out with the goal to build an FDDI adapter that could process frames 150 bytes and larger at 100 Mb/s, i.e., the adapter would be able to process approximately 80,000 frames per second (frames/s). Also, twenty microseconds was deemed an acceptable adapter latency for the smallest FDDI frames. Considering the relatively small number of frames a host system can process today, these adapter criteria represented an ambitious goal, one which would make a product with high-performance scalability as faster CPUs became available.

Performance Modeling Considerations

During the development of a high-performance product, changes in architectural functionality, technology constraints, and cost considerations can result in design modifications. It is desirable to track the performance of the product through its development to understand the impact of such modifications.

The DEMFA consists of many hardware entities that perform the desired adapter functions.[8] Although such hardware adapters have the obvious advantage of superior performance over conventional, i.e., microprocessor-based adapter cards, this advantage does not come without the risks associated with hardwired logic. Such risks have a negative impact on project budget and schedule and necessitate a risk management strategy to ensure that product goals are successfully met. Performance modeling of the adapter and extending the use of such modeling to evaluate various designs formed part of this strategy. The following subsections describe the goals and tasks of the DEMFA performance modeling.

Goals

The set of performance modeling goals for the DEMFA evolved throughout the development process. Three major goals were performance projection, buffer sufficiency analysis, and design testing through simulation.

Performance Projection. In the early phases of the design, the primary goal of the model was to project the adapter performance. This prediction gave us confidence that the design could meet our performance expectations.

Buffer Sufficiency Analysis. Buffer capacity plays an important part in the performance of a design. Whereas too much of this resource is wasteful, too little has a negative effect on performance. It was critical to determine the extent of buffering necessary to attain the desired target performance at the least cost. The performance model considered the dependencies on this resource. The amount of buffering was varied and the effects of such variation, manifested in the simulation results, were analyzed. Using these results as input to a cost/benefits equation helped the designers make intelligent decisions concerning buffer capacity.

Design Testing through Simulation. As development progressed, important design issues arose that could not be solved by simple analysis. The performance model served as a platform that could be enhanced to solve these more complex problems by simulation. Designs were analyzed to determine their impact on adapter performance. Because the simulation methodology afforded greater testability, we were able to make the designs more robust and to answer design questions in a significantly shorter time than other methods. Consequently, modifications to the hardware were made at an early design stage and at negligible cost.

Tasks

To accomplish performance modeling, we faced the following basic tasks: choosing the metrics, defining the workload, and deciding on a modeling methodology. Relevant metrics to measure the performance of a product are crucial. We chose metrics that are simple to understand and provide insight into the behavior of the product. Also, areas in which workload development is required must be identified and investigated in detail. An incorrect workload invalidates all performance data. And the methodology used to model the system must be well thought-out beforehand, so that the model is accurate and also flexible enough to be easily changed.

Definition of Metrics.
The main performance metrics used were throughput and frame latency. Throughput is the rate at which frames are processed and is measured in megabits per second or frames per second. The units can be converted easily from one to the other, if the average frame size is specified. In this paper, throughput is expressed in megabits per second.

Frame latency is the elapsed time measured in microseconds between the time at which a frame is queued for service at a facility and the time at which the service is completed. The following descriptions illustrate the approach used to measure receive and transmit latency. The host receives frames from and transmits frames to the FDDI ring. Receive frame latency is the time elapsed between (1) the arrival of the last bit of the frame into the adapter from the FDDI ring and (2) the time the frame becomes available to the host for processing. Transmit frame latency is the elapsed time between (1) the time the adapter starts processing a frame from the host and (2) the exit time of the first bit of the frame from the adapter destined for the FDDI ring.

The adapter can process transmit and receive frames simultaneously. We defined performance metrics to analyze a variety of traffic scenarios to gain insight into the adapter behavior. For the context of this paper, we consider the DEMFA processing pure frame streams only, i.e., the expressions "receive throughput" and "receive latency" refer to a pure receive stream of frames containing no transmit frames. Similarly, "transmit throughput" and "transmit latency" refer to a pure transmit stream of frames.

Workload Definition. Using a relevant traffic workload is very important in any simulation model. Since most systems are workload-sensitive, defining an incorrect workload may result in irrelevant data. We identified two areas in which we needed to define workloads. We then characterized the traffic patterns and built a workload model for performance simulation based on these patterns.

o Frame receive and transmit workloads. The receive and transmit workloads are stimuli for the performance simulation. These workloads mimic traffic due to frame arrival on the FDDI ring (i.e., the receive workload) or frame transmission from the host (i.e., the transmit workload). The receive workload model generates frames which the DEMFA model receives, whereas the transmit workload acts as a source of frames to be transmitted by the DEMFA model on the FDDI ring. These workloads must be characteristic of actual FDDI traffic. Since FDDI LANs did not exist when the DEMFA was in the development stage, we used our experiences with Ethernet to derive these workloads, as we explain in greater detail in the FDDI Token Ring section.

o XMI traffic workload. Apart from the DEMFA traffic, there may be other traffic on the XMI bus due to CPU-to-memory transactions or from other I/O adapters attached to the system. The load on the XMI bus impacts the performance of the DEMFA. Consequently, we designed a workload model to mimic the traffic pattern on the bus. We based our model on the traffic patterns observed for real XMI bus traffic. The performance of DEMFA may degrade as this traffic increases because the DEMFA traffic and the non-DEMFA traffic consume common resources. The other traffic is referred to as the XMI interference workload. The XMI Workload Generator section describes the model for this workload.

Modeling Methodology. The simulation model has a hierarchical design to allow the construction of smaller, more manageable blocks, i.e., submodels. The structure also allows changes to be made easily. The SIMULA language implements the simulation model.[9] The simulation-class and queuing constructs in this language are tailored to help simulation and modeling.[10,11] The object-oriented structures present other advantages to model development. A debug procedure coded into the model prints status information about all the queues in the model. This information helped us trace the path of frames through the system.

One important first step in designing a simulation model is to determine the detail at which to model. Two factors that influence the level of detail are the following:

o Existing knowledge of the design. Usually, information gathered from the behavioral and analytical models of a design helps to make a performance model abstraction. Designs with behavior that cannot be analyzed by these lower-level models have to be modeled in greater detail.

o Expectation of performance model accuracy. Typically, a performance model predicts results accurate to within ±10.0 percent of the performance that would be achieved with the actual hardware.

During the design phase, behavioral and structural models of hardware were in development. This hardware was partitioned across important functional boundaries. Hardware within these boundaries would be modeled and tested thoroughly by the respective development engineers. Hence, to include details of these pieces of hardware in our model would have resulted in redundant effort. Since the interfaces and the gross functionality of the hardware within these boundaries are relevant to performance, we did include these components in our model. Existing hardware components, such as the FDDI chip set, were grouped together before being modeled for functionality. Each submodel was designed and tested separately to ensure conformity to the functionality and performance of other behavioral and structural models. This strategy resulted in the base-level performance model that we used to generate preliminary performance data for the DEMFA.

As development progressed, we encountered design changes of various complexities. Simple design changes resulted in very small changes in the performance model. But larger and more complex design changes required that we investigate behavior both specific to the piece of hardware of which the design is a part and generalized to the adapter system environment in which the piece operates. Models that represent the changes were included and interfaced as submodels. These submodels served the dual purposes of testing the new design and of improving the accuracy of the performance model.

Design of the Simulation Model

The performance simulation model consisted of the following major components:

o FDDI ring
o FDDI chip set and parser
o Packet memory controller
o Host interface
o XMI system
o Host system

The base-level model evolved over time, as we gained insight into the behavior of the individual components and defined workloads. The model evolved further to support the need to analyze new designs through simulation. This section briefly describes the components of the final model, as listed above.

FDDI Token Ring

The FDDI token ring was modeled to act as a source of received frames and as a sink of transmit frames. Gross functionality for the remainder of the FDDI nodes and network components was desirable. Consequently, we designed a black-box model for the FDDI ring that provides two-way interaction with the FDDI chip set and parser model. This FDDI model allocates time on the FDDI ring for transmit and receive transactions. The model also controls a receive workload generator when frames are received by the adapter.

The receive workload generator is an analytical model used to create different patterns of receive traffic to the DEMFA. The parameters input to this workload model are the average frame size, the frame-size distribution, the frame type, the load, and the number of back-to-back frame arrivals (i.e., the burst rate or "burstiness" of the frame arrivals). We varied these parameters to generate desired workloads.

The average frame size and frame-size distribution parameters generate different size frames. Actual frame sizes can be specified as normally or exponentially distributed about the mean or as constant. The workload model can generate station management (SMT), LLC SNAP/SAP, or LLC non-SNAP/SAP frame types and can create a load between 0 and 100 Mb/s. If workloads are less than the peak FDDI bandwidth, i.e., 100 Mb/s, the frame arrival pattern can be specified as an exponential, constant, or normal distribution.

The model can generate a wide range of synthetic traffic patterns, but to obtain credible performance results, we characterized the traffic as seen in realistic networks. Several studies had been conducted on large Ethernet LANs within Digital; a case study by D. Chiu and R. Sudama is one example.[12] We analyzed the results from these studies to understand the frame-size distribution in such networks. From the analysis we concluded that

o Frame sizes on the networks are related to user protocols. Frames in a test sample were distributed about a few discrete frame sizes (i.e., modes of the distribution) rather than over a wide range of frame sizes.

o The probability function of the frame sizes near each mode can be approximated as a normal distribution centered about the mode.

A composition analysis of the measurements provided different modal mean sizes, standard deviations, and the probabilities of frames belonging to the different modes. We used these values to statistically create Ethernet network traffic.

For our performance measurements, it was necessary for us to change this traffic pattern appropriately to reflect the differences that exist between FDDI LANs and Ethernet LANs. The FDDI frame header is smaller than the Ethernet header, and the largest FDDI frame is approximately three times the size of the largest Ethernet frame. We factored these changes into the Ethernet model to produce an FDDI workload model. The FDDI workload has either four or five modes.

The four-mode distribution contained a majority of frames grouped around 60, 576, 1518, and 4496 bytes. The standard deviations of the frames around these mean values were 22, 5, 2, and 2 bytes, respectively. The frame volumes at these modal values represented contributions of 29 percent, 67 percent, 3 percent, and 1 percent, respectively, to the total load.

The five-mode frame sizes were grouped around 33, 80, 576, 1518, and 4496 bytes. The standard deviations of the frames around these means were 1, 20, 5, 2, and 2 bytes, respectively. The frame volumes at these modes contributed 26 percent, 15 percent, 55 percent, 3 percent, and 1 percent, respectively, to the total load.

In the above FDDI workload model, the mode of 1518 bytes is determined by the Ethernet network's maximum frame-size capacity and, similarly, the mode of 4496 bytes is determined by the FDDI network's maximum frame-size capacity. These two modal frame sizes represent traffic generated by large data transfer operations, e.g., file transfers. Contributions due to these two modes vary from network to network. We considered different contributions and found their effect on adapter throughput to be negligible. Therefore, only one case for each workload is presented in this paper.

FDDI Chip Set and Parser

The FDDI chip set, also referred to as the FDDI corner, is the base-level technology that was part of Digital's strategy to build high-performance, low-cost data links for FDDI LANs. This chip set performs serial-to-parallel data conversion, acts as an interface to the packet memory in the data link layer, and can support a data rate of 100 Mb/s.[13] The entire chip set, except for the ring memory controller (RMC), was modeled as a black box with a specified per-frame latency. The RMC and the associated first in, first out (FIFO) buffers for the receive and transmit stream staging were modeled in greater detail. The detail was necessary to capture any overflow or underflow conditions that might occur in the FIFO buffers. We also modeled the interaction between the transmit and receive streams. The RMC model, which served as the front end of the chip set model, was also capable of generating control and data transactions to perform read/write memory operations.

The parser hardware off-loads some host frame processing to the adapter. The parser reads information about a receive frame from the RMC bus and creates a forwarding vector, which is appended to the frame. This forwarding vector is used by different entities in the adapter and the host to efficiently process a frame. The parser latency to generate this vector varies with the frame type and size. The parser model helped to analyze the impact of this latency on performance. This model mimics the hardware to produce a forwarding vector for a given frame with a pertinent latency.

Packet Memory Controller

The packet memory controller (PMC) is the heart of the adapter system. The ring entry mover stage, the packet buffer memory, and the packet memory interface constitute the functionality in the PMC.[8] The PMC controls the arbitration and servicing of requests to and from memory to effect the efficient transfer of information. The PMC also controls the movement of pointers corresponding to every frame. These pointers and the associated protocol generate work for the RMC, the host interface, or the adapter manager.

The high throughput capability of FDDI rings can result in traffic patterns that cause a strain on the packet memory. The PMC model allowed us to study such scenarios. It is also important to analyze the working and performance of the ring entry mover, which moves frames between different interfaces by manipulating the control information of a stored frame. The control information and frame data reside in the packet memory.

Host Interface

The host interface, also called the host protocol decoder, moves data between the adapter and the host system through an XMI bus and also interfaces with the PMC. We modeled the interface to include details of the dual direct memory access (DMA) design (one channel for the receive stream and one for the transmit stream), the staging buffers associated with each DMA channel, the XMI interface, and the PMC interface. The host interface also has the capability of scheduling write operations while waiting for the delivery of read information. Priority schemes to complete such transactions, i.e., handshake mechanisms, are important from a performance perspective and, hence, were included in the model.

XMI System

The XMI system interacts with the host system and was modeled to include details of the XMI bus and memory. This model consists of an XMI bus submodel that interfaces to the XMI end of the host interface model of the adapter. The submodel also interacts with a memory model and an XMI workload generator model. The bus submodel implements the XMI protocol.

Memory Model. The memory model was designed to generate responses to transactions that request memory. Latency for these requests is the memory access time, which includes a queue wait time. There are basically two types of systems that support the DEMFA, as shown in Figure 2. The type is determined by whether the XMI is used as the CPU bus, denoted in this paper as the XMI (CPU) bus configuration, or as the I/O bus, denoted as the XMI (I/O) bus configuration. The only difference between the two systems is memory access time. This time is greater if XMI is used as the I/O bus; there is an added latency on the read transactions performed to fetch memory from locations that are not local to the XMI bus. The memory space that is local to the CPU bus is accessed through another I/O adapter mechanism. Such I/O adapters, CPU buses, and main memory bandwidth all play a role in determining the access times in such systems. The model presented in this paper depicts the VAX 9000 I/O architecture and current implementation. Performance may vary with other implementations.

XMI Workload Generator. We designed the XMI workload generator to represent the load on the XMI bus, excluding traffic from the DEMFA. This load tends to have a deteriorating effect on DEMFA performance and thus is referred to as the XMI interference workload. It was important not only to model the amount of load but also to capture the arrival pattern of this traffic. The workload model generated traffic based on three inputs: the total XMI bandwidth used by other XMI nodes, the average length of each XMI transaction, and the burst rate of the frame arrivals. Transaction lengths on XMI vary from one to five XMI cycles (i.e., 64-nanosecond cycles). The maximum number of nodes that can exist on an XMI bus is 14. Thus, the burst rate can vary from 1 to 13.

Typically, traffic on an XMI bus consists of many back-to-back transactions of various sizes. We decided to use the worst case values for both the burst rate and the transaction length in the XMI interference workload presented in this paper. The worst case burst rate is 13, and the worst case transaction length is 5 XMI cycles.

Host System

The host system consists of the CPU, disks, layered software, the operating system, the device driver, and a host workload generator. The host system was modeled in accordance with assumptions presented in the section Results from Performance Simulation. The CPU, disks, host software, and the operating system were modeled in such a way that they do not become bottlenecks during frame reception or transmission. A model of the device driver handles frame transmission and reception. The driver interacts with a host workload generator, which creates different traffic patterns for transmission. This workload generator has the same capabilities as the receive workload generator discussed in an earlier section.

Results from Performance Simulation

The data presented in this section was generated using the simulation model of the adapter. This data represents the hardware performance of the DEMFA; system performance with the DEMFA as a component is not within the scope of this paper. We input parameters to the simulation model that defined traffic patterns and ran simulations for a sufficient length of time to ensure that we captured steady-state behavior. The models maintain statistics of relevant events and quantities and print out this information at the end of a simulation. As discussed previously, the hardware performance of the DEMFA varies depending upon whether the system is implemented to use the XMI bus as a CPU bus or as an I/O bus. This section presents simulation results for both uses, where appropriate.

Assumptions

For our simulation purposes, we made several assumptions. These assumptions make the results more general and bring out the hardware performance characteristics of the DEMFA, indicating the upper bounds of performance that the adapter can achieve.

CPU and Software Capabilities. The device driver and the host software do not become bottlenecks during frame reception and transmission. We assumed that the host CPU had enough computing ability to process frames without posing as a performance bottleneck.

Memory Bandwidth. Frames sent from or received by the host result in XMI bus transactions that are written to or read from the host memory. Throughput varies with the memory implementation and interleaving. We assumed that the memory implementation and interleaving were selected such that no overloading of the memory occurs, thus eliminating wasted bus cycles.

Buffer Alignment and Segmentation. We assumed that data for transmission and buffers for reception were hexaword (i.e., 32-byte) aligned and that frames were unsegmented.

Simulation Traffic. No error frames or error transactions were simulated, since we assumed these to be negligible. No adapter manager traffic was simulated during the performance measurements, since these represent a very negligible fraction of the frames received during steady-state ring operation.

Throughput Measurements

Measurements were made to determine the throughput that the adapter can sustain for received and transmitted frames. It is important to understand how throughput is related to the load, the burstiness of frame arrivals, the percent XMI interference, and the frame size. This section presents the results of the throughput measurements as functions of these parameters.

Received Throughput as a Function of the Load. The graph shown in Figure 3 is the result of several experiments conducted by varying the load for 33-byte received frames. The frame arrival rates depend on the load and the arrival rate distribution. As mentioned earlier, the model is capable of simulating traffic with different arrival patterns. Figure 3 shows that, with an exponential arrival pattern, the throughput increases at a rate proportional to the load up to a certain point, and then gradually decreases until the load is 100 Mb/s. The decrease in throughput is caused by the loss of resources due to excessive loading.

We simulated traffic with a constant arrival pattern and conducted the same experiments. These results are also shown in Figure 3. Observe that the point of maximum throughput and the rate at which the throughput decreases after reaching the maximum vary with the arrival pattern of traffic. After performing experiments on other frame sizes, we concluded that there is no fixed relationship between the maximum achievable throughput and the throughput at FDDI saturation (i.e., 100-Mb/s load). Also, there is graceful degradation in throughput after the peak.

Receive Throughput for Four- and Five-mode Workloads. We measured adapter receive throughput for four- and five-mode workloads with a load of 100 Mb/s. The XMI interference workload was varied, and the results are presented in Figure 4. The adapter can receive the workload at 100 Mb/s, if the XMI interference workload remains moderate. Figure 4 also shows that there is very little difference in performance between the four- and five-mode workloads. Large frames constitute a major part of both workloads, and larger frames can be easily supported by DEMFA at full FDDI data bandwidth.

Receive Throughput as a Function of Frame Size. Figure 5 shows the throughput as a function of the frame size and the XMI interference workload, with DEMFA attached to an XMI (CPU) bus. Smaller frames have a lower throughput rate than larger ones because of high control/data overhead. Since control transactions consume bandwidth, the bandwidth available for data movement is reduced. Consequently, the overall throughput rate is lower. Another reason for lower adapter throughput is the XMI utilization by traffic from other nodes on the XMI bus. This XMI interference results in less available XMI bandwidth for the adapter and, hence, less throughput.

The adapter throughput for an XMI (I/O) bus configuration differs only slightly from that for an XMI (CPU) bus configuration. Any differences that exist are for frames smaller than 64 bytes, since the adapter experiences a per-frame latency cost because the memory is not local to the XMI bus.

Transmit Throughput for Four- and Five-mode Workloads. Figure 6 illustrates the transmit throughput for a four-mode workload as a function of the XMI interference. We performed simulations to obtain throughput data for the DEMFA when attached to an XMI (CPU) bus or to an XMI (I/O) bus. Throughput for the XMI (CPU) bus configuration is 100 Mb/s and is insensitive to low XMI interference loads, whereas XMI (I/O) bus configuration measurements are negatively affected by all levels of XMI interference traffic. The higher read latency that is inherent to an XMI (I/O) bus configuration degrades further with increasing interference traffic. In addition, the degradation appears to be linear. The throughputs observed for the five-mode workloads are very similar to the data shown in Figure 6.

Transmit Throughput as a Function of the Frame Size. Figure 7 shows the throughput as a function of the frame size when the DEMFA is attached to an XMI (CPU) bus. Throughput is also presented for various XMI interference workloads. As in the case of receive throughput, transmit throughput degrades as the frame size decreases and the XMI interference load increases.
This degradation is again attributed to high control/data overhead and lower XMI bandwidth availability.

Latency Measurements

Latency, as it relates to the DEMFA, is explained in the Definition of Metrics section. We measured the latency for receive and transmit frames. Frame latency consists of two components: the active component, which contributes to the time when the frame or a portion thereof is being processed at a service center (also called the service time); and the passive component, which is the time when the frame or a portion thereof waits for access to the service center. All latency data presented in this section represents averages across a large number of samples. When measuring the latency of a frame, we applied the maximum load that can be sustained continuously for that frame size and type.

Receive Latency as a Function of the Frame Size. Figure 9 presents the receive latency data as a function of the frame size for an XMI (CPU) bus configuration. Latency is also presented for various XMI interference levels. We present performance data for only one XMI configuration because there is little variation between the results for the XMI (CPU) bus and XMI (I/O) bus configurations. Both frame size and latency are plotted using logarithmic scales. The data illustrates that XMI latency increases linearly with increased XMI interference.

Transmit Latency as a Function of the Frame Size. Figure 10 presents transmit latency results for an XMI (CPU) bus configuration, and Figure 11 presents the results for an XMI (I/O) bus configuration. The latency was measured as a function of the frame size for various XMI interference workloads. Transmit latency is more sensitive to the system type and to the XMI interference workload because most XMI transactions that constitute transmit traffic are read operations. There is a distinctly higher latency cost associated with these transactions in the XMI (I/O) bus configuration as compared to the XMI (CPU) bus configuration. As in the case of receive latency, the transmit latency degrades with XMI interference.

Performance Measurements with the Prototype DEMFA

The intent of performing measurements with the prototype DEMFA was twofold. First, we wanted to confirm the performance predictions arrived at through simulation. Second, we wanted to measure some features that we did not implement in the model, either because they were not quantifiable or because they were too complex to model. Again, we present only hardware performance measurements; system performance with the DEMFA is beyond the scope of this paper.

Measurement Setups

The experimental configuration required to perform the measurements on the prototype DEMFA is shown in Figure 12. This configuration consists of a VAX 6000 processor connected to a DECconcentrator 500. The VAX 6000 system has an XMI backplane. The DEMFA occupies one of the slots in the XMI backplane and is part of the XMI (CPU) bus configuration in this system.

An FDDI tester is also attached to the DECconcentrator 500 and acts as a source of frames. The FDDI tester is a useful tool for testing the DEMFA product; the tester is capable of transmitting traffic at 100 Mb/s and can generate frames of various sizes and types with different destination addresses. A standalone software driver and operating system run on the VAX 6000 system and are used for DEMFA hardware performance tests. A logic analyzer is used to measure elapsed time and count events.

Throughput Measurements

The device driver measures receive and transmit throughput and is designed to perform minimal processing for each frame.

Receive Throughput Measurements. We measured the receive throughput by sending a continuous stream of frames at 100 Mb/s from the FDDI tester to the DEMFA. We varied the frame size for the tests and ran each test for a length of time sufficient to verify data convergence.

We compared the prototype measurements with the modeled results for receive throughput as a function of the frame size for an XMI (CPU) bus configuration. This validation of the receive throughput results is shown in Figure 13. The hardware measurements demonstrate that the adapter can receive frame sizes above 69 bytes at 100 Mb/s. Throughput degrades for smaller frame sizes. These measurements closely validate the modeled results. The throughput for the performance model demonstrates that the DEMFA can continuously receive frames greater than 65 bytes at 100 Mb/s. There is a slight difference between the measured and modeled results at the lower frame sizes because residual XMI interference traffic exists in the measured system. This experimental error is unavoidable, but the difference is a small percentage of the total throughput and is therefore acceptable.

Transmit Throughput Measurements. To measure the transmit throughput, we forwarded frames from the driver to the FDDI ring at the maximum possible rate. The throughput was calculated from the number of frames that could be sent in a unit of time. The adapter can transmit frames larger than 51 bytes at 100 Mb/s. Transmit throughputs measured in the laboratory validate the modeled results as closely as the receive throughput validation results shown in Figure 13. The modeled throughput results were lower than the measured results because we used a conservative approach to modeling the memory latency.

Multisegmented and Misaligned Frames. Segmentation and alignment of transmit frame buffers in host memory are variable. Typically, frames consist of two segments, the first containing the frame header information and the second containing the data. Since the DEMFA must access control and data separately, segmentation makes this process less efficient, from a hardware perspective, than if the data and control information exist in the same buffer. Also, buffers may be aligned to start on different byte boundaries. Since the DEMFA transactions begin on hexaword (i.e., 32-byte) boundaries, hexaword alignment of frame data in the host buffers is the most efficient arrangement from the adapter's perspective. We measured throughput with unsegmented and two-segmented frames, and with frames aligned on longword, quadword, and hexaword byte boundaries. Segmentation and alignment variations cause negligible throughput degradation for frames 64 bytes or larger.

Latency Measurements

We used the logic analyzer to measure the frame latency. The logic analyzer responds to signals that indicate the starting and ending times for processing a frame. The difference between these two times is the frame latency. The events were chosen such that the measurements conformed to the definition of latency as described in the Definition of Metrics section.

Note that the traffic pattern used to measure latency in this section differs from the workload illustrated in the section Performance Results from Simulation. Here, a single frame was received or transmitted, and we measured latency due to that frame only. Previously, we used the simulation model to measure latency as an average across a large number of frames representing a load equal to the maximum sustainable adapter throughput. The workloads differ because of the practical difficulty of performing latency measurements on a large number of frames.
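The two calculations described for the prototype measurements, throughput from a frame count over a unit of time and latency as the difference between two logic-analyzer event times, can be illustrated with a minimal sketch. The function names, the use of integer nanosecond timestamps, and the choice to count only frame bytes (no preamble or other ring overhead) are assumptions made for this illustration; they are not taken from the measurement software described in this paper.

```python
def throughput_mbps(frame_count, frame_bytes, interval_s):
    """Throughput in Mb/s for frame_count frames of frame_bytes each,
    observed over interval_s seconds (assumption: frame bytes only)."""
    bits = frame_count * frame_bytes * 8
    return bits / interval_s / 1e6


def frame_latency_ns(start_ns, end_ns):
    """Latency of one frame: ending event time minus starting event time."""
    return end_ns - start_ns


def mean_latency_ns(samples):
    """Average latency over (start_ns, end_ns) pairs, as used when the
    simulation reports latency across a large number of frames."""
    latencies = [frame_latency_ns(start, end) for start, end in samples]
    return sum(latencies) / len(latencies)


# Hypothetical values, chosen only to exercise the functions.
print(throughput_mbps(10000, 1250, 1.0))
print(frame_latency_ns(100, 350))
print(mean_latency_ns([(0, 200), (100, 400)]))
```

A single-frame measurement of the kind made on the prototype corresponds to one call to frame_latency_ns, whereas the simulation results correspond to mean_latency_ns over many samples taken under load.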
Receive Latency. The receive frame latency predictions from the performance model and adapter service time measurements taken from the prototype hardware are shown in Figure 14. These latency measurements validate the model predictions in a way similar to that for the throughput measurements.

Transmit Latency. We also compared transmit latency measurements to predictions from the performance model and found these measurements to approximate the modeled results. Actual latency measurements were slightly lower than the modeled results, again due to a conservative modeled latency.

Conclusions

The performance model was intended to track the performance of the prototype hardware to an accuracy of ±10.0 percent. The comparisons between modeled and measured results demonstrate that the model actually surpasses our goal. The measured performance for the XMI (I/O) bus configuration using a VAX 9000 system validated the modeled results as closely as did the corresponding results for the XMI (CPU) bus configuration. Any disparity between the modeled and the measured results stems from unavoidable measurement errors for receive frames and pessimistic memory latency assumptions for transmit frames.

Throughput due to the four- and five-mode workloads is nearly the same. The average frame sizes for these distributions are 496 bytes and 487 bytes, respectively. Thus, throughput is a function of the frame size and independent of the number of modes that exist in the workload. Also, this data leads to the conclusion that the DEMFA may never pose a performance bottleneck in a real network environment.

For the simulation, we chose an XMI workload with an extremely high burst rate. Actual XMI systems may result in better throughput than that presented in this paper. The resources required to create XMI workload variations are not easily accessible, so we did not perform measurements on the prototype adapter under different workload conditions. But since other measurements validated the model predictions so closely, measuring performance with varied XMI workloads proved unnecessary.

Validation of the results that we predicted through simulation increased our confidence in various design mechanisms that were verified using the performance model as a test platform. When designing new I/O architecture or memory implementations, our performance model allows changes to be made easily in order to determine the impact of such changes on performance. The modeling strategy proved very effective and helped to deliver a high-quality product with better performance than what was intended initially.

Acknowledgements

I wish to acknowledge all members of the DEMFA development group for their help in modeling the adapter. Their openness to examine new designs to enhance performance resulted in this high-speed adapter. I am also grateful to the group for assisting with the performance measurements. Finally, I wish to extend special thanks to Howard Hayakawa, Gerard Koeckhoven, Satish Rege, Andy Russo, and Dick Stockdale.

References

1. R. Gillett, "Interfacing a VAX Microprocessor to a High-speed Multiprocessing Bus," Digital Technical Journal, no. 7 (August 1988): 28-46.

2. W. Hawe, R. Graham, and P. Hayden, "Fiber Distributed Data Interface Overview," Digital Technical Journal, vol. 3, no. 2 (Spring 1991): 10-18.

3. B. Allison, "An Overview of the VAX 6200 Family of Systems," Digital Technical Journal, no. 7 (August 1988): 19-27.

4. D. Fite, Jr., T. Fossum, and D. Manley, "Design Strategy for the VAX 9000 System," Digital Technical Journal, vol. 2, no. 4 (Fall 1990): 13-24.

5. F. Ross, "FDDI - A Tutorial," IEEE Communications Magazine, vol. 24, no. 5 (May 1986): 10-17.

6. S. Joshi, "High-Performance Networks: A Focus on the Fiber Distributed Data Interface (FDDI) Standard," IEEE MICRO (June 1986): 8-14.

7. R. Stockdale and J. Weiss, "Design of the DEC LANcontroller 400 Adapter," Digital Technical Journal, vol. 3, no. 3 (Summer 1991, this issue): 36-47.

8. S. Rege, "The Architecture and Implementation of a High-performance FDDI Adapter," Digital Technical Journal, vol. 3, no. 3 (Summer 1991, this issue): 48-63.

9. Programmer's Reference Manual for SIMULA for VAX under VMS Operating System (North Berwick, Scotland: EASE Ltd., 1991).

10. G. Birtwistle, O. Dahl, B. Myhrhaug, and K. Nygaard, SIMULA BEGIN (Kent, England: Chartwell-Bratt Ltd., 1980).

11. L. Kleinrock, Queueing Systems, vols. 1 and 2 (New York: John Wiley and Sons, 1976).

12. D. Chiu and R. Sudama, "A Case Study of DECnet Applications and Protocol Performance," Proceedings of the ACM SIGMETRICS Conference (May 1988).

13. H. Yang, B. Spinney, and S. Towning, "FDDI Data Link Development," Digital Technical Journal, vol. 3, no. 2 (Spring 1991): 31-41.

=============================================================================
Copyright 1991 Digital Equipment Corporation. Forwarding and copying of this article is permitted for personal and educational purposes without fee provided that Digital Equipment Corporation's copyright is retained with the article and that the content is not modified. This article is not to be distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. All rights reserved.
=============================================================================