The Architecture and Implementation of a High-performance FDDI Adapter

By Satish L. Rege

Digital Technical Journal Vol. 3 No. 3 Summer 1991

Abstract

With the advent of fiber distributed data interface (FDDI) technology, Digital saw the need to define an architecture for a high-performance adapter that could transmit data 30 times faster than previously built Ethernet adapters. We specified a first generation FDDI data link layer adapter architecture that is capable of meeting the maximum FDDI packet-carrying capacity. The DEC FDDIcontroller 400 is an implementation of this architecture. This adapter acts as an interface between XMI-based CPUs, such as the VAX 6000 and VAX 9000 series of computers, and an FDDI local area network.

Fiber distributed data interface (FDDI) is the second generation local area network (LAN) technology. FDDI is defined by the American National Standards Institute (ANSI) FDDI standard and will coexist with Ethernet, the first generation LAN technology.

The architecture and implementation presented in this paper are for the DEC FDDIcontroller 400, Digital's high-performance, XMI-to-FDDI adapter known as DEMFA. This adapter provides an interface between an FDDI LAN and Digital's XMI-based CPUs, presently the VAX 6000 and VAX 9000 series of computers.[1,2] DEMFA implements all functions at the physical layer and most functions at the data link layer.[3,4]

We begin the paper by differentiating between an architecture and an implementation. Then we present our project goal and analyze the problems encountered in meeting this goal. Next we give a historical perspective of Digital's LAN adapters. We follow this discussion by describing in detail the architecture and implementation of DEMFA. Finally, we close the paper by presenting some results of performance measurement at the adapter hardware level.

Adapter Architecture and Implementation

Before we discuss the DEMFA architecture and its implementation, it is necessary to understand what is meant by an adapter architecture and an implementation of that architecture. An adapter architecture specifies a set of functions and the method of executing these functions. An implementation that incorporates all of these functions and conforms to the method of executing these functions becomes a member of the adapter architecture family. Thus, for a given architecture, many implementations are possible.

To grasp the concept presented in the previous paragraph, consider the VAX CPU architecture. This architecture defines the instruction set, which is composed of a set of arithmetic, logical, and other functions, and a format for the instruction set that a processor should implement to be classified as a VAX computer. Examples of VAX implementations are the VAX 11/780 and the VAX 9000 computers, which both conform to the VAX CPU architecture.

Our Goal and the Problem Definition

Our goal was to define an architecture for an FDDI adapter that meets the ultimate performance goal of transmitting approximately 450,000 packets per second (packets/s). This goal is considered ultimate because 450,000 packets/s is the maximum packet-carrying capacity of FDDI. Note that this transmission rate is approximately 30 times greater than that of Ethernet, which can transmit approximately 15,000 packets/s.

Before defining the problem, the basic properties of XMI and FDDI must be understood. XMI is a 64-bit-wide parallel bus that can sustain a 100-megabyte-per-second (MB/s) bandwidth for multiple interfaces.[5] Each interface attached to the XMI bus is referred to as a commander when it requests data or a responder when it delivers data. XMI is an interconnect that can have transactions from several commanders and responders in progress simultaneously.

FDDI is a packet-oriented serial bus that operates using the token ring protocol and has a bandwidth of 100 megabits per second (Mb/s).[6] FDDI is capable of transmitting packets as small as 28 bytes, which take 2.24 microseconds to transmit. Therefore, FDDI can carry approximately 450,000 minimum-size packets/s. The largest packet that FDDI can carry is 4508 bytes. The ANSI X3T9.5 standard defines the FDDI operation; Digital has developed its own implementation of the FDDI base technology as a superset of the ANSI standard.[3]

Our problem was to architect an adapter that could interface XMI, i.e., a parallel high-bandwidth CPU bus for VAX computers, to a serial fiber-optic networking bus. To avoid being the bottleneck in a system, such an adapter must be able to transmit or receive 450,000 packets/s.

ANSI defines the protocol for interfacing an adapter to an FDDI LAN.[6] But we had to define the protocol between the adapter and the VMS and ULTRIX operating systems used by most VAX computers. Thus, solving the problem required us to architect a data link layer adapter that would satisfy both protocols and meet the FDDI maximum packet transfer capability.

Historical Perspective

The computer industry has built many LAN adapters since the inception of Ethernet ten years ago. The first LAN adapter built by Digital was the UNIBUS-to-NI adapter (UNA). (NI is Digital's alias for Ethernet.) The Digital Ethernet-to-XMI network adapter, known as DEMNA, is Digital's most recent Ethernet adapter.[7]

Let us choose the maximum throughput rate expressed in packets per second as a performance metric for LAN adapters. The historical perspective shows that the first adapter to meet the Ethernet packet-carrying capacity is the DEMNA. Therefore, it took approximately eight years and six generations for an Ethernet adapter to achieve this throughput rate. Consequently, many designers thought that our goal of meeting the ultimate FDDI packet-carrying capacity was impossible.

But the DEMFA architecture, a first generation FDDI data link layer adapter architecture, can meet the maximum FDDI packet-carrying capacity. In this sense, the DEMFA architecture is ultimate.

Traditional Adapter Architectures

In this section, we analyze the traditional adapter architecture and show that by using this architecture we could not meet our performance goal. Figure 1 is a block diagram of a traditional adapter, e.g., DEMNA. In such a design, a CPU in the adapter operates on every transmitted and received packet. Thus, using this traditional architecture to build an ultimate FDDI adapter would require a CPU capable of handling 450,000 packets/s. To predict the performance of such a CPU, we extrapolated from the performance data of the CPU used in DEMNA.[7] This traditional adapter can handle approximately 15,000 packets/s using a CPU rated at 3 VAX units of performance (VUPs).

If we assume a linear model to extrapolate the performance of a CPU from DEMNA to DEMFA, an ultimate FDDI adapter would require at least a 90-VUP CPU, i.e., 30 times the 3-VUP CPU that handles 15,000 packets/s. Such a CPU was neither available nor cost-effective for timely shipment of our adapter. Besides, it would be extravagant to use a 90-VUP CPU in an adapter whose host CPU may have a performance as low as 3 to 4 VUPs. Therefore, we looked for a different solution.

DEMFA Architecture

The DEMFA architecture is characterized by the following specifications for functionality and the means to achieve this functionality:

o As mentioned earlier, the DEMFA architecture implements all functions at the physical layer and a major subset of the functions at the data link layer.

o The architecture requires that this functionality be implemented in pipelined stages, which are used to receive and transmit packets over the FDDI ring without CPU interference.

o The DEMFA architecture specifies a ring interface for communicating between the pipelined stages. Rings operate as queues that allow buffering between pipelined stages, enabling these stages to proceed in an asynchronous fashion.

o The architecture requires a packet-filtering capability in the pipelined stage nearest to the FDDI ring; this capability helps to minimize adapter and host resource utilization.

o The architecture specifies the DEMFA port, which minimizes the information transfer required to interact with the host operating system. This interaction takes place during both initialization and the normal operation of receiving and transmitting packets.

In the following sections, we elaborate on different features of the DEMFA architecture.

Pipelined Architecture with No CPU Interference

Once we determined that the traditional architecture of a CPU processing the packets could not meet our performance goal, we began to investigate alternative architectures. The requirement was to either process one receive packet or queue one transmit packet in a time period less than or equal to the time it takes to transmit on an FDDI ring. Thus, the device we architected must process 28-byte packets in less than 2.24 microseconds. A little thought will show that if we are able to meet the requirements for
processing small packets at the FDDI bandwidth, then the requirements for larger packets can be easily met.

Our final choice was a three-stage pipeline approach, which broke down the complexity of implementation while meeting our performance goal. As shown in Figure 2, the three stages of the pipeline in the adapter are the FDDI corner and parser (FCP) stage, the ring entry mover (REM) stage, and the host protocol decoder (HPD) stage. Figure 2 also shows two other functions required of the adapter: the buffering of packets, which requires a memory called the packet buffer memory (PBM) and a memory interface called the packet memory interface (PMI); and the local intelligence, also called the adapter manager (AM).

DEMFA Functions

This section presents brief descriptions of the DEMFA functions and the pipelined stages in which these functions are performed. This, according to our definition, is the DEMFA architecture. A later section, One Implementation of the DEMFA Architecture, describes an implementation in detail.

The FCP stage converts serial photons on the FDDI ring into packets and then writes the packets into the PBM as longwords, 32 bits at a time. The parser implements the logical link control (LLC) filtering functionality. This stage is also responsible for capturing the token on the FDDI ring, transmitting packets, and implementing the physical layer, e.g., media access control (MAC), functionality required by the FDDI standard.

The REM stage is responsible for distributing packets received over the FDDI ring to the host computer and to the AM. This stage also collects the packets from the host and the AM to queue for FDDI transmission.

The HPD stage interfaces with the XMI bus to move received packets from the PBM to the host memory and to move transmit packets from the host memory to the PBM.

The PBM stores the packets received over the FDDI ring and the packets to be transmitted over the FDDI ring. It also stores the control structures required for accessing these packets. The PMI arbitrates the requests made by the three pipelined stages and the AM to access the PBM.

The AM implements the functionalities of self-test and initialization in the adapter and also a subset of the SMT function required by the ANSI FDDI specification.[8] The adapter manager performs no function in either the receipt or transmission of individual packets to the host.

We use ring interfaces to communicate within the adapter and between the adapter and the host. These interfaces are described in detail immediately following the next section.

Performance Constraints on the Pipelined Stages

Consider the three pipelined stages and their ring interfaces. At any time, the three independent stages are processing different packets. Thus, if the HPD stage is processing received packet 0, the REM stage may be working on received packet 4 and the FCP on received packet 7. Note that packets 1, 2, and 3 wait on a ring between the REM stage and the HPD stage. Similarly, packets 5 and 6 wait on a ring between the FCP stage and the REM stage. The PBM must have enough bandwidth to service the three stages. It also must service them with low latency so that the first-in, first-out (FIFO) buffers in the FCP stage do not overflow.

By dividing the processing of a packet over the three stages and the ring interfaces used to queue packets between these stages, we reduced the complexity of the total adapter functionality. Any implementation of this architecture specification would consist of three loosely coupled designs that use ring interfaces to communicate with one another.

Each stage must process a packet in less time than it takes to transmit the packet on the FDDI ring. As we mentioned previously, this transmission time is 2.24 microseconds for the smallest packet. A larger packet may take longer to process than a small packet, but such a packet also takes longer to transmit on the FDDI ring.

Thus, to meet our performance goal, we architected a three-stage pipeline implementation, with each stage meeting a packet-processing time dependent upon the packet size. In addition, our architecture specified a PBM with sufficient memory bandwidth to service the asynchronous requests from the three stages with minimal latency.

Ring Interface: The Core of the DEMFA Architecture

The ring interface forms the core of the DEMFA architecture. An interface is necessary to exchange data between the adapter and the host computer and also between the different stages and functional units of the adapter. Such an interface usually consists of a data structure and a protocol for communication. We evaluated various data structures, including a linked list or queue data structure, and found that a ring data structure is efficient to manipulate and would be easy to implement in state machines, if desirable.

Implementation of Ring Structures. Ring structure implementation requires a set of consecutive memory addresses, as shown in Figure 3. The ring begin pointer and the ring end pointer define the beginning and end of a ring. Two entities, the transmitter and the receiver, interface with a ring to exchange data. The transmitter interface delivers data to the receiver interface using the ring structure. This data resides in memory that is managed by one of the two interfaces. If the transmitter interface manages the memory, the ring is called a transmit ring. If the receiver interface manages the memory, the ring is called a receive ring.

Rings are divided into entries that consist of several bytes each; the number of bytes in an entry is an integral multiple of longwords. A ring, in turn, must contain an integral number of entries. The entry size and the number of entries in a ring determine the ring size. We chose an entry size that is a power of two in bytes and a number of ring entries that is divisible by two. These choices helped to simplify the hardware implementation used to peruse these rings.

Each entry consists of

o An ownership bit, which indicates whether the transmitter interface or the receiver interface owns the entry

o Buffer pointers, which point to transmitted or received data

o A buffer descriptor, which contains the length of the buffers, and status and error fields

The definitions of these fields in an entry and the rules for using the information in these fields constitute the ring protocol. Only the interface that owns an entry has the right to use all the information in that entry. This right includes using the buffer pointers to operate on data in the buffers. Both interfaces have the right to read the ownership bit, but only the interface with ownership may write this bit.

The two interfaces can exchange entries by toggling the ownership bit. After toggling this bit, the transmitter and receiver interfaces need to prod each other to indicate that the ownership bit has been toggled. This is accomplished using two hardwired Boolean values, by means of an interrupt, or by writing a single-bit register. Hardwired Boolean values are used when both the transmitter and the receiver are on the adapter. Either the interrupt scheme or the method of writing a single-bit register is used when the transmitter and receiver converse over an external bus, e.g., an XMI bus.

The word "signal" is used henceforth to represent the prodding of one interface by the other. A transmitter interface uses "transmit done" to signal the receiver interface that data has been transmitted. A receiver interface uses "receive done" to signal the transmitter interface that the data has been received. Note that we have defined the DEMFA port protocol in such a way that the number of interrupts used to signal the host across XMI is minimized to reduce the host performance degradation caused by interrupts.

The unit of data exchanged between the transmitter interface and the receiver interface is a packet. A packet may be written in a single buffer if the packet is small or over multiple buffers if the packet is large. In this paper, we use the term buffer to refer generically to buffers in the adapter or in the host. The buffers in the adapter are always 512 bytes in size and, when referred to specifically, are called pages. The buffers in the host may be of different sizes.

An exchange of data requires single or multiple buffers, depending upon the packet and buffer sizes. One field of two bits in the buffer descriptor is used to designate the beginning and end of a packet. These bits are called the start of a packet (SOP) and the end of a packet (EOP). Thus, for a one-buffer packet both the SOP and the EOP are asserted. For a multiple-buffer packet, the first buffer has the SOP asserted, the middle buffers have both the SOP and the EOP deasserted, and the last buffer has the EOP asserted. The buffer descriptor also contains fields that we do not describe in this paper.

Data Exchange on a Transmit Ring. Data exchange between a transmitter interface and a receiver interface is accomplished in a similar manner on both transmit and receive rings. Therefore, we discuss the exchange in
detail for a transmit ring; for a receive ring, we note only the dissimilarities.

The events that occur during the data exchange on a transmit ring are shown in Figure 4. The process is as follows. The transmitter interface manages the memory used to exchange data and has two pointers to the ring entries, i.e., the fill pointer and the transmitter free pointer. The transmitter interface uses the fill pointer to deliver data to the receiver interface. The transmitter interface uses the transmitter free pointer to recover and manage the buffers freed by the receiver interface. The receiver interface uses only one pointer, i.e., the receive pointer, which points to the next entry that the receiver interface interrogates to receive data.

To understand how data is transmitted, assume that the pointers move from top to bottom, as shown in Figure 4. Initially, all the pointers designate the location indicated by the begin pointer.

A transmitter that has data to transmit to a receiver uses the entry indicated by the fill pointer. First, the transmitter verifies that it owns the entry by checking the ownership bit. Second, the transmitter writes the buffer address and the remaining fields in the entry. In the case of a single-buffer packet, the transmitter interface writes a single entry and then toggles the ownership bit and signals the receiver interface.

For multiple buffers, the transmitter interface increments the fill pointer and repeats the two steps described in the previous paragraph to write all the buffer addresses and the length and status information. Then the transmitter interface toggles the ownership bits of all later entries of the multiple buffers before toggling the ownership bit of the first entry. This protocol preserves the atomicity of the packet transfer between the transmitter and receiver interfaces. Then the transmitter interface signals the receiver interface that a packet is available on the transmit ring. This signal alerts the receiver interface, which then examines the entry pointed to by the receive pointer. The receiver interface operates on the entry data if it owns the entry.

The receiver interface returns the entries to the transmitter interface by toggling the ownership bits and then signals receipt of data to indicate the return of the entries (and hence the free buffers). Note that there is no need to return these free buffers in a packet-atomic fashion. The transmitter interface uses the transmitter free pointer to examine the ownership bits in the entry and to reclaim the buffers.

The interfaces operate asynchronously, since each one can transmit or receive data at its own speed. If the transmitter interface can transmit faster than the receiver interface is able to receive, the transmit ring fills up. Under such circumstances, the receiver interface owns all the entries in a transmit ring, the fill pointer equals the transmitter free pointer, and data transmission stops. Conversely, if the receiver interface is faster than the transmitter interface, the transmit ring will be nearly empty. In this case, the transmitter free pointer and the receive pointer are almost always equal.

Note the following invariants that apply to the pointers when data is exchanged on a transmit ring: the fill pointer cannot pass the transmitter free pointer; the transmitter free pointer cannot pass the receive pointer; and the receive pointer cannot pass the fill pointer.

Data Exchange on a Receive Ring. As also shown in Figure 4, the operation of data exchange on a receive ring is similar to that operation on the transmit ring, with the following differences. The receiver interface manages the memory used for exchanging data. Consequently, the receiver interface has two pointers, the receiver free pointer and the receive pointer, and the transmitter interface has only one pointer, the fill pointer.

Table 1 shows the various DEMFA rings and the transmitters and receivers that interface with each ring.

Table 1 DEMFA Rings and Their Transmitter and Receiver Interfaces
___________________________________________________________________
Ring                             Transmitter                   Receiver                      Remarks
___________________________________________________________________
Rings in Packet Buffer Memory
RMC Receive Ring                 FDDI Corner and Parser Stage  Ring Entry Mover Stage        Contains data that originated on the FDDI ring.
RMC Transmit Ring                Ring Entry Mover Stage        FDDI Corner and Parser Stage  Contains data that originated at the host or the AM, destined for the FDDI ring.
HPD Receive Ring                 Host Protocol Decoder Stage   Ring Entry Mover Stage        Contains data that originated at the host, destined for the FDDI ring.
HPD Transmit Ring                Ring Entry Mover Stage        Host Protocol Decoder Stage   Contains data that originated at the FDDI ring, destined for the host.
AM Receive Ring                  Adapter Manager               Ring Entry Mover Stage        Contains data that originated at the AM, destined for the FDDI ring or the host.
AM Transmit Ring                 Ring Entry Mover Stage        Adapter Manager               Contains data that originated at the FDDI ring, destined for the AM.
___________________________________________________________________
Rings in Host Memory
Host Receive Ring                Host Protocol Decoder Stage   Host                          Contains data that originated at the FDDI ring or the AM, destined for the host.
Host Transmit Ring               Host                          Host Protocol Decoder Stage   Contains data that originated at the host, destined for the FDDI ring.
Command Ring (Transmit Ring)     Host                          Adapter Manager               Contains commands that originated at the host for the AM. Note that the AM replies in the same ring.
Unsolicited Ring (Receive Ring)  Adapter Manager               Host                          Contains unsolicited messages from the AM to the host.
___________________________________________________________________

Subsystem Level Functionality

The basic functions that an FDDI LAN adapter is required to perform are receiving and transmitting packets over the FDDI ring. The adapter must be able to establish and maintain connection to the FDDI network. The connection management (CMT) protocol, a subset of the station management (SMT) protocol, specifies the rules for this connection.[8]

The implementation of the complex CMT algorithm in an adapter requires an intelligent component, such as a microprocessor, that can receive, interpret, and transmit packets. Note that the number of CMT packets that flow over the FDDI ring constitutes only a small fraction of the normal traffic. Therefore, a low-performance CPU is adequate to implement connection management. The CPU in the DEMFA device is called the adapter manager.
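
The transmit-ring exchange used by the rings in Table 1 can be illustrated with a small software model. The Python sketch below is only an illustration under stated assumptions: the names `TransmitRing`, `send_packet`, `receive_packet`, and `reclaim`, the `TX`/`RX` encoding of the ownership bit, and the `in_use` counter are hypothetical and do not come from the DEMFA design, which realizes the protocol in hardware state machines. The sketch models the ownership bit, the SOP and EOP bits, the three pointers, and the rule that the first entry of a multiple-buffer packet is handed over last.

```python
from dataclasses import dataclass

TX, RX = 0, 1  # hypothetical encoding of the ownership bit


@dataclass
class Entry:
    owner: int = TX      # ownership bit
    buffer: bytes = b""  # stands in for the buffer pointer
    sop: bool = False    # start of packet
    eop: bool = False    # end of packet


class TransmitRing:
    """Software model of a transmit ring (the transmitter manages the memory)."""

    def __init__(self, size):
        self.entries = [Entry() for _ in range(size)]
        self.fill = 0    # transmitter: next entry to fill
        self.free = 0    # transmitter: next entry to reclaim
        self.recv = 0    # receiver: next entry to examine
        self.in_use = 0  # assumption: counter that tells a full ring from an empty one

    def send_packet(self, buffers):
        """Queue one packet; return False if the ring lacks free entries."""
        n = len(self.entries)
        if not buffers or len(buffers) > n - self.in_use:
            return False
        idxs = [(self.fill + i) % n for i in range(len(buffers))]
        for i, idx in enumerate(idxs):
            entry = self.entries[idx]
            assert entry.owner == TX  # verify ownership before writing
            entry.buffer = buffers[i]
            entry.sop = (i == 0)
            entry.eop = (i == len(buffers) - 1)
        # Hand over the later entries first and the first entry last,
        # so the receiver never sees a partially queued packet.
        for idx in reversed(idxs):
            self.entries[idx].owner = RX
        self.fill = (self.fill + len(buffers)) % n
        self.in_use += len(buffers)
        return True

    def receive_packet(self):
        """Consume one whole packet, or return None if none is available."""
        n = len(self.entries)
        if self.entries[self.recv].owner != RX:
            return None
        data = []
        while True:
            entry = self.entries[self.recv]
            data.append(entry.buffer)
            done = entry.eop
            entry.owner = TX  # return the entry to the transmitter
            self.recv = (self.recv + 1) % n
            if done:
                return b"".join(data)

    def reclaim(self):
        """Transmitter recovers the entries freed by the receiver."""
        n = len(self.entries)
        count = 0
        while self.in_use and self.entries[self.free].owner == TX:
            self.free = (self.free + 1) % n
            self.in_use -= 1
            count += 1
        return count
```

In this model, a four-entry ring that holds a three-buffer packet and a one-buffer packet is full, so the fill pointer equals the transmitter free pointer and further calls to `send_packet` are refused until the receiver consumes a packet and the transmitter calls `reclaim`.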
Packets in the receive stream originate on the FDDI ring and may be addressed to this host or adapter (together called the node). Such packets can take one of the following paths:

o Packets not addressed to this node are forwarded over the FDDI ring.

o Packets addressed to this node are delivered to the host computer.

o Packets addressed to this node are delivered to the AM.

The delivery of packets to the host computer implies that the adapter has a pointer to a free memory buffer in which to deposit the received packet. The DEMFA port, described in the next section, specifies the rules for extracting free buffer pointers from the host memory.

For each packet that the host needs to transmit, the adapter must know the buffer address or addresses and the extent of each buffer. The DEMFA port defines the method to exchange this buffer information. In addition, the host and the adapter microprocessor must be able to exchange information. The DEMFA port defines the protocol for this communication also.

DEMFA Port

The DEMFA port specifies the data structure and protocol used for communication between the adapter and the host computer. Rather than invent a new protocol, we modified the DEMNA port specification.[7] The data structure used to pass information between the host and the adapter is a ring structure. Such structures are more efficient to traverse than queue structures.

The DEMFA port defines the four separate host rings listed in Table 1:

o The host receive ring, which contains pointers to free buffers into which a packet received over the network can be deposited

o The host transmit ring, which contains pointers to filled buffers from which packets are removed and transmitted over the FDDI ring by the adapter

o The host command ring, which sends commands to the AM

o The unsolicited ring, which the AM uses to initiate communication with the host CPU

By using four host rings, we differentiated between the fast and frequent data movement to and from the FDDI ring and the comparatively slow and infrequent data movement required for communication with the AM.

One Implementation of the DEMFA Architecture

Previous sections specified the DEMFA architecture. The remainder of this paper describes an implementation of the DEMFA architecture. In the following sections, we present details of the implementation for the packet buffer memory and the packet memory interface; the three pipelined stages, FCP, REM, and HPD; and the adapter manager.

Packet Buffer Memory and Packet Memory Interface

The packet buffer memory stores the data received over the FDDI ring before delivering this data to the host. The PBM also stores data from the host before transmitting over the FDDI ring.

The PBM consists of two memories: the packet buffer data memory and the packet buffer ring memory. Virtually, the packet buffer data memory divides into seven areas: one used by the AM and three each for data reception and data transmission to and from the three external interfaces. These three interfaces are the FCP stage, the HPD stage, and the AM. The areas are accessed and managed by the six rings residing in the packet buffer ring memory and listed in Table 1. Note that the division is considered virtual because the physical memory locations of the areas change over time.

The three pipelined stages and the memory refresh circuitry use the packet memory interface (PMI) to access the PBM. The PMI arbitrates and prioritizes the requests for memory access from these four requesters. Physically, the PMI has three interfaces: the FCP stage, the REM stage, and the HPD stage. Virtually, the PMI has four interfaces; the HPD interface multiplexes traffic from both the host and the adapter manager. The PMI also has the functionality to refresh the dynamic memory and to implement a synchronizer between the 80-nanosecond FDDI clock and the 64-nanosecond XMI clock.

All interfaces request access to the memory by invoking a request/grant protocol. Some accesses are longword (4-byte) transactions that require one to two memory cycles; others are hexaword (32-byte) transactions and require a burst of memory cycles.

The interfaces have the following priorities: (1) refresh memory circuitry, (2) the REM stage, (3) the FCP stage, and (4) the HPD stage. The refresh memory circuitry has the highest priority because data loss in the dynamic memory is disastrous. Also, the refresh circuitry makes a request only once every 5 to 10 microseconds, thus ensuring that the lower priority requesters always have access to the memory. The REM has the second highest priority because it always requests one longword, which requires one memory cycle. Once the REM receives data, by design it waits at least two cycles before making the next request. Thus, the REM does not monopolize the memory, and the FCP can always get its requests serviced. The FCP stage requires guaranteed memory bandwidth with small latency to avoid an overflow or underflow condition in its FIFOs. Finally, the HPD interface has the lowest priority because no data loss occurs if memory access is denied for a theoretically infinite amount of time. Our adapter design has mechanisms that guarantee memory access to the HPD.

FDDI Corner and Parser Stage

The FCP stage, illustrated in Figure 5, provides the interface between the FDDI ring and the packet buffer memory. This stage can receive or transmit the smallest packet in 2.24 microseconds, as required by our performance constraints.

The receive stream in this stage converts the incoming stream of photons from the FDDI ring into a serial bit stream using the fiber-optic transceiver (FOX) chip. The clock and data conversion chip then recovers the clock and converts the incoming code from 5 to 4 bits. The MAC chip converts this electronic serial bit stream to a byte stream. The MAC chip implements a superset of the ANSI MAC standard.[9] Digital has a specific implementation of the MAC chip.[3] The ring memory controller (RMC) interfaces with the byte-wide stream from the MAC, converts the bytes into 32-bit words, and writes these words to the PBM, using the RMC receive ring and the ring protocol.

The transmit stream accesses a packet from the PBM, waits for the token on the FDDI ring, and transmits the packet as a stream of photons. This stage can generate and append 4 bytes of cyclic redundancy code (CRC) to every packet before transmitting.

The parser component of this stage interfaces with the RMC bus to generate a forwarding vector that has a variety of information including the data link user identity and the destination of the packet, i.e., the host or the AM. The parser extracts packet headers from the RMC bus and operates on the FDDI and the LLC parts of the packet headers. The parser then processes this information in real time, using a content-addressable memory (CAM) that stores the profiles of data link and other users. As a result, the parser generates a forwarding vector that contains the destination address of either the host user or the AM user. The forwarding
The forwarding vector destination field is given a "discard" value if the packet header does not match any user profile. Note that the forwarding vector is a part of the buffer descriptor field in the RMC receive ring.

Ring Entry Mover Stage

The ring entry mover stage performs four major functions: (1) moving filled packets from receive rings to transmit rings, (2) returning free packets from transmit rings to receive rings, (3) managing buffers, and (4) collecting statistics. Figure 6 shows the various rings, the ring entry mover, and the movement of filled and free packets.

The REM moves filled packets from receive rings to transmit rings by copying pointers rather than copying data. (Copying pointers is a much faster operation than data copy.) Note in Figure 6 that for a given interface, no filled packet moves from its receive ring to its transmit ring. For example, no filled packet moves from the RMC receive ring to the RMC transmit ring. Also, in this design there is no need for a path from the HPD receive ring to the AM transmit ring.

A second function performed by the REM stage is to return free packets from the transmit rings to the proper receive rings. Transmit rings point to free packets after the receiver interface has consumed the information in the packet. The REM, which is a transmitter interface on all transmit rings in the PBM, owns these buffers after the appropriate receiver interface toggles the ownership bit. The REM returns the buffers to the original receive ring by using information in the color field, a subset of the buffer descriptor field. The color field contains color information that designates the receive ring to which the buffers belong. This color information is written into the buffer descriptors of the free buffers during initialization. Note that during initialization, the adapter free buffers in the PBM are allocated to the three receive rings with which the REM interfaces.

The REM also performs buffer resource management. Note that a reserved pool of buffers exists for traffic arriving over the FDDI ring. This FDDI traffic has two destinations, namely the host CPU and the adapter manager. To ensure that one destination does not monopolize the pool of buffers, the pool is divided into two parts: host allocation and AM allocation. The REM delivers no more than the allocated number of buffers to one destination.

The fourth major function that the REM performs is to collect statistics. The REM collects statistics in discard counters for packets that cannot be delivered due to lack of resources. The REM interrupts the AM when these counters are half full. The AM reads, processes, and stores these counters for statistical purposes. The AM read operation resets these counters. There are a number of other counters in REM.

Host Protocol Decoder Stage

The host protocol decoder interfaces with the XMI bus, fetches and interprets entries from the host receive and transmit rings, and moves data between the host and the PBM. This stage also acts as a gateway for the AM to get to the host memory or to the PBM.

Figure 7 is a block diagram of the HPD stage. The receive and transmit pipelines store and retrieve receive and transmit data from the host memory. The two pipelines work in parallel. We now explain the operation of the receive pipeline in detail. The transmit pipeline operates in a similar manner; thus, we highlight only the differences.

HPD Receive Pipeline. The receive pipeline has three stages: (1) the fetch and decode host receive entry stage, (2) the data mover stage, and (3) the receive buffer descriptor write stage. Most pipelines work in a lockstep fashion; that is, each stage takes the same amount of time to process input. In our design, the processing time varies for each stage in the pipeline. For example, the data mover stage will take a much longer time to transfer 4500-byte packets than to transfer 100-byte packets. The fetch and decode host receive entry stage, on the other hand, may take the same amount of time to decode entries for packets of either size. Consequently, stages use interlocks to signal the completion of work.

The fetch and decode host receive entry stage has knowledge of the format and size of the ring and sequentially fetches host receive ring entries. If the adapter does not own an entry, this stage waits for a signal from the host before fetching the entry again. If the adapter does own the entry, this stage decodes the entry to determine the address of the free buffer in the host memory and the number of bytes in the buffer. The stage then passes this buffer information to the data mover stage and the address of the host entry to the receive buffer descriptor write stage. In addition, this stage prefetches the next entry to keep the pipeline full, in case data is actively received over the FDDI ring.

In parallel, the PMI interface stage, part of the HPD chip, fetches the next entry from the HPD transmit ring. Decoding this entry determines the address of the buffer in the PBM and the amount of data in the buffer. The packet buffer bus interface passes the buffer address and length information to the data mover stage and the address of the HPD transmit ring entry to the receive buffer descriptor write stage.

Now, the data mover stage has pointers to the host free buffer and its extent and to the PBM filled buffer and its extent. The stage proceeds to move the data from the PBM to the host memory over the XMI bus. Depending on the XMI memory design, this transfer involves octaword or hexaword bursts. The process of moving data continues until the depletion of packet data in the PBM.

The data mover stage signals the receive buffer descriptor stage when the packet moving is complete. The receive buffer descriptor stage writes in the status fields of the host receive ring entry and the HPD transmit ring entry. This stage also gives ownership of the filled buffer to the host and of the free buffer to the REM. The REM can then return the free buffer to the ring of origin.

HPD Transmit Pipeline. The HPD transmit and receive pipelines are symmetrical. The HPD receive pipeline delivers data from the HPD transmit ring to the host receive ring. The HPD transmit pipeline delivers data from the host transmit ring to the HPD receive ring.

There is one exception to the symmetry. The transmit pipeline does not fetch an entry from the HPD receive ring in PBM to determine if there are enough free buffers available. A hardware interface between the PMI and the HPD, i.e., a Boolean signal, indicates whether there are enough buffers to accommodate the largest possible size transmit packet. This exception is an artifact of our implementation; we wanted to reduce the accesses to the PBM, since its bandwidth is a scarce resource.

Adapter Manager

The local intelligence, also known as the adapter manager, implements various necessary adapter functions including self-test and initialization. The AM also implements part of the CMT code that manages the FDDI connection.[10] In addition, the AM interfaces with the host to start and stop data link users by dynamically manipulating the parser data base.
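The ring entry mover behavior described earlier (moving packets by copying pointers, returning free buffers by color, and limiting the buffers delivered to one destination) can be sketched in software as follows. This is a minimal model under stated assumptions, not the DEMFA implementation: all class, field, and ring names are hypothetical, rings are modeled as Python deques, ownership is implied by which queue holds a descriptor rather than by an ownership bit, and the allocation is counted in buffers rather than pages.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferDescriptor:
    page_pointer: int  # location of the buffer in packet buffer memory
    color: str         # receive ring the buffer was assigned to at initialization


class RingEntryMover:
    def __init__(self, allocation):
        # allocation maps a destination name to the maximum number of
        # buffers it may hold at once (values here are hypothetical).
        self.allocation = dict(allocation)
        self.outstanding = {name: 0 for name in allocation}
        self.discards = {name: 0 for name in allocation}
        self.transmit_rings = {name: deque() for name in allocation}
        self.free_buffers = {}  # color -> deque of free descriptors

    def add_free_buffer(self, descriptor):
        """Place a free buffer on the receive ring named by its color."""
        self.free_buffers.setdefault(descriptor.color, deque()).append(descriptor)

    def deliver(self, descriptor, destination):
        """Move a filled buffer to a transmit ring by copying its descriptor.

        Only the small descriptor moves; the packet data stays in place.
        If the destination has used up its allocation, the packet is
        counted as discarded and the buffer returns to its receive ring.
        """
        if self.outstanding[destination] >= self.allocation[destination]:
            self.discards[destination] += 1
            self.add_free_buffer(descriptor)
            return False
        self.transmit_rings[destination].append(descriptor)
        self.outstanding[destination] += 1
        return True

    def reclaim(self, destination):
        """Return the oldest consumed buffer to its ring of origin."""
        descriptor = self.transmit_rings[destination].popleft()
        self.outstanding[destination] -= 1
        self.add_free_buffer(descriptor)
        return descriptor


# Example: the host may hold one buffer at a time, so a second packet is discarded.
rem = RingEntryMover({"host": 1, "am": 1})
p = BufferDescriptor(page_pointer=0x1000, color="rmc_receive")
q = BufferDescriptor(page_pointer=0x2000, color="rmc_receive")
first = rem.deliver(p, "host")
second = rem.deliver(q, "host")
returned = rem.reclaim("host")
```

Because each descriptor carries its color, the reclaim step needs no knowledge of where a buffer came from, which mirrors the role the paper gives to the color field.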
Tracing a Packet Through the Adapter

The major steps for data transfer incorporate the subfunctions previously discussed. This section traces the path of a packet P through the adapter, first on the receive stream and then on the transmit stream. We assume that adapter initialization is complete and that all data structures in the packet memory and parser data base are properly set. In this example, we further assume that packet P is small enough to fit into a single buffer. Large packets require multiple buffers.

Receive Stream

A packet destined for the host passes through the three major pipelined stages in the adapter. A brief description of the intrastage operation and details of the interstage functioning follow. The four parts of Figure 8 illustrate the receive process.

FDDI Corner and Parser Stage. Figure 8(a) shows packet P on the FDDI ring; the packet is actually a stream of photons. This stage converts the stream of photons into a packet. At this point, a free buffer is available for packet P in both the RMC receive ring and the host receive ring. The FCP stage owns the free buffer in the RMC receive ring.

The stage determines if packet P is addressed to this node, forwards the packet on the FDDI ring, and copies the packet for this adapter if it is addressed to this node. This stage also generates a CRC for the packet. The FCP stage then deposits the copied packet into the free buffer in the RMC receive ring entry shown in Figure 8(b).

After depositing the complete packet, this stage writes the buffer descriptor and toggles the ownership bit. The ring entry mover now owns packet P. The FCP stage is free to receive the next packet, which is stored in the next buffer in the RMC receive ring.

Ring Entry Mover Stage. The REM extracts the packet buffer descriptor and determines the number of pages in packet P. This stage also has an account of the number of pages outstanding on the HPD transmit ring. The REM delivers packet P to the HPD transmit ring provided the host resource allocation is not exceeded. The REM delivers the packet by copying page pointers from the RMC receive ring to the HPD transmit ring, as shown in Figure 8(c). Note that the HPD transmit ring is large enough to write all pointers from the RMC receive ring and the AM receive ring. The REM then transfers ownership of the HPD transmit ring entry to the HPD stage and the RMC receive ring entries to the FCP stage.

HPD Stage. The HPD receive pipeline operates on a packet it owns in the HPD transmit ring. As shown in Figure 8(d), after fetching the address of the free host buffer, this pipeline moves packet P from the PBM to the host memory and toggles the ownership bit of the host entry. Simultaneously, the HPD returns ownership of the free buffers in the HPD transmit ring to the ring entry mover stage. The REM returns these buffers to the RMC receive ring as free buffers.

Transmit Stream

To transmit data from the host transmit ring to the FDDI ring, the packet must pass through the same three stages as for the receive stream, but in the reverse direction.

HPD Stage. For the receive stream, the HPD receive pipeline prefetches the free buffer from the host receive ring. In contrast, the HPD transmit pipeline must wait for the host to fill the transmit buffer and transfer ownership to the host transmit ring. The HPD stage then moves the data from the host memory to the PBM if the hardwired signal between the REM and the HPD indicates that a sufficient number of pages is available. Finally, the HPD transfers ownership of the host transmit ring entry to the host and the HPD receive ring entry to the REM.

Ring Entry Mover Stage. The REM moves the packet from the HPD receive ring to the RMC transmit ring. Again, the REM copies pointers from ring to ring and toggles the ownership bit on the RMC transmit ring.

FDDI Corner and Parser Stage. Although the packet is available in PBM for transmission, the FCP stage must receive a token before transmitting over the FDDI ring. Once the transmission is complete, the buffer on the RMC transmit ring is now free. The FCP stage returns ownership of the buffer to the REM, which then returns the free buffer back to the HPD receive ring or the AM receive ring, depending upon the origin. Again, the free buffers are returned by copying buffer pointers.

The receive and transmit streams for the adapter manager are similar to those for the host; therefore, we do not describe these processes.

The Hardware and Firmware Implementation

The hardware implementation of DEMFA consisted of four large gate arrays, custom very large-scale integration (VLSI) chips, dynamic and static random access memories (RAMs), and jelly bean logic. Figure 9 is a photograph of the DEMFA board.

Figure 9  Photo of DEMFA Board

The four gate arrays specified and designed by the group are the parser, the adapter manager interface (AMI), the host protocol decoder, and the packet memory controller (PMC), which incorporated the function of the packet memory interface and the ring memory controller. We now describe aspects of the gate array development. Note that we used the Compacted Array technology developed using LSI logic for our implementation. The gate arrays have 224-pin surface mount packaging.

Table 2 shows various gate arrays, the total gate count for each gate array, and the percentage of control gates and data path gates. Control gates are defined as gates required for implementing state machines used for control. Data path gates are gates required for registers and multiplexors, for example. Note that the complexity of gate arrays is proportional to the percentage of control gates. The gate arrays in Table 2 were fairly complex because they consisted of approximately 50 percent control gates.

Table 2  Gate Counts for DEMFA Gate Arrays
___________________________________________________________________
                               Data Gates        Control Gates
                               (Percent of       (Percent of
Gate Array     Total Gates     Total)            Total)
___________________________________________________________________
Parser         20296           39                61
PMC            61537           40                60
HPD            81265           34                66
AMI            15002           49                51
___________________________________________________________________

Module Implementation

We used the 11-by-9-inch XMI module for implementing the adapter. Early in the project we defined the pin functions for various gate arrays. Once these were defined we could design our module. SPICE modeling helped in arriving at a correct module design with the first fabrication. The design was thorough and completed early in the project.

Firmware Implementation

The DEMFA firmware has three major functions: self-test, FDDI management (using Common Node Software), and adapter functional firmware. The DEMFA team implemented the adapter functional firmware while other groups designed the two remaining components. The DEMFA functional firmware can initialize the adapter and then interact with the host to start and stop data link layer users, as well as perform other functions. The firmware is implemented in the C language for the Motorola 68020 system. The total image size is approximately 160 kilobytes.

Performance

The graph presented in Figure 10 shows the adapter performance for the receive and transmit streams at the adapter hardware level for this implementation. The data represents throughput measured in megabits per second as a function of packet size measured in bytes. Figure 10 illustrates that the receive and transmit streams meet the 100-Mb/s throughput when the packet size is approximately 69 bytes. The bottlenecks in this implementation of the DEMFA architecture are (1) the PMI and (2) the combination of the XMI interface, bus, and memory. We implemented these interfaces in a conservative manner to reduce our risks and to produce the product in a timely fashion. For more detailed performance data, see the paper entitled "Performance Analysis of a High-speed FDDI Adapter" in this issue of the Digital Technical Journal.[11]

Conclusion

The goal of the DEMFA project was to specify an architecture for an adapter that would be at least 30 times faster than any previously built adapter. The architecture also had to be easy to implement. This paper describes the architecture and an implementation of DEMFA. Performance measurements of the adapter show that this first implementation successfully meets close to the maximum FDDI throughput capacity; thus, the DEMFA performance can be considered ultimate. Already, a number of adapters have been designed based on ideas borrowed from the DEMFA architecture and implementation. In a few years, architectures similar to this one may become the norm for data link and even transport layer adapters, rather than the exception.

Acknowledgements

I wish to acknowledge and thank my manager, Howard Hayakawa, who out of nowhere presented me with the challenge of defining an architecture and an implementation for an FDDI adapter that would have a performance 30 times that of any existing adapter. I must have taken leave of my senses to take on such a challenge. But the end results were worth the effort.

I would also like to thank Gerard Koeckhoven, who agreed to be the engineering manager for this adapter project. He took on the consequent challenge and the risk and supported me all along the way. In addition, I want to recognize Mark Kempf for the many hours he spent helping us during the conceptualization period and for chiseling our design. The TAN architects were of great assistance in making sure that our adapter met the FDDI standard.

I also wish to acknowledge the following members of the DEMFA project team for their contributions: Santosh Hasani and Ken Wong, who designed the parser gate array; Dave Valley and Dominic Gasbarro, who designed the AMI gate array; Andy Russo and John Bridge, who designed the HPD gate array; Ron Edgar, along with the other PMC designers, Walter Kelt, Joan Klebes, Lea Walton, and Ken Wong; Ed Wu and Bob Hommel, who designed and implemented the module; the team that implemented the functional firmware, Ed Sullivan, David Dagg, Da-Hai Ding, and Martin Griesmer; and Ram Kalkunte, for designing and building a simulation model to accurately predict the performance well before the adapter was ready. Also, I would like to thank VMS and ULTRIX group members Dave Gagne, Bill Salkewicz, Dick Stockdale, and Fred Templin.

References

1. B. Allison, "An Overview of the VAX 6200 Family of Systems," Digital Technical Journal, no. 7 (August 1988): 10-18.

2. D. Fite, Jr., T. Fossum, and D. Manley, "Design Strategy for the VAX 9000 System," Digital Technical Journal, vol. 2, no. 4 (Fall 1990): 13-24.

3. H. Yang, B. Spinney, and S. Towning, "FDDI Data Link Development," Digital Technical Journal, vol. 3, no. 2 (Spring 1991): 31-41.

4. A. Tanenbaum, Computer Networks (Englewood Cliffs, NJ: Prentice Hall, Inc., 1981).

5. B. Allison, "The Architectural Definition Process of the VAX 6200 Family," Digital Technical Journal, no. 7 (August 1988): 19-27.

6. Token Ring Access Method and Physical Layer Specifications, ANSI/IEEE Standard 802.5-1989 (New York: The Institute of Electrical and Electronics Engineers, Inc., 1989).
7. R. Stockdale and J. Weiss, "Design of the DEC LANcontroller 400 Adapter," Digital Technical Journal, vol. 3, no. 3 (Summer 1991, this issue): 36-47.

8. FDDI Station Management (SMT), Preliminary Draft, Proposed American National Standard, ANSI X3T9/90-X3T9.5/84-49, REV. 6.2 (May 1990).

9. Token Ring Media Access Control (MAC), (International Standards Organization, reference no. ISO 9314-2, 1989).

10. P. Ciarfella, D. Benson, and D. Sawyer, "An Overview of the Common Node Software," Digital Technical Journal, vol. 3, no. 2 (Spring 1991): 42-52.

11. R. Kalkunte, "Performance Analysis of a High-speed FDDI Adapter," Digital Technical Journal, vol. 3, no. 3 (Summer 1991, this issue): 64-77.

=============================================================================
Copyright 1991 Digital Equipment Corporation. Forwarding and copying of this article is permitted for personal and educational purposes without fee provided that Digital Equipment Corporation's copyright is retained with the article and that the content is not modified. This article is not to be distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. All rights reserved.
=============================================================================