DECdta - Digital's Distributed Transaction Processing Architecture By Philip A. Bernstein, William T. Emberton, Vijay Treba Abstract o Atomicity. Either all Digital's Distributed of the transaction's Transaction Processing operations execute, or Architecture (DECdta) the transaction has no describes the modules and effect at all. interfaces that are common o Serializability. The set to Digital's transaction of all operations that processing (DECtp) execute on behalf of the products. The architecture transaction appears to allows easy distribution execute serially with of DECtp products. In respect to the set of particular, it supports operations executed by client/server style every other transaction. applications. Distributed o Durability. The effects transaction management is of the transaction's the main function that ties operations are resistant DECdta modules together. to failures. It ensures that application A transaction terminates programs, database systems, by executing the commit and other resource managers or abort operation. interoperate reliably in a Commit tells the system distributed system. to install the effect Introduction of the transaction's Transaction processing operations permanently. (TP) is the activity of Abort tells the system to executing requests to undo the effects of the access shared resources, transaction's operations. typically databases. A For enhanced reliability computer system that is and availability, a configured to execute TP TP application uses applications is called a TP transactions to execute system. requests. That is, the A transaction is an application receives a execution of a set of request message (from operations on shared a display, computer, or resources that has the other device), executes following properties: one or more transactions to process the request, and possibly sends a reply Digital Technical Journal Vol. 3 No. 1 Winter 1991 1 DECdta-Digital's Distributed Transaction Processing Architecture to the originator of the request or to some other party specified by the originator. TP applications are essential to the operation of many industries, such as finance, retail, health care, transportation, government, communications, and manufacturing. Given the broad range of applications of TP, Digital offers a wide variety of products with which to build TP systems. DECtp is an umbrella term that refers to Digital's TP products. The goal of DECtp is to offer an integrated set of hardware and software products that supports the development, execution, and management of TP applications for enterprises of all sizes. DECtp systems include software components that are specialized for TP, notably TP monitors such as the ACMS and DECintact TP monitors, and transaction managers such as the DECdtm transaction manager. [1] [2] DECtp systems also require the integration of general-purpose hardware products (processors, storage, communications, and terminals) and software products (operating systems, database systems, and communication gateways). These products are typically integrated as shown in Figure 1. 2 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture Applications on DECtp and explains how DECdta systems can be designed components are integrated using a client/server by distributed transaction paradigm. This paradigm management. is especially useful Current versions of DECtp for separating the work products implement most, of preparing a request but not all, modules and from that of running interfaces in the DECdta transactions. Request architecture. Gaps between preparation can be done the architecture and by a front-end system, products will be filled that is, one that is over time. DECtp products close to the user, in that currently implement which processor cycles are DECdta components are inexpensive and interactive referenced throughout the feedback is easy to obtain. paper. Transaction execution can be done by a larger TP Application Structure back-end system, that is, one that manages large By analyzing TP databases and may be far applications, we can see from the user. Back-end where the need arises for systems may themselves be separate DECdta components. distributed. Each back-end A typical TP application is system manages a portion structured as follows: of the enterprise database Step 1: The client and executes applications, application interacts usually ones that make with a user (a person heavy use of the database or machine) to gather on that back end. DECtp input, e.g., using a products are modularized to forms manager. allow easy distribution Step 2: The client maps across front ends and the user's input into back ends, which enables a request, that is, a them to support client message that asks the /server style applications. system to perform some DECtp systems thereby work. The client sends simplify programming the request to a server and reconfiguration in a application to process distributed system. the request. Digital's Distributed A request may be direct Transaction Processing or queued. If direct, Architecture (DECdta) the client expects a defines the modularization server to process the and distribution structure request right away. that is common to DECtp If queued, the client products. Distributed deposits the request transaction management is in a queue from which a the main function that ties server can dequeue the this structure together. request later. This paper describes the DECdta structure Digital Technical Journal Vol. 3 No. 1 Winter 1991 3 DECdta-Digital's Distributed Transaction Processing Architecture Step 3: A server One may want to off- processes the request load presentation by executing one or services and related more transactions. Each functions to front transaction may ends, while allowing a. Access multiple programs on back ends to resources control which forms are displayed to users. This b. Call programs, some capability is useful of which may be in Steps 1, 3d, and 4 remote above to gather input c. Generate requests and display output. to execute other To ensure that the transactions presentation service and application can d. Interact with a user be distributed, the e. Return a reply when presentation service the transaction should correspond finishes to a separate DECdta Step 4: If the component. transaction produces o The client application a reply, then the client that sends a request interacts with the user and the server to display that reply, application that e.g., using a forms processes the request manager. may be distributed. Each of the above steps The applications may involves the interaction communicate through a of two or more programs. In network or a queue. many cases, it is desirable In Step 2, front-end that these programs be applications may want to distributed. To distribute send requests directly them conveniently, it is to back-end applications important that the programs or to place requests be in separate components. in queues that are For example, consider the managed on back ends. following: Similarly, in Step 3c, a transaction, T, may o The presentation enqueue a request to service that operates run another transaction, the display and the where the queue resides application that on a different system controls which form than T. To maximize to display may be the flexibility of distributed. distributing request management, request management should correspond to a separate DECdta component. 4 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture o Two transaction managers Interoperation of that want to run a transaction managers and commit protocol may resource managers, such be distributed. as database systems, also For a transaction to affects the modularization be distributed across of DECdta components. A different systems, as in transaction may involve Step 3b, the transaction different types of management services must be resources, as in Step 3a. distributed. To ensure that For example, it may update each transaction is atomic, data that is managed by the transaction managers on different database systems. these systems must control To control transaction transaction commitment commitment, the transaction using a common commit manager must interact protocol. To complicate with different resource matters, there is more than managers, possibly supplied one widely used protocol by different vendors. This for transaction commitment. requires that resource To the extent possible, managers be separate a system should allow components of DECdta. interoperation of these protocols. The DECdta Architecture To ensure that Having seen where the transaction managers need for DECdta components can be distributed, arises, we are now ready the transaction manager to describe the DECdta should be a component of architecture as a whole, DECdta. To ensure that including the functions they can interoperate, of and interfaces to each their transaction protocol component. should also be in DECdta. Most DECdta interfaces are To ensure that different public. Some of the public commit protocols can be interfaces are controlled supported, the part of by official standards transaction management that bodies and industry defines the protocol for consortia; i.e., they are interaction with remote "open" interfaces. Others transaction managers should are controlled solely by be separated from the Digital. DECdta interfaces part that coordinates and protocols will be transaction execution published and aligned with across local resources. industry standards, as In the DECdta architecture, appropriate. the former is called a communication manager, DECdta components are and the latter is called a abstract entities. They do transaction manager. not necessarily map one-to- one to hardware components, software components (e.g., programs or products), or execution environments Digital Technical Journal Vol. 3 No. 1 Winter 1991 5 DECdta-Digital's Distributed Transaction Processing Architecture (e.g., a single-threaded DECdta components are process, a multithreaded layered on services process, or an operating that are provided by the system service). Rather, underlying operating system a DECdta component may be and distributed system implemented as multiple platform, and are not software components, specific to TP, as shown for example, as several in Figure 2. processes. Alternatively, several DECdta components may be implemented as a single software component. For example, an operating system or TP monitor typically offers the facilities of more than one DECdta component. The following are the components of DECdta: o An application program is any program that uses services of DECdta components. o A resource manager manages resources that support transaction semantics. o A transaction manager coordinates transaction termination (i.e., commit and abort). o A communication manager supports a transaction communication protocol between TP systems. o A presentation manager supports device- independent interactions with a presentation device. o A request manager facilitates the submission of requests to execute transactions. 6 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture Application Program messages with the user, usually through the request We use the term application initiator. program to mean a program In principle, a request that uses the services initiator could also provided by other DECdta execute transactions components. An application (not shown in Figure 2). program could be a That is, the distinction customer-written program, between request initiators a layered product, or a and transaction servers DECdta component. is for clarity only, In the DECdta architecture, and does not restrict an we distinguish two special application from performing types of application request initiation program: request initiators functions in a transaction. and transaction servers. Architecturally, this A request initiator is amounts to saying that a DECdta component that request initiation prepares and submits a functions can execute in request for the execution a transaction server. of a transaction. To create Resource Manager a request, the request initiator usually interacts A resource manager with a presentation manager performs operations on that provides an interface shared resources. We are to a device, such as a especially interested terminal, a workstation, in recoverable resource a digital private branch managers, those that obey exchange, or an automated transaction semantics. In teller machine. particular, a recoverable A transaction server can resource manager undoes demarcate a transaction, a transaction's updates interact with one or more to the resources if the resource managers to access transaction aborts. Other recoverable resources on recoverable resource behalf of the transaction, manager activities in invoke other transaction support of transactions servers, and respond are described in the to calls from request next section. In the initiators. rest of this paper, we use "resource manager" to For a simple request, a mean "recoverable resource transaction server receives manager." the request, processes In a TP system, the most it, and optionally returns common kind of resource a reply to the request manager is a database initiator. A conversational system. Some presentation request is like a simple managers and communication request, except that while managers may also be processing the request, resource managers. A the transaction server resource manager may be exchanges one or more Digital Technical Journal Vol. 3 No. 1 Winter 1991 7 DECdta-Digital's Distributed Transaction Processing Architecture written by a customer, a A queue resource manager third party, or Digital. interface supports such Each resource manager type operations as open-queue, offers a resource-manager- close-queue, enqueue, specific interface that dequeue, and read-element. is used by application The ACMS and DECintact programs to access TP monitors both have and modify recoverable queue resource managers resources managed by as components. the resource manager. Transaction Manager A description of these A transaction manager resource manager interfaces supports the transaction is outside the scope abstraction. It is of DECdta. However, responsible for ensuring many of these resource the atomicity of each manager interfaces have transaction by telling architectures defined each resource manager by industry standards, in a transaction when to such as SQL (e.g., the commit. It uses a two- VAX Rdb/VMS product), phase commit protocol to CODASYL data manipulation ensure that either all language (e.g., the VAX resource managers accessed DBMS product), and COBOL by a transaction commit file operations (e.g., RMS the transaction or they in the VMS system). all abort the transaction. One type of resource [4] To support transaction manager that plays a atomicity, a transaction special role in TP systems manager provides the is a queue resource following functions: manager. It manages o Transaction demarcation recoverable queues, which operations allow are often used to store application programs requests. [3] It allows or resource managers application programs to to start and commit or place elements into queues abort a transaction. and retrieve them, so that (Resource managers application programs can sometimes start a communicate even though transaction to execute they execute independently a resource operation and asynchronously. For if the caller is not example, an application executing a transaction. program that sends The SQL standard elements can communicate requires this.) with one that receives elements even if the o Transaction execution two application programs operations allow are not operational resource managers and simultaneously. This communication managers communication arrangement to declare themselves improves availability and part of an existing facilitates batch input of transaction. elements. 8 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture o Two-phase commit states while recovering operations allow from a failure. resource managers A detailed description and communication of the DECdta transaction managers to change a manager component appears transaction's state (to in the Transaction Manager "prepared," "committed," Architecture section. or "aborted"). The serializability of Communication Manager transactions is primarily A communication manager the responsibility of the provides services for resource managers. Usually, communication between a resource manager ensures named objects in a serializability by setting TP system, such as locks on resources accessed application programs and by each transaction, and transaction managers. Some by releasing the locks communication managers after the transaction participate in coordinating manager tells the resource the termination of a manager to commit. (The transaction by propagating latter activity makes the transaction manager's serializability partly two-phase commit operations the responsibility of as messages to remote the transaction manager.) communication managers. If transactions become Other communication deadlocked, a resource managers propagate manager may detect application data and the deadlock and abort transaction context, one of the deadlocked such as a transaction transactions. identifier, from one node The durability of to another. Some do both. transactions is a A TP system can support responsibility of multiple communication transaction managers managers. These and resource managers. communication managers can The transaction manager interact with other nodes is responsible for the using different commit durability of the commit or protocols or message- abort decision. A resource passing protocols, and may manager is responsible be part of different name for the durability of spaces, security domains, operations of committed system management domains, transactions. Usually, etc. Examples are an IBM it ensures durability SNA LU6.2 communication by storing a description manager or an ISO-TP of each transaction's communication manager. resource operations and state changes in a stable (e.g., disk-resident) log. It can later use the log to reconstruct transactions' Digital Technical Journal Vol. 3 No. 1 Winter 1991 9 DECdta-Digital's Distributed Transaction Processing Architecture By supporting A forms manager is one type multiple communication of presentation manager. managers, the DECdta Just as a database system architecture enhances supports operations to the interoperability of define, open, close, and TP systems. Different TP access databases, a forms systems can interoperate manager supports operations by executing a transaction to define, enable, disable, using different commit and access forms. A form protocols. includes the definition of A communication manager the fields (with different offers an interface for attributes) that make application programs to up the form. It also communicate with other includes services to map application programs. the fields into device- Different communication independent application managers may offer records, to perform data different communication validation, and to perform paradigms, such as remote data conversion to map procedure call or peer-to- fields onto device-specific peer message passing. frames. A communication manager One presentation manager also has an interface is Digital's DECforms forms to its local transaction management product. The manager. It uses this DECforms product is the interface to tell the first implementation of the transaction manager when ANSI/ISO Forms Interface a transaction has spread Management Systems standard to a new node and to (CODASYL FIMS). [5] obtain information about Request Manager transaction commitment, A request manager provides which it exchanges with services to authenticate communication managers on the source of requests (a remote nodes. user and/or a presentation Presentation Manager device), to submit A presentation manager requests, and to receive provides an application replies from the execution program with a record- of requests. It supports oriented interface to such operations as send- a presentation device. request and receive-reply. Its services are used Send-request must provide by application programs, the identity of the source usually request initiators. device, the identity of By using presentation the user who entered the manager services, instead request, the identity of of directly accessing the application program to a presentation device, be invoked, and must input application programs become data to the program. device independent. 10 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture A request manager can by Digital's DECdtm either pass the request distributed transaction directly to an application manager. [2] program, or it can store requests in a queue. In the latter case, another request manager can subsequently schedule the request by dequeuing the request and invoking an application program. The ACMS System Interface is an example of an existing request manager interface for direct requests. The ACMS Queued Transaction Initiator is an example of a request manager that schedules queued requests. [1] Transaction Manager Architecture DECdta components are tied together by the transaction abstraction. Transactions allow application programs, resource managers, request managers (indirectly through queue resource managers), and communication managers to interoperate reliably. Since transactions play an especially important role in the DECdta architecture, we describe the transaction management functions in more detail. The DECdta architecture includes interfaces between transaction managers and application programs, resource managers, and communication managers, as shown in Figure 3. It also includes a transaction manager protocol, whose messages are propagated by communication managers. This protocol is used Digital Technical Journal Vol. 3 No. 1 Winter 1991 11 DECdta-Digital's Distributed Transaction Processing Architecture From a transaction manager to prepare, manager's viewpoint, a commit, or abort a transaction consists of transaction transaction demarcation - For a resource operations, transaction manager or execution operations, two- communication manager phase commit operations, to tell a transaction and recovery operations. manager whether o The transaction it has prepared, demarcation operations committed, or aborted are issued by an a transaction application program - For a communication to a transaction manager manager to ask and include operations a transaction to start and either end manager to prepare, or abort a transaction. commit, or abort a o Transaction execution transaction operations are issued - For a transaction by resource managers and manager to tell communication managers a communication to a transaction manager whether manager. They include it has prepared, operations committed, or aborted - For a resource a transaction manager or o Recovery operations communication manager are issued by a to join an existing resource manager to transaction its transaction manager - For a communication to determine the state manager to tell a of a transaction (i.e., transaction manager committed or aborted). to start a new branch In response to a start of a transaction that operation invoked by already exists at an application program, another node the transaction manager o Two-phase commit dispenses a unique operations are issued transaction identifier by a transaction for the transaction. manager to resource The transaction manager managers, communication that processes the managers, and through start operation is communication managers that transaction's home to other transaction transaction manager. managers, and vice- When an application program versa. They include invokes an operation operations supported by a resource - For a transaction manager, the resource manager to ask a manager must find out the resource manager transaction identifier of or communication the application program's 12 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture transaction. This can case, the result is as happen in different follows: the communication ways. For example, the manager at Node 1 becomes application program the subordinate of the may tag the operation transaction manager at Node with the transaction 1 for T and the superior of identifier, or the resource the communication manager manager may look up the at Node 2 for T; and the transaction identifier in communication manager at the application program's Node 2 becomes the superior context. When a resource of the transaction manager manager receives its first at Node 2 for T. This operation on behalf of arrangement allows the a transaction, T, it commit protocol between must join T, meaning transaction managers to that it must tell a be propagated properly by transaction manager that communication managers. it is a subordinate for T. After the transaction is Alternatively, the DECdta done with its application architecture supports a work, the application model in which a resource program that started manager may ask to be transaction T may invoke joined automatically to all an "end" operation at the transactions managed by its home transaction manager to transaction manager, rather commit T. This causes the than asking to join each home transaction manager transaction separately. to ask its subordinate A transaction, T, spreads resource managers and from one node, Node 1, to communication managers another node, Node 2, by to try to commit T. The sending a message (through transaction manager does a communication manager) this by using a two- from an application program phase commit protocol. that is executing T at The protocol ensures that Node 1 to an application either all subordinate program at Node 2. When T resource managers commit sends a message from Node the transaction or they all 1 to Node 2 for the first abort the transaction. time, the communication In phase 1, the home managers at Node 1 and transaction manager asks Node 2 must perform its subordinates for T to branch registration. prepare T. A subordinate This function may be prepares T by doing what performed automatically is necessary to guarantee by the communication that it can either commit managers. Or, it may T or abort T if asked to be done manually by the do so by its superior; this application program, which guarantee is valid even tells the communication if it fails immediately managers at Node 1 and Node after becoming prepared. To 2 that the transaction has prepare T, spread to Node 2. In either Digital Technical Journal Vol. 3 No. 1 Winter 1991 13 DECdta-Digital's Distributed Transaction Processing Architecture o Each subordinate for T for T. When the home recursively propagates transaction manager the prepare request to receives acknowledgments its subordinates for T from all of its o Each resource manager subordinates for T, the subordinate writes all transaction commitment is of T's updates to stable complete. storage To recover from a failure, o Each resource manager all resource managers and transaction manager that participated in a subordinate writes a transaction must examine prepare-record to stable their logs on stable storage storage to determine what A subordinate for T replies to do. If the log contains with a "yes" vote if and a commit or abort record when it has completed its for T, then T completed. No stable writes and all of action is required. If the its subordinates for T have log contains no prepare, voted "yes"; otherwise, commit, or abort record it votes "no." If any for T, then T was active. T subordinate for T does not must be aborted. If the log acknowledge the request contains a prepare record to prepare within the for T, but no commit or timeout period, then the abort record for T, T was home transaction manager between phases 1 and 2. The aborts T; the effect is the resource manager must ask same as issuing an abort its superior transaction operation. manager whether to commit or abort the transaction. In phase 2, when the home An inherent problem in all transaction manager has two-phase commit protocols received "yes" votes from is that a resource manager all of its subordinates for is blocked between phases T, it decides to commit T. 1 and 2, that is, after It writes a commit record voting "yes" and before for T to stable storage receiving the commit and tells its subordinates or abort decision. It for T to commit T. Each cannot commit or abort subordinate for T writes the transaction until the a commit record for T transaction manager tells to stable storage and it which to do. If its recursively propagates transaction manager fails, the commit request to the resource manager may be its subordinates for T. blocked indefinitely, until A subordinate for T replies either the transaction with an acknowledgment if manager recovers or an and when it has committed external agent, such as a the transaction (in the system manager, steps in to case of a resource manager tell the resource manager subordinate) and has whether to commit or abort. received acknowledgments from all subordinates 14 Digital Technical Journal Vol. 3 No. 1 Winter 1991 DECdta-Digital's Distributed Transaction Processing Architecture A transaction T may model will be made public spontaneously abort due via product offerings or to system errors at any architecture publications. time during its execution. Or, an application program Acknowledgments (prior to completing its work) or a resource This architecture grew manager (prior to voting from discussions with "yes") may tell its many colleagues. We thank transaction manager to them all for their help, abort T. In either case, especially Dieter Gawlick, the transaction manager Bill Laing, Dave Lomet, then tells all of its Bruce Mann, Barry Rubinson, subordinates for T to undo Diogenes Torres, and the the effects of T's resource TP architecture group, manager operations. including Edward Braginsky, Subordinate resource Tony DellaFera, George managers abort T, and Gajnak, Per Gyllstrom, and subordinate communication Yoav Raz. managers recursively propagate the abort request References to their subordinates for T. 1. T. Speer and M. Storm, The two-phase commit "Digital's TP Monitors," protocol is optimized Digital Technical for those cases in which Journal, vol. 3, no. the number of messages 1 (Winter 1991, this exchanged can be reduced issue): 18-32. below that of the general 2. J. Johnson, W. case (e.g., if there is Laing, and R. Landau, only one subordinate "Transaction Management resource manager, if Support in the VMS a resource manager did Operating System not modify resources, Kernel," Digital or if the presumed-abort Technical Journal, vol. protocol was used to save 3, no. 1 (Winter 1991, acknowledgments). [6] this issue): 33-44. Summary 3. P. Bernstein, V. Hadzilacos, and N. We have presented an Goodman, Concurrency overview of the DECdta Control and Recovery architecture. As part in Database Systems of this overview, we (Reading, MA: Addison- introduced the components Wesley, 1987. and explained the function of each interface. We also described the DECdta transaction management architecture in some detail. Over time, many interfaces of the DECdta Digital Technical Journal Vol. 3 No. 1 Winter 1991 15 DECdta-Digital's Distributed Transaction Processing Architecture 5. FIMS Journal of Development (Norfolk, VA: CODASYL FIMS Committee, July 1990). 4. P. Bernstein, M. 6. C. Mohan, B. Lindsay, Hsu, and B. Mann, and R. Obermarck, "Implementing "Transaction Recoverable Requests Management in the R* Using Queues," Distributed Database Proceedings 1990 ACM Management System," ACM SIGMOD Conference on Transactions on Database Management of Data (May Systems, vol. 11, no. 4 1990). (December 1986). 16 Digital Technical Journal Vol. 3 No. 1 Winter 1991 ============================================================================= Copyright 1991 Digital Equipment Corporation. Forwarding and copying of this article is permitted for personal and educational purposes without fee provided that Digital Equipment Corporation's copyright is retained with the article and that the content is not modified. This article is not to be distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. All rights reserved. =============================================================================