Thursday, October 21, 2010

XA Transactions

Reading the MySQL documentation, I found a good description of how XA distributed transactions and two-phase commit work. I'd like to share it because it is short, clear, and applies well beyond MySQL. It also shows why distributed transactions, being much more complex than local ones, should be managed very carefully to avoid severe performance penalties.
Applications that use global transactions involve one or more Resource Managers and a Transaction Manager:

A Resource Manager (RM) provides access to transactional resources. A database server is one kind of resource manager. It must be possible to either commit or roll back transactions managed by the RM.

A Transaction Manager (TM) coordinates the transactions that are part of a global transaction. It communicates with the RMs that handle each of these transactions. The individual transactions within a global transaction are “branches” of the global transaction. Global transactions and their branches are identified by a naming scheme described later.

The MySQL implementation of XA enables a MySQL server to act as a Resource Manager that handles XA transactions within a global transaction. A client program that connects to the MySQL server acts as the Transaction Manager.
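To make the client's role more concrete, here is a minimal sketch of the statements a client acting as TM might send to one MySQL branch. The `XA START`, `XA END`, `XA PREPARE`, `XA COMMIT` and `XA ROLLBACK` statements are MySQL's; the helper `xa_statements`, its parameters, and the sample xid and table are hypothetical names used only for illustration. It only builds the statement list and does not connect to a server. In a real global transaction the last statement is sent only after every branch has answered the prepare.

```python
from typing import List


def xa_statements(xid: str, work: List[str], commit: bool = True) -> List[str]:
    """Build, in order, the SQL a TM client would send to one MySQL branch."""
    if "'" in xid:
        # Keep the sketch simple: no escaping, so reject quotes outright.
        raise ValueError("xid must not contain quotes in this sketch")
    ident = f"'{xid}'"
    return [
        f"XA START {ident}",      # open the branch
        *work,                    # the branch's own SQL statements
        f"XA END {ident}",        # the branch's work is finished
        f"XA PREPARE {ident}",    # phase 1: get ready to commit
        # Phase 2: the outcome decided by the TM for the whole global transaction.
        f"XA COMMIT {ident}" if commit else f"XA ROLLBACK {ident}",
    ]


if __name__ == "__main__":
    for stmt in xa_statements("order-42", ["INSERT INTO orders VALUES (42)"]):
        print(stmt)
```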

To carry out a global transaction, it is necessary to know which components are involved, and bring each component to a point where it can be committed or rolled back. Depending on what each component reports about its ability to succeed, they must all commit or roll back as an atomic group. That is, either all components must commit, or all components must roll back. To manage a global transaction, it is necessary to take into account that any component or the connecting network might fail.

The process for executing a global transaction uses two-phase commit (2PC). This takes place after the actions performed by the branches of the global transaction have been executed.

In the first phase, all branches are prepared. That is, they are told by the TM to get ready to commit. Typically, this means each RM that manages a branch records the actions for the branch in stable storage. The branches indicate whether they are able to do this, and these results are used for the second phase.

In the second phase, the TM tells the RMs whether to commit or roll back. If all branches indicated when they were prepared that they will be able to commit, all branches are told to commit. If any branch indicated when it was prepared that it will not be able to commit, all branches are told to roll back.
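The two phases described above can be sketched with a small in-memory simulation. This is not how MySQL or any real TM is implemented: `ResourceManager`, `can_prepare`, and `two_phase_commit` are hypothetical names, the `log` list merely stands in for stable storage, and failures of the TM itself or of the network are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceManager:
    """Hypothetical in-memory stand-in for an RM such as a database server."""
    name: str
    can_prepare: bool = True   # simulates whether the branch is able to prepare
    state: str = "active"
    log: List[str] = field(default_factory=list)  # stands in for stable storage

    def prepare(self) -> bool:
        # Phase 1: record the branch's actions and report whether it can commit.
        if self.can_prepare:
            self.log.append("prepared")
            self.state = "prepared"
            return True
        self.state = "failed"
        return False

    def commit(self) -> None:
        self.log.append("committed")
        self.state = "committed"

    def rollback(self) -> None:
        self.log.append("rolled back")
        self.state = "rolled back"


def two_phase_commit(branches: List[ResourceManager]) -> str:
    """Act as the TM: prepare every branch, then commit or roll back all of them."""
    # Phase 1: every branch is asked to prepare and its answer is collected.
    votes = [rm.prepare() for rm in branches]
    # Phase 2: commit only if all branches said they can commit.
    if all(votes):
        for rm in branches:
            rm.commit()
        return "committed"
    for rm in branches:
        rm.rollback()
    return "rolled back"


if __name__ == "__main__":
    db = ResourceManager("db")
    queue = ResourceManager("queue", can_prepare=False)
    print(two_phase_commit([db, queue]))  # rolled back
```

A single branch that cannot prepare is enough to roll back every other branch, including those that had already written their prepare record.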

In some cases, a global transaction might use one-phase commit (1PC). For example, when a Transaction Manager finds that a global transaction consists of only one transactional resource (that is, a single branch), that resource can be told to prepare and commit at the same time.
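The one-phase shortcut can be sketched in the same spirit. `Branch`, `commit_one_phase`, and `finish` are hypothetical names for an in-memory illustration; in MySQL the corresponding statement is `XA COMMIT xid ONE PHASE`. The point is only that with a single branch there are no other answers to wait for, so the separate prepare round can be skipped.

```python
from typing import List


class Branch:
    """Hypothetical in-memory branch; `ok` simulates whether its work can commit."""

    def __init__(self, ok: bool = True) -> None:
        self.ok = ok
        self.calls: List[str] = []  # records what the TM asked this branch to do

    def prepare(self) -> bool:
        self.calls.append("prepare")
        return self.ok

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")

    def commit_one_phase(self) -> bool:
        # Prepare and commit in one step: the branch settles the outcome itself.
        self.calls.append("commit_one_phase" if self.ok else "rollback")
        return self.ok


def finish(branches: List[Branch]) -> bool:
    """Return True if the global transaction committed."""
    if len(branches) == 1:
        # Single branch: nothing to coordinate, so use one-phase commit.
        return branches[0].commit_one_phase()
    # Several branches: fall back to the full two-phase protocol.
    votes = [b.prepare() for b in branches]
    if all(votes):
        for b in branches:
            b.commit()
        return True
    for b in branches:
        b.rollback()
    return False


if __name__ == "__main__":
    only = Branch()
    print(finish([only]), only.calls)
```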

2 comments:

  1. Hi Maurizio,
    do you think it's feasible to add support for two-phase commit in Alfresco? On several occasions I needed to bind two transactions together (one in an external system and one in Alfresco), so that if one fails the other is rolled back. Unfortunately I couldn't do this the way I wanted because distributed transactions are not available in Alfresco (especially in web scripts).

    Cheers

    Fabio

  2. It is theoretically feasible, but Alfresco (like any ECM) has to manage two very different kinds of resources: databases and file systems. While most databases are transactional, file systems are usually not transaction-aware resources. Moreover, Web Scripts are meant to implement a single service, and by definition a service must be atomic (for a lot of good reasons, first of all speed and robustness). This is a more general problem of an orchestrator coordinating a set of services: the wider the transactions are, the less robust (and less interoperable) the whole system becomes.
    In my experience I have only seen working XA systems that involve databases and messaging (JMS), never ones that deal with file system operations. You would need a file system able to implement an application-level commit/rollback mechanism, much like a database, while file systems usually can (as far as I know) commit and roll back at the operating system level only. That's a drawback of implementing an ECM over a file system, but the advantages in terms of speed and scalability are also pretty clear.
