In the world of microservices, a transaction is now distributed to multiple services that are called in a sequence to complete the entire transaction.
With the advent of microservice architecture, we are losing the ACID nature of databases. Transactions may now span multiple microservices and, therefore, databases.
To address the problems of distributed transactions, the following list of approaches have been described:
- Two-Phase Commit / XA
- Eventual Consistency and Compensation / SAGA
- Try, Confirm, Cancel / TCC
In classic solutions, the developers should choose one of the three patterns to handle distributed transactions.
But in this article, we introduce a workflow pattern in github.com/dtm-labs/dtm. Under this pattern, a mixture of XA, SAGA and TCC can be applied to different branches in a single distributed transactions, and a mixture of HTTP, gRPC, and local operations can also be used, allowing users to customize most of the contents of a distributed transaction, providing great flexibility.
Workflow Example
In the Workflow mode of DTM, both HTTP and gRPC protocols can be used. The following is an example of the gRPC protocol, which is divided into the following steps:
- Initialize the SDK
- Register workflow
- Execute workflow
Complete example can be found here workflow-grpc
Initialize the SDK
First, you need to Initialize the SDK’s workflow before you can use it.
Register Workflow
Then you need to register workflow’s handler function. The following is a saga workflow to do a cross-bank transfer:
- This registration operation needs to be executed after the business service is started because when the process crashes, dtm will call back to the business server to continue the unfinished task
- The above code
NewBranchwill create a transaction branch, one that will include a forward action and a callback on global transaction commit/rollback OnRollback/OnCommitwill register a callback on global transaction rollback/commit for the current transaction branch. In the above code, onlyOnRollbackis specified, so it is in Saga mode- The
busi.BusiCliin the above code needs to add a workflow interceptor which will automatically record the results of the rpc request to dtm as follows
You can, of course, add workflow.Interceptor to all gRPC clients, this middleware will only handle requests under wf.Context and wf.NewBranchContext()
- When the workflow function returns nil/ErrFailure, the global transaction enters the Commit/Rollbasck phase, calling the operations registered in OnCommit/OnRollback inside the function in reverse order
Execute Workflow
Finally, the workflow is executed.
- When the result of Execute is
nil/ErrFailure, the global transaction has succeeded/been rolled back. - When the result of Execute is other values, the dtm server will subsequently call back this workflow task to retry
Principle of Workflow
How does workflow ensure data consistency in distributed transactions? When a business process has a crash or other problem, the dtm server will find that this workflow global transaction has timed out and not completed. Then the dtm server will use an exponential retreat algorithm and retry the workflow transaction. When the workflow retry request reaches the business service, the SDK will query the progress of the global transaction from the dtm server.
For the completed branch, it will take the previously saved result and return the branch result directly through an interceptor such as gRPC/HTTP. Eventually, the workflow will complete successfully.
Workflow functions need to be idempotent, i.e., the first call, or subsequent retries, should get the same result.
Saga in Workflow
The core idea of the Saga pattern, derived from this paper SAGAS, is that long transactions are split into short transactions coordinated by the Saga transaction coordinator. If each short transaction operation successfully completes, then the global transaction completes normally, and if a step fails, the compensating operations are invoked one at a time in reverse order.
In Workflow mode, you can call the function for the operation directly in the function and then write the compensation operation to OnRollback of the branch, and then the compensation operation will be called automatically, achieving the effect of Saga mode
Tcc under Workflow
The Tcc pattern is derived from this paper, Life beyond Distributed Transactions: an Apostate’s Opinion. He divides a large transaction into multiple smaller transactions, each of which has three operations.
- Try phase: attempts to execute, completes all business checks, sets aside enough business resources
- Confirm phase: if the Try operation succeeds on all branches, it goes to the Confirm phase, which executes the transaction without any business checks, using only the business resources set aside in the Try phase
- Cancel phase: If one of the Try operations from all branches fails, we go to the Cancel phase, which releases the business resources reserved in the Try phase.
For our scenario of an interbank transfer from A to B, if SAGA is used and the balance is adjusted in the forward operation and is adjusted reversely in the compensating operation, then the following scenario would occur.
- A deducts the money successfully
- A sees the balance decrease and tells B
- The transfer of the amount to B fails, and the whole transaction is rolled back
- B never receives the funds
This causes great distress to both As and Bs. This situation cannot be avoided in SAGA, but TCC can resolve it with the following design technique:
- Introduce a
trading_balancefield in addition to the balance field in the account - Try phase to check if the account is frozen, check if the account
balance-trading_balanceis sufficient, and then adjust thetrading_balance(i.e., the funds frozen for business purposes) - Confirm phase, adjust balance, adjust
trading_balance(i.e., unfrozen funds for the business) - Cancel phase, adjust
trading_balance(i.e., unfsrozen funds on the business)
In this case, once end user A sees their sbalance deducted, then B must be able to receive the funds
In Workflow mode, you can call the Try operation directly in the function, then register the Confirm operation to OnCommit in the branch and register the Cancel operation to OnRollback in the branch, achieving the effect of the Tcc mode
XA Under Workflow
XA is a specification for distributed transactions proposed by the X/Open organization. The XA specification essentially defines the interface between a (global) Transaction Manager (TM) and a (local) Resource Manager (RM). Local databases such as MySQL play tse RM role in the XA.
XA is divided into two phases.
- Phase 1 (prepare): All participants RM prepare to execute the transaction and lock the required resources. When the participants are ready, they report to TM that they are ready.
- Phase 2 (commit/rollback): When the transaction manager (TM) confirms that all participants (RMs) are ready, it sends a commit command to all participants.
Currently, all major databases support XA transactions, including MySQL, Oracle, sqlserver, and postgres
In Workflow mode, you can call NewBranch().DoXa in the workflow function to open your XA transaction branch.
Mixing Multiple Modes
In Workflow mode, Saga, Tcc, and XA, as described above, are all patterns of branching transactions. So you can use one pattern for some branches and another pattern for others. The flexibility offered by this mixture of patterns allows for sub-patterns to be chosen according to the characteristics of the branch transaction, so the following is recommended.
- XA: If the business has no row lock contention, and the global transaction will not last long, XA can be used. This pattern requires less additional development and
Commit/Rollbackis done automatically by the database. For example, this pattern is suitable for an order creation business where different orders lock different order rows and have no effect on concurrency between each other. It is not suitable for deducting inventory because orders involving the same item will all compete for the row lock of this item, which will lead to low concurrency. - Saga: A common business unsuitable for XA can use this model. This model has less extra development than Tcc; it only needs to develop forward operation and compensation operation
- Tcc: suitable for high consistency requirements, such as the transfer described earlier, this pattern has the most additional development and requires the development of operations including
Try/Confirm/Cancel
Idempotency Requirements
In the Workflow pattern, when a crash occurs, a retry is performed so the individual operations are required to support idempotency, i.e., the result of the first call is the same as the next try, returning the same result.
In business, the unique key of the database is usually used to achieve idempotency, specifically insert ignore "unique-key". If the insert fails, it means this operation has been completed. This time directly ignored to return. If the insert succeeds, it means that this is the first operation, continue with the subsequent business operations.
If your business itself is idempotent, then you can operate your business directly. If your business does not provide idempotent functionality, then dtm provides a BranchBarrier helper class, based on the above unique-key principle, which can easily help developers implement idempotent operations for Mysql/Mongo/Redis.
Please note that the following two are typical non-idempotent operations:
- Timeout rollback: If you have an operation in your business that may take a long time, and you want your global transaction to roll back after waiting for the timeout to return a failure. Then this is not an idempotent operation because, in the extreme case of two processes calling the operation simultaneously, one returns a timeout failure and the other a success, resulting in different results.
- Rollback after reaching the retry limit: the analysis process is the same as above.
Workflow mode does not support the above timeout rollback and retry limit rollback at the moment. If you have a relevant scenario, please send us the specific scenario. We will actively consider whether to add this kind of support.
Branch Operation Results
Branching operations will return the following results.
- Success: the branch operation returns
HTTP-200/gRPC-nil - Business failure: the branch operation returns
HTTP-409/gRPC-Aborted, no more retries, and the global transaction needs to be rolled back - In progress: branch operation returns
HTTP-425/gRPC-FailPrecondition. This result indicates that the transaction is in progress normally and requires the dtm to retry not with the exponential retreat algorithm but with fixed interval retries - Unknown error: the branch operation returns other results, indicating an unknown error, and dtm will retry this workflow using the exponential retreat algorithm
If your existing service has a different result than the above, then you can customise this part of the result with workflow.Options.HTTPResp2DtmError/GRPCError2DtmError .
Saga’s Compensation operation and Tcc’s Confirm/Cancel operation are not allowed to return business failures according to the Saga and Tcc protocols, because when in the second stage of the workflow, Commit/Rollback, is neither successful nor allowed to retry, then the global transaction cannot be completed, so please take care to avoid this when designing.
Transaction Completion Notification
Some business scenarios where you want to be notified of the completion of a transaction can be achieved by setting an OnFinish callback on the first transaction branch. By the time the callback is called, all business operations have been performed, and the global transaction is substantially complete. The callback function can determine whether the global transaction has finally been committed or rolled back based on the isCommit passed in.
One thing to note is that when the OnFinish callback is called, the state of the transaction has not yet been modified to the final state on the dtm server. So if you use a mixture of transaction completion notifications and querying global transaction results, the results of the two may not be consistent. It is recommended that users use only one of these methods rather than a mixture.
Conclusion
We introduce a workflow pattern to support the mixed usage of Saga, XA, and Tcc. This pattern also supports HTTP, gRPC, and local transactions.
We also illustrate the usage of workflow in Golang with runnable examples.
Want to Connect?You are welcome to visit https://github.com/dtm-labs/dtm
所有评论(0)