The FSM implementation is inspired by the following paper:
[1] J. van Gurp, J. Bosch, “On the Implementation of Finite State Machines”, in Proceedings of the 3rd Annual IASTED International Conference Software Engineering and Applications, IASTED/Acta Press, Anaheim, CA, pp. 172-178, 1999. (www.jillesvangurp.com/static/fsm-sea99.pdf)
Architecture
The system is designed to run on AWS.
Code execution
Code execution is accomplished via both:
- AWS Lambda and
- (EXPERIMENTAL) AWS ECS.
AWS ECS is used to run any containerized applications you may want to add to a
workflow, but it is not the only option. It is straightforward to implement an
Action class that starts a container in any system capable of running
containerized applications, whether open source or in-house.
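As a rough illustration of the point above, the following sketch shows what such an Action class might look like. It is not the framework's actual API: the `Action` base class, the `execute(context, obj)` signature, the `DockerRunAction` name, the `image` context key, and the returned event names are all assumptions made for this example. It starts the container with the local `docker` CLI rather than AWS ECS.

```python
import subprocess
from typing import Callable, Dict, List


class Action:
    """Hypothetical base class; the real framework's Action API may differ."""

    def execute(self, context: Dict, obj: Dict) -> str:
        raise NotImplementedError


class DockerRunAction(Action):
    """Starts a container with the local `docker` CLI instead of AWS ECS."""

    def __init__(self, runner: Callable[[List[str]], int] = subprocess.call):
        # The runner is injectable so the action can be exercised without Docker.
        self._runner = runner

    def execute(self, context: Dict, obj: Dict) -> str:
        # `image` is an assumed context key naming the container image to run.
        command = ["docker", "run", "--rm", "--detach", context["image"]]
        exit_code = self._runner(command)
        # Return the name of the next event to dispatch (assumed convention).
        return "done" if exit_code == 0 else "error"
```

Swapping the runner for a call into another container platform's client would leave the rest of the workflow unchanged, which is the design point the paragraph above makes.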
Event dispatch
Because AWS Lambda functions can be driven from several event sources, any of the following AWS services can be selected for event dispatch:
- AWS SQS or
- AWS Kinesis or
- AWS SNS or
- AWS DynamoDB
The system also supports the notion of primary and secondary/failover event sources, so it is possible to specify two of the above sources. In the event of service issues on the primary source, the system will automatically dispatch events to the secondary source.
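The primary/secondary behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the system's implementation: `dispatch_with_failover` and its callable parameters are hypothetical names, and treating every exception from the primary as a service issue is an assumption.

```python
from typing import Callable, Dict


def dispatch_with_failover(
    event: Dict,
    primary: Callable[[Dict], None],
    secondary: Callable[[Dict], None],
) -> str:
    """Send the event to the primary source, falling back to the secondary.

    Returns the name of the source that accepted the event.
    """
    try:
        primary(event)
        return "primary"
    except Exception:
        # Any failure on the primary is treated as a service issue here; a real
        # system would likely distinguish error types and record the failure.
        secondary(event)
        return "secondary"
```

In practice each callable would wrap a client call for one of the sources listed above, for example an SQS send for the primary and a Kinesis put for the secondary.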
Retries
Retries of failed state transitions can be handled by several different AWS services:
- AWS SQS or
- AWS Kinesis or
- AWS SNS or
- AWS DynamoDB
AWS DynamoDB and AWS SQS are the preferred mechanisms for retries, since they are the only two sources that support backoff (running something after a specified delay).
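To make the notion of backoff concrete, here is a minimal sketch of how a retry delay might be computed. The exponential schedule and the `retry_delay_seconds` helper are assumptions for illustration; the text above only states that a delay is supported, not how it is chosen. The cap reflects the 900-second maximum that SQS allows for a per-message delay.

```python
SQS_MAX_DELAY_SECONDS = 900  # SQS limits a message's DelaySeconds to 15 minutes


def retry_delay_seconds(
    attempt: int, base: int = 1, cap: int = SQS_MAX_DELAY_SECONDS
) -> int:
    """Return an exponential backoff delay (base * 2**attempt), capped at `cap`."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * 2 ** attempt)
```

With the defaults, the first retry (attempt 0) waits 1 second, the fourth (attempt 3) waits 8 seconds, and later attempts level off at the cap.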
Checkpointing
Checkpointing can be handled by only a single system at the moment:
AWS DynamoDB is able to store the offset of the most recently dispatched AWS Kinesis message.
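The checkpointing idea can be sketched as a small key-value store. This is only an illustration: `CheckpointStore` is a hypothetical in-memory stand-in for a DynamoDB table, and keying the checkpoint by a `correlation_id` is an assumption rather than the system's documented schema.

```python
from typing import Dict, Optional


class CheckpointStore:
    """In-memory stand-in for a DynamoDB table holding dispatch checkpoints."""

    def __init__(self) -> None:
        self._offsets: Dict[str, str] = {}

    def save(self, correlation_id: str, sequence_number: str) -> None:
        # Overwrite the previous checkpoint with the most recent offset.
        self._offsets[correlation_id] = sequence_number

    def load(self, correlation_id: str) -> Optional[str]:
        # Returns None when nothing has been checkpointed for this id yet.
        return self._offsets.get(correlation_id)
```

After a failure, a recovery routine could call `load` to find the last dispatched Kinesis message and resume from there.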
Locks and Idempotency
A cache is used to prevent re-execution of state transitions and to prevent concurrent execution in error scenarios. Two AWS services are suitable for this task:
Monitoring
When problems occur during code execution, these problems are recorded as custom metrics in: