AWS Lambda Finite State Machines

A Python framework for developing finite state machine-based workflows on AWS Lambda.


The FSM implementation is inspired by the following paper:

[1] J. van Gurp, J. Bosch, “On the Implementation of Finite State Machines”, in Proceedings of the 3rd Annual IASTED International Conference Software Engineering and Applications, IASTED/Acta Press, Anaheim, CA, pp. 172-178, 1999. (www.jillesvangurp.com/static/fsm-sea99.pdf)
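The general idea behind [1] is to model states, events, transitions and actions as explicit objects instead of nested conditionals. The following is a minimal, self-contained sketch of that idea. The class names `StateMachine` and `Transition`, the `dispatch` method and the example states are hypothetical illustrations, not this framework's actual classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# An action runs during a transition and may read/update the shared context.
ActionFn = Callable[[dict], None]


@dataclass
class Transition:
    target: str                        # state entered after the transition
    action: Optional[ActionFn] = None  # optional side effect run on the way


@dataclass
class StateMachine:
    # Transition table: (current state, event) -> Transition
    transitions: Dict[Tuple[str, str], Transition]
    state: str
    context: dict = field(default_factory=dict)

    def dispatch(self, event: str) -> str:
        """Apply an event to the current state and return the new state."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition from {self.state!r} on {event!r}")
        transition = self.transitions[key]
        if transition.action is not None:
            transition.action(self.context)
        self.state = transition.target
        return self.state


def count_starts(ctx: dict) -> None:
    """Example action: count how often the 'start' transition fired."""
    ctx["starts"] = ctx.get("starts", 0) + 1


# Example workflow: pending --start--> running --done--> finished
fsm = StateMachine(
    transitions={
        ("pending", "start"): Transition("running", count_starts),
        ("running", "done"): Transition("finished"),
    },
    state="pending",
)
fsm.dispatch("start")
fsm.dispatch("done")
```

Keeping the transition table as data makes it easy to load from configuration and to persist only the current state name between Lambda invocations.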

Architecture

The system is designed to run on AWS.

Figure: AWS Architecture

Code execution

Code execution is supported on two platforms:

  1. AWS Lambda and
  2. (EXPERIMENTAL) AWS ECS.

AWS ECS is used to run any containerized applications you may want to add to a workflow, but it is not the only option. It is straightforward to implement an Action class that starts a container on any system able to execute containerized applications, be it open-source or in-house.
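A minimal sketch of such an Action, assuming a hypothetical `Action` base class whose `execute(context, obj)` method returns the name of the next event. That base class, its signature, the `ContainerRunner` interface and the `"done"`/`"failed"` event names are all assumptions for illustration, not the framework's documented API. The container platform is hidden behind a pluggable runner so that ECS, a Docker daemon or an in-house scheduler can be swapped in.

```python
from abc import ABC, abstractmethod
from typing import List, Protocol


class ContainerRunner(Protocol):
    """Hypothetical interface: anything that runs an image and reports its exit code."""

    def run(self, image: str, command: List[str]) -> int: ...


class Action(ABC):
    """Hypothetical base class: execute() does the work and returns an event name."""

    @abstractmethod
    def execute(self, context: dict, obj: dict) -> str: ...


class ContainerAction(Action):
    """Starts a container through a pluggable runner and maps its exit code to an event."""

    def __init__(self, runner: ContainerRunner, image: str, command: List[str]):
        self.runner = runner
        self.image = image
        self.command = command

    def execute(self, context: dict, obj: dict) -> str:
        exit_code = self.runner.run(self.image, self.command)
        obj["exit_code"] = exit_code  # keep the result for later states
        return "done" if exit_code == 0 else "failed"


class FakeRunner:
    """Local stand-in; a real runner might call ECS RunTask or a Docker daemon."""

    def __init__(self, exit_code: int = 0):
        self.exit_code = exit_code
        self.calls = []

    def run(self, image: str, command: List[str]) -> int:
        self.calls.append((image, command))
        return self.exit_code
```

For example, `ContainerAction(FakeRunner(), "example/worker:latest", ["python", "job.py"]).execute({}, {})` returns `"done"`; the image name and command are placeholders.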

Event dispatch

Event dispatch can be handled by several AWS services. Since AWS Lambda functions can be driven by a variety of event sources, any of the following can be selected for event dispatch:

  1. AWS SQS or
  2. AWS Kinesis or
  3. AWS SNS or
  4. AWS DynamoDB

The system also supports the notion of primary and secondary/failover event sources, so it is possible to specify two of the above sources. In the event of service issues on the primary source, the system will automatically dispatch events to the secondary source.
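The primary/secondary behaviour described above can be sketched as a small dispatcher that tries the primary source and falls back to the secondary one on any error. This is an illustrative sketch only: the `FailoverDispatcher` class and the `Sender` callables are hypothetical stand-ins for whatever client code actually writes to SQS, Kinesis, SNS or DynamoDB.

```python
from typing import Callable

# Hypothetical sender: delivers one serialized event, raises on failure.
Sender = Callable[[str], None]


class FailoverDispatcher:
    """Sends events to a primary source, falling back to a secondary one on error."""

    def __init__(self, primary: Sender, secondary: Sender):
        self.primary = primary
        self.secondary = secondary

    def dispatch(self, event: str) -> str:
        """Return which source accepted the event."""
        try:
            self.primary(event)
            return "primary"
        except Exception:
            # Primary unavailable (outage, throttling, ...): use the secondary.
            # If the secondary also fails, its exception propagates to the caller.
            self.secondary(event)
            return "secondary"
```

A real implementation would likely also record which source was used, so that the failover is visible in monitoring.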

Retries

Retries of failed state transitions can be handled by several different AWS services:

  1. AWS SQS or
  2. AWS Kinesis or
  3. AWS SNS or
  4. AWS DynamoDB

AWS DynamoDB and AWS SQS are the preferred mechanisms for retries, since they are the only two of these services that support backoff (running something after a specified delay).
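With SQS, backoff maps onto the `DelaySeconds` parameter of `send_message`, which accepts values from 0 to 900 seconds. The sketch below computes an exponential delay and builds the keyword arguments for boto3's `send_message`; the queue URL, payload fields and the `retry_attempt` key are hypothetical, and the base delay of one second is an arbitrary choice.

```python
import json
import random

SQS_MAX_DELAY_SECONDS = 900  # SQS caps DelaySeconds at 15 minutes


def backoff_delay(attempt: int, base: float = 1.0,
                  cap: float = SQS_MAX_DELAY_SECONDS, jitter: bool = False) -> int:
    """Exponential backoff: base * 2**attempt seconds, capped; optional full jitter."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        delay = random.uniform(0, delay)
    return int(delay)


def retry_message_kwargs(queue_url: str, payload: dict, attempt: int) -> dict:
    """Keyword arguments for boto3's SQS client.send_message(**kwargs)."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps({**payload, "retry_attempt": attempt}),
        "DelaySeconds": backoff_delay(attempt),
    }
```

Attempts 0, 1, 2, 3 yield delays of 1, 2, 4 and 8 seconds; from attempt 10 onward the delay stays at the 900-second cap. Enabling `jitter` spreads out retries from many failed transitions that would otherwise fire at the same moment.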

Checkpointing

Checkpointing can be handled by only a single system at the moment:

  1. AWS DynamoDB

AWS DynamoDB is able to store the offset (sequence number) of the most recently dispatched AWS Kinesis message, so that processing can resume from that point.
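A minimal sketch of the checkpointing idea, assuming an in-memory `CheckpointStore` as a stand-in for a DynamoDB table keyed by stream and shard (the class, its methods and the key layout are hypothetical). Records use the shape of Lambda's Kinesis event records (`record["kinesis"]["sequenceNumber"]`); records at or below the stored checkpoint are skipped, so a redelivered batch is not processed twice.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CheckpointStore:
    """In-memory stand-in for a DynamoDB table keyed by (stream, shard_id)."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], str] = {}

    def get(self, stream: str, shard_id: str) -> Optional[str]:
        return self._items.get((stream, shard_id))

    def put(self, stream: str, shard_id: str, sequence_number: str) -> None:
        self._items[(stream, shard_id)] = sequence_number


def process_batch(store: CheckpointStore, stream: str, shard_id: str,
                  records: List[dict], handle: Callable[[dict], None]) -> int:
    """Handle records newer than the checkpoint, checkpointing after each one."""
    last = store.get(stream, shard_id)
    processed = 0
    for record in records:
        seq = record["kinesis"]["sequenceNumber"]
        # Sequence numbers are numeric strings that increase within a shard.
        if last is not None and int(seq) <= int(last):
            continue  # already handled before a restart or redelivery
        handle(record)
        store.put(stream, shard_id, seq)
        last = seq
        processed += 1
    return processed
```

Checkpointing after every record keeps replays small at the cost of one write per record; checkpointing once per batch is the usual cheaper alternative.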

Locks and Idempotency

A cache is used to prevent re-execution of state transitions and to prevent concurrent execution in error scenarios. Two AWS services are suitable for this task:

  1. Memcached/Redis (e.g. via AWS ElastiCache)
  2. AWS DynamoDB
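Both stores can provide an atomic "add if absent, with expiry" operation (for example Memcached `add`, Redis `SET` with `NX` and `EX`, or a DynamoDB conditional put), which is enough for a lease (lock) plus a "done" marker. The sketch below uses an in-memory stand-in; `AddOnlyCache`, `run_transition_once`, the key prefixes and the TTL defaults are hypothetical names and values, not the framework's API.

```python
import time
from typing import Callable, Dict, Tuple


class AddOnlyCache:
    """In-memory stand-in for an atomic add-if-absent with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._items: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def exists(self, key: str) -> bool:
        item = self._items.get(key)
        return item is not None and item[1] > self._clock()

    def add(self, key: str, value: str, ttl: float) -> bool:
        """Store key only if it is absent or expired; return True on success."""
        if self.exists(key):
            return False
        self._items[key] = (value, self._clock() + ttl)
        return True

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


def run_transition_once(cache: AddOnlyCache, transition_id: str, fn: Callable[[], None],
                        lease_seconds: float = 300.0, done_seconds: float = 86400.0) -> bool:
    """Run fn() unless it is running elsewhere or already completed; True if it ran."""
    lease_key = "lease:" + transition_id
    done_key = "done:" + transition_id
    # The lease prevents concurrent execution; it should outlive fn()'s run time.
    if not cache.add(lease_key, "1", lease_seconds):
        return False
    try:
        if cache.exists(done_key):
            return False  # completed earlier, e.g. a redelivered event
        fn()
        cache.add(done_key, "1", done_seconds)  # only recorded on success
        return True
    finally:
        cache.delete(lease_key)
```

Checking the "done" marker only after acquiring the lease avoids the race where another worker finishes between the check and the lock.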

Monitoring

When problems occur during code execution, they are recorded as custom metrics in:

  1. AWS CloudWatch
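A custom CloudWatch metric is published with the `put_metric_data` call. The sketch below only builds the keyword arguments for boto3's CloudWatch client, so it runs without AWS credentials; the namespace, metric name and `State` dimension are hypothetical examples, not the metric names this framework emits. Note that custom namespaces may not start with `AWS/`.

```python
from typing import Dict


def error_metric_kwargs(namespace: str, error_name: str, state: str,
                        count: int = 1) -> Dict:
    """Keyword arguments for boto3's CloudWatch client.put_metric_data(**kwargs)."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": error_name,
                # Dimension lets errors be graphed per FSM state.
                "Dimensions": [{"Name": "State", "Value": state}],
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    }
```

With boto3 installed, this would be sent as `boto3.client("cloudwatch").put_metric_data(**error_metric_kwargs("MyApp/FSM", "ActionFailed", "running"))`.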