v0.6.0 Release: Simplified Deployment And Improved Error Management

Building Infinitic, an event-driven orchestration engine providing reliable and scalable workflows, even in distributed environments.

Hi! 

It has been a while since the last email, and a lot has happened in the meantime:

  • Infinitic v0.6.0 is out with simplified deployment and improved error management.
  • Infinitic had its first formal load testing during the recent StreamNative Pulsar Hackathon.
  • Infinitic will be at the next Pulsar Summit North America!

Simplified deployment

You may remember that Infinitic is based on engines (workflow engines, task engines, tag engines…) that needed to be deployed along with the workers processing tasks and workflows. Since Infinitic v0.6, deployment is greatly simplified: only task workers and workflow workers are now needed.

With v0.6, the task and workflow engines are embedded by default into task workers and workflow workers, and each task and each workflow has its own engine instances. The motivation is to optimize the flow of messages into Pulsar: a large influx of messages for a specific workflow should not delay other workflows.

Incidentally, this provided the opportunity to simplify deployment by embedding every engine that a task or workflow needs into the corresponding task or workflow worker. The Pulsar topic architecture is still the same, but a lot of this complexity is now internal to workers.

Error management 

Up to now, from a workflow's point of view, a task could never fail. Of course, a task could throw an error, but Infinitic would automatically retry it until it completed, and the workflow would then resume. This implied that a workflow could be stuck forever if a task failed unrecoverably.

I recognize that there are situations where a workflow needs to continue even if a task could not complete. So, from v0.6, you can catch the task failure directly within the workflow code to react to this situation. It is quite a sophisticated piece of code; I recommend you look at the documentation.

A useful consequence of this new feature is that a workflow now raises an exception when it is stalled by a task failure, and this exception recursively contains the reason for the whole chain of failures. It means you can easily find the root cause of the issue (which could be a failed task in a child workflow, for example).

Debugging an event-driven architecture is notoriously tricky, and this new feature will tell you where exactly the root failure occurred in your distributed infrastructure.
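
To make the idea of a "recursive" failure more concrete, here is a minimal sketch in plain Java. It does not use the Infinitic API: the StepFailure class and the rootCause helper are hypothetical names invented for this illustration, and the chain is modeled with the standard Throwable cause mechanism. The documentation remains the reference for the actual exception types.

```java
final class RootCauseSketch {

    // Hypothetical stand-in for a failure reported by a workflow;
    // this is NOT an Infinitic class.
    static class StepFailure extends RuntimeException {
        StepFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Follows the chain of causes down to the first failure that has no cause.
    static Throwable rootCause(Throwable failure) {
        Throwable current = failure;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // A failed task, wrapped by a child workflow, wrapped by its parent.
        Throwable task = new IllegalStateException("payment provider unreachable");
        Throwable child = new StepFailure("child workflow stalled", task);
        Throwable parent = new StepFailure("parent workflow stalled", child);

        // prints "payment provider unreachable"
        System.out.println(rootCause(parent).getMessage());
    }
}
```

Starting from the exception raised by the top-level workflow, walking the chain leads straight to the task that failed first.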

Load testing Infinitic during the last Pulsar Hackathon

With Matthieu Jacquet (from marketing company Splio) and John Kinson, we participated in the last Pulsar hackathon. We decided to build the prototype of a performance benchmarking tool to load-test Infinitic. During this 48-hour hackathon:

  • We wrote a workflow launcher to be able to dispatch workflows according to a scenario defined in a configuration file;
  • We built a workflow that emulated 2 different supply chain component providers for a product, with a purchaser workflow placing an order and using the first supplier to respond successfully;
  • We set up a local Docker configuration to run Prometheus and Grafana to get nice dashboards;
  • We were able to run all developments on a hosted Pulsar instance provided by StreamNative;
  • We made a nice 10-minute video of this work that you can see on our YouTube Channel.
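
For readers curious about the "first supplier to respond successfully" pattern used in the purchaser scenario, here is a rough sketch in plain Java. It is not the hackathon code and does not use the Infinitic API: the firstSuccessful helper and the supplier names are assumptions made up for this example, built only on the standard CompletableFuture class (Java 9 or later).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

final class FirstSuccessSketch {

    // Completes with the first successful attempt; fails only once
    // every attempt has failed (with the error of the last one to fail).
    static <T> CompletableFuture<T> firstSuccessful(List<CompletableFuture<T>> attempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        if (attempts.isEmpty()) {
            result.completeExceptionally(new IllegalArgumentException("no attempts"));
            return result;
        }
        // Counts the attempts that have not failed yet.
        AtomicInteger remaining = new AtomicInteger(attempts.size());
        for (CompletableFuture<T> attempt : attempts) {
            attempt.whenComplete((value, error) -> {
                if (error == null) {
                    // No effect if another attempt already won.
                    result.complete(value);
                } else if (remaining.decrementAndGet() == 0) {
                    result.completeExceptionally(error);
                }
            });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical suppliers: A fails, B answers successfully.
        CompletableFuture<String> supplierA =
                CompletableFuture.failedFuture(new IllegalStateException("out of stock"));
        CompletableFuture<String> supplierB =
                CompletableFuture.completedFuture("supplier-B");

        // prints "supplier-B"
        System.out.println(firstSuccessful(List.of(supplierA, supplierB)).join());
    }
}
```

The point of the counter is that a failing supplier must not end the race: the order only fails when no supplier at all could fulfil it.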

I’m thrilled with the results, as we were able to consistently reach a completion rate of 20 workflows/second (20 × 86,400 seconds, or roughly 1.7 million workflows per day) with a single worker host (a basic MacBook Pro) and a minimal Pulsar cluster.

Workflow engines often have a bad reputation for being slow and a single point of failure. Event-driven workflow engines like Infinitic are pushing those limits. I really believe that this technology will be more and more common in the coming years.

Pulsar Summit

Last but not least, I’m thrilled to share that my paper “Infinitic: Building a Workflow Engine on Top of Pulsar” has been accepted for the next Pulsar Summit.

This is a great event for anyone interested in messaging and event streaming to learn the latest Pulsar project updates, use cases, and best practices! I can’t recommend joining enough.

That’s it for today. Be safe.