Actor Model

2024-05-10

Concurrent programming has always been tricky. Accessing shared state from multiple threads is the central problem, and it easily produces hard-to-catch errors.

Protected access to identified shared variables is easy to realize. Mechanisms to avoid deadlocks or starvation are cumbersome to implement and almost impossible to validate.

The need for concurrent programming has increased tremendously because all modern processors and many microcontrollers have multiple cores.

Java and C++ provide all the instruments for safe concurrency, but their compilers still let developers write dangerous code.

Java has had native threading support since its inception. Powerful synchronization primitives were added later with the java.util.concurrent package. The latest evolutions are virtual threads, which enable the creation of thousands of parallel tasks, and structured concurrency, which makes parallel constructs legible.

C and C++ gained threading support and synchronization primitives with their 2011 revisions. Later versions added small refinements to the synchronization mechanisms.

There is a strong need for higher-level frameworks ensuring safe concurrent programming [1, 2].

The actor model is one of the approaches to safe concurrency. It is based on message passing and was first described by Carl Hewitt in 1973.

Although multiple actors can run at the same time, a single actor processes its messages sequentially. If you send three messages to the same actor, it processes them one at a time. To process these three messages concurrently, you need to create three actors and send one message to each.

Messages are sent asynchronously to an actor that needs to store them somewhere while it is processing another message. The mailbox is the place where these messages are stored ^[1].
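The interplay between asynchronous sending and sequential processing can be sketched in a few lines of Java. This is only an illustration and not the API of any library: the class `RecordingActor` and its methods are hypothetical names, and the mailbox is assumed to be a `LinkedBlockingQueue` from the standard library.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical minimal actor: one mailbox, one thread, one message at a time. */
class RecordingActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final CountDownLatch remaining;

    RecordingActor(int expectedMessages) {
        remaining = new CountDownLatch(expectedMessages);
    }

    /** Called in the sender's thread; stores the message and returns immediately. */
    void send(String message) {
        mailbox.add(message);
    }

    /** Runs in the actor's own thread; take() blocks until a message is available. */
    @Override
    public void run() {
        try {
            while (remaining.getCount() > 0) {
                String message = mailbox.take();
                processed.add(message);      // messages are handled strictly one after the other
                remaining.countDown();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits until all expected messages were processed and returns them in processing order. */
    List<String> awaitProcessed() throws InterruptedException {
        remaining.await();
        return List.copyOf(processed);
    }
}
```

A sender starts the actor with `new Thread(actor).start()` and calls `send` as often as needed. The sender never waits for the processing, and the messages of a single sender are handled in the order they were sent.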

Another interesting aspect of the actor model is that it does not matter whether the actor to which a message is sent runs locally or on another node.

An actor is a computational entity that, in response to a message it receives, can concurrently [2]:

Send a finite number of messages to other actors.
Create a finite number of new actors.
Designate the behavior to be used for the next message it receives.

An actor shall only access local data and information from the current message.
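Designating the behavior for the next message can be sketched as a handler that returns its successor. The example below is a simplified assumption, not a reference implementation: `Behavior` and `DoorActor` are hypothetical names, and the mailbox and thread are left out to keep the focus on the behavior switch.

```java
import java.util.ArrayList;
import java.util.List;

/** A behavior handles one message and returns the behavior for the next one. */
@FunctionalInterface
interface Behavior {
    Behavior onMessage(String message);
}

/** Hypothetical door actor alternating between a "closed" and an "open" behavior. */
class DoorActor {
    private final List<String> log = new ArrayList<>();   // local data, never shared
    private Behavior current = this::closed;

    /** Processes one message with the current behavior and installs the next one. */
    void process(String message) {
        current = current.onMessage(message);
    }

    private Behavior closed(String message) {
        if (message.equals("open")) {
            log.add("opening");
            return this::open;
        }
        log.add("ignored " + message);
        return this::closed;
    }

    private Behavior open(String message) {
        if (message.equals("close")) {
            log.add("closing");
            return this::closed;
        }
        log.add("ignored " + message);
        return this::open;
    }

    List<String> log() {
        return List.copyOf(log);
    }
}
```

Each handler only reads the current message and the actor's own `log`, which matches the rule that an actor accesses nothing but local data and the message being processed.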

Each actor owns a private mailbox or message queue. The owning actor is the sole instance empowered to retrieve messages.

Information hiding prohibits direct access to the mailbox from other actors. Use the provided send(Message msg) operation to send a message to another actor.

Any actor can send a message to another actor. Multiple actors can send messages to a specific actor.

The send operation is performed in the context of the sender actor and associated thread.

The sender cannot modify an already sent message. Ideally, messages should be immutable objects.
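In Java, a record is one way to approximate immutable messages. The sketch below uses a hypothetical `Measurement` message. The defensive copy in the compact constructor is needed because a record only makes its fields final, it does not make the referenced list immutable.

```java
import java.util.List;

/** Hypothetical message type: a record has final fields and no setters. */
record Measurement(String sensorId, List<Double> samples) {
    /** Compact constructor: keep an unmodifiable copy so the sender cannot change it later. */
    Measurement {
        samples = List.copyOf(samples);
    }
}
```

After construction, changes to the sender's original list are not visible in the message, and any attempt to modify `samples()` throws an `UnsupportedOperationException`.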

The receiver is the owner of a received message and is responsible for its disposal.

Retrieving a message from the message queue and processing it are performed in the context of the receiver actor and its associated thread.

Advantages and Downsides

Here are four benefits of this model:

It removes the need for lock-based synchronization and decreases temporal coupling. Actors change only their private state. Other actors’ states are off-limits to them.
There is no shared state in the actor model, so multiple threads can never corrupt data by modifying it at the same time.
It was designed for distributed computing and is resilient to receiver failures. Every actor has a mailbox that stores messages until they can be processed. It does not matter if the actor runs locally, on another core, or on another device. As long as it can receive messages, it makes no difference.
It allows for better scalability, and the receiver controls the consumption rate. If your actor-based solution sees an increase in the traffic load ^[2], you can create more actors ^[3]. Actors can also be programmed to create child actors under certain conditions.
This makes scaling an application a lot easier.
If your application has constrained resources, you can monitor the number of unprocessed messages in the actor queue. If the average or maximum size of the queue exceeds expectations, your system is wrongly dimensioned. You must either reduce processing requirements or increase available processor and memory resources.
It does not require the sender to block waiting for a return value. This is an overall benefit of using messaging instead of method calls. The receiver will send a return value in another message when it is done handling the original one. The sender can send out more messages in the meantime.
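The last two points, returning a result in a separate message and monitoring the number of unprocessed messages, can be sketched as follows. All names (`SquareRequest`, `SquareReply`, `SquareActor`, `pending`) are hypothetical, and passing the sender's mailbox in the request is just one assumed way to tell the receiver where to send the answer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Answer sent back as a separate message. */
record SquareReply(int value, int square) {}

/** Request carrying the mailbox to which the answer shall be sent. */
record SquareRequest(int value, BlockingQueue<SquareReply> replyTo) {}

/** Hypothetical worker actor that answers each request with a reply message. */
class SquareActor implements Runnable {
    private final BlockingQueue<SquareRequest> mailbox = new LinkedBlockingQueue<>();

    /** Non-blocking for the sender: the request is only queued. */
    void send(SquareRequest request) {
        mailbox.add(request);
    }

    /** Number of unprocessed messages, a simple indicator for the dimensioning of the system. */
    int pending() {
        return mailbox.size();
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                SquareRequest request = mailbox.take();
                int square = request.value() * request.value();
                request.replyTo().add(new SquareReply(request.value(), square));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // interruption is used as the stop signal
        }
    }
}
```

The sender can queue several requests in a row and pick up the replies from its own mailbox whenever it is ready. A monitoring component could sample `pending()` periodically to detect a mailbox that keeps growing.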

Embedded applications using traditional approaches with synchronization primitives such as mutexes and semaphores are arduous to implement correctly.

The MISRA committee has recognized these problems and defined a set of restrictions in MISRA C:2012 Amendment 4, which was consolidated into the MISRA C:2023 revision. The recommendations are:

No dynamic creation of threads.
No data races between threads are acceptable. Thus, data shared between threads must be protected by a synchronization mechanism.
All synchronization objects shall be created and instantiated during the initialization of the application.
No cyclic calling chains with threads shall exist to avoid deadlocks on synchronization primitives.

The actor model addresses the identified synchronization difficulties and fulfills the above restrictions, provided that all actors are created during initialization.

Actor-based concurrency comes with some drawbacks:

An actor processes messages one at a time. So, if several actors send five messages to the same receiving actor, it will execute them one after the other. If you need these messages executed simultaneously, you need five different actors running simultaneously.
Deadlock is still possible. Two actors may end up waiting for a message from each other, thus creating a deadlock. Overall, the model is considered susceptible to deadlocks.
However, it is worth noting that concurrent programming is more error-prone and complex than sequential programming.
You need to enforce message immutability. Using the actor model in languages that do not enforce immutability out of the box means it is up to the developers to ensure messages remain immutable.
If this verification is overlooked, messages may be modified after they were sent, which leads to thread-safety problems.
Unexpected failures can be critical. If an actor failure occurs, other actors may get perpetually stuck awaiting a message from it. To avoid this situation, developers have to employ defensive programming techniques and handle exceptions within the scope of each actor.
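One defensive technique is to catch exceptions per message, so that a faulty message does not terminate the actor, and to wait for messages with a timeout instead of forever. The sketch below assumes a hypothetical `ResilientActor` whose handler fails on non-numeric input. How a real system reports or escalates such failures is outside the scope of this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical actor that survives failures while handling individual messages. */
class ResilientActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final AtomicInteger handled = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    void send(String message) {
        mailbox.add(message);
    }

    int handled() { return handled.get(); }

    int failed() { return failed.get(); }

    @Override
    public void run() {
        try {
            while (true) {
                // Bounded wait: a null result is the place to react to a missing message.
                String message = mailbox.poll(1, TimeUnit.SECONDS);
                if (message == null) {
                    continue;
                }
                if (message.equals("stop")) {
                    return;
                }
                try {
                    handle(message);
                    handled.incrementAndGet();
                } catch (RuntimeException e) {
                    failed.incrementAndGet();   // the failure stays local, the loop continues
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Illustrative handler: throws NumberFormatException for non-numeric messages. */
    private void handle(String message) {
        Integer.parseInt(message);
    }
}
```

With this structure, a message such as `"oops"` increments the failure counter, and the messages queued behind it are still processed.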

Minimal Actor Library

To send a message to an actor, you need a reference to the receiving entity.

Introduce naming conventions for actors to profit from search capabilities and improved logging. An actor should have a unique external identifier to support identification and querying.

Consider using a sealed class hierarchy to define the messages an actor receives ^[4]. Pattern matching in a switch then provides exhaustive and type-safe handling of all alternatives.

Java provides the necessary mechanisms. Modern pattern matching in Java provides elegant solutions.
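A sealed message hierarchy combined with pattern matching in a switch expression could look like the sketch below. It assumes Java 21 or later, and the message types and the `CounterActor` class are hypothetical names chosen for illustration.

```java
/** Closed set of messages understood by the hypothetical counter actor. */
sealed interface CounterMessage permits Increment, Reset, Query {}

record Increment(int delta) implements CounterMessage {}

record Reset() implements CounterMessage {}

record Query() implements CounterMessage {}

/** Hypothetical actor logic; mailbox and thread are omitted to focus on message dispatch. */
class CounterActor {
    private int value;   // private state, only touched while handling a message

    String handle(CounterMessage message) {
        // No default branch: the compiler checks that every permitted type is covered.
        return switch (message) {
            case Increment increment -> {
                value += increment.delta();
                yield "incremented";
            }
            case Reset reset -> {
                value = 0;
                yield "reset";
            }
            case Query query -> "value=" + value;
        };
    }
}
```

If a new message type is added to the `permits` clause, the switch no longer compiles until the new case is handled, which is the exhaustiveness guarantee a sealed hierarchy is meant to provide.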

The behavior of an actor is specified as a flat or a hierarchical state machine.

The net.tangly.fsm library provides the abstractions to elegantly implement applications with Java:

A library providing actors and asynchronous message-passing communication.
A timer manager to schedule time-triggered tasks. This approach slightly simplifies the programming of actors. An actor only needs to wait on regular messages or timeout messages on its mailbox.
Hierarchical state machines as described in the UML standard.
A flow library to publish and subscribe data and realize transformation pipelines.
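The idea that a timer manager delivers timeouts as ordinary messages can be sketched with the standard `ScheduledExecutorService`. This is not the API of net.tangly.fsm: `TimerManager`, `ActorEvent`, `DataMessage`, and `TimeoutMessage` are hypothetical names used only to illustrate the principle.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Events an actor can find in its mailbox: regular data or an expired timer. */
sealed interface ActorEvent permits DataMessage, TimeoutMessage {}

record DataMessage(String payload) implements ActorEvent {}

record TimeoutMessage(String timerName) implements ActorEvent {}

/** Hypothetical timer manager that posts a timeout message when a timer expires. */
class TimerManager implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** Starts a one-shot timer; on expiry a TimeoutMessage is added to the given mailbox. */
    void startTimer(String name, long delayMillis, BlockingQueue<ActorEvent> mailbox) {
        scheduler.schedule(() -> {
            mailbox.add(new TimeoutMessage(name));
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the scheduler thread; timers that have not expired are discarded. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The actor then has a single waiting point: it takes the next event from its mailbox and treats a `TimeoutMessage` like any other message.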

The net::tangly::vinci library provides similar abstractions to elegantly implement applications with C++:

A library providing actors and asynchronous message-passing communication.
A timer manager to schedule time-triggered tasks.
Embedded applications often have simpler flat finite state machines. Such machines should be implemented programmatically with switch statements.
Additionally, the Boost library provides statechart libraries.
A message pool feature to programmatically acquire and release message instances. The pool is useful when exchanging messages between actors. Otherwise, either the object will be deleted when leaving the scope in one actor, or you have to dynamically allocate on the heap. Both approaches are suboptimal for realtime embedded applications.

Data processing pipelines should be implemented with flow approaches and not with actors.

Multiple publishers and multiple consumers for a data channel should be provided.
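The Java standard library already offers a publish and subscribe building block in `java.util.concurrent.Flow` and `SubmissionPublisher`. The sketch below is an assumption about how a data channel with several consumers could be built on it, not a description of the flow library mentioned above. `CollectingSubscriber` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;

/** Hypothetical consumer that collects all items published on a channel. */
class CollectingSubscriber implements Flow.Subscriber<Integer> {
    private final List<Integer> received = new CopyOnWriteArrayList<>();
    private final CountDownLatch finished = new CountDownLatch(1);

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        subscription.request(Long.MAX_VALUE);   // no back-pressure in this sketch
    }

    @Override
    public void onNext(Integer item) {
        received.add(item);
    }

    @Override
    public void onError(Throwable throwable) {
        finished.countDown();
    }

    @Override
    public void onComplete() {
        finished.countDown();
    }

    /** Waits until the channel was closed and returns the items in arrival order. */
    List<Integer> awaitItems() throws InterruptedException {
        finished.await();
        return List.copyOf(received);
    }
}
```

A channel is created with `new SubmissionPublisher<Integer>()`. Every subscriber registered with `subscribe` receives each item passed to `submit`, and `submit` may be called from several publishing threads. Closing the publisher signals completion to all subscribers.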

References

[1] D. Schmidt, M. Stal, H. Rohnert, and F. Buschmann, Pattern-Oriented Software Architecture, vol. 2. Wiley, 2000 [Online]. Available: https://www.amazon.com/dp/B00CHK5SIA

[2] V. Vernon, Reactive Messaging Patterns with the Actor Model: Applications and Integration in Scala and Akka. Addison-Wesley Professional, 2015 [Online]. Available: https://www.amazon.com/dp/B011S8YC5G

1. The mailbox is often called a message queue in the literature.

2. A higher traffic load has a direct impact on the number of unprocessed messages in actor mailboxes.

3. The creation of additional actors is prohibited in security-relevant applications. For example, the MISRA C:2023 standard explicitly prohibits the dynamic creation of threads. The only possible approach is to dimension your systems to fulfill your non-functional requirements.

4. A similar approach can be implemented in C++ with the std::variant construct.