Domain-Driven Design with ROS-2

2024-08-10

Tags:

At its core, Robot Operation System provides a message-passing system, often called middleware.

Communication is one of the first needs to arise when implementing a new robot application, or really any software system that will interact with hardware. ROS’s built-in and well-tested messaging system saves you time by managing the details of communication between distributed nodes via an anonymous Publish and Subscribe Pattern.

This approach encourages good practices in your software development, including fault isolation, Seperation of Concerns, and clear interfaces. Using ROS results in systems that are easier to maintain, contribute to, and reuse.

The ROS ecosystem is a cornucopia of robot software. Whether you need a device driver for your GPS, a walk and balance controller for your quadruped, or a mapping system for your mobile robot, ROS has something for you. From drivers to algorithms, to user interfaces, ROS provides the building blocks that allow you to focus on your application.

The goal of the ROS project is to continually raise the bar on what is taken for granted and thus to lower the barrier to entry to building robot applications. Anyone with a good idea for a useful robot should be able to make that idea real without having to understand everything about the underlying hardware and software.

How do you map your application design to the ROS-2 framework?

The ROS community is focused on robotics and less on sound software engineering practices. We recommend using the de-facto standard for modern distributed software architectures named domain-driven design.

Domain-driven design heavily emphasizes object-oriented modeling and event-based communication [1, 2, 3].

So your architecture should follow the recommendations of this design method and evolve during realization [4].

Ubiquitous Language

The user language is used to name the abstractions and operations used in the source code.

The physical battery pack should have a corresponding class in the source code. A motor component or a temperature alarm should also map to the corresponding source abstractions ^[1].

The UML class diagrams should only contain classes with names associated to the domain vocabulary.

Ubiquitous language is the language used by the users and domain experts. The developers have to learn this language through interactions with the users. ^[2].

The UML diagrams reflect the language used in the source code. In other words, the truth always lays in the source code.

Bounded Domain as ROS-2 package

A package is an organizational unit for your Robot Operation System ROS-2 code. A package is automatically a distribution unit that can be installed on specific hardware.

A package is also used as a UML logical component. A logical component should be a bounded domain in your architecture partitioning.

Any bounded domain is modeled as a logical UML package and as a ROS distribution package.

Bounded domain examples are battery pack, tracker, sensor package, vehicle, and user interface. If you decide to have a modular user interface, you could have additional domains such as engine user interface, or mission definition user interface.

Bounded domains are the smallest unit to define a system behavior.

Designers can define cross-cutting solutions to solve recurring problems. Examples are configuration, alarm handling, monitoring, observability, logging, or data persistence ^[3].

The implementation is always confined to domains. Pair working and code reviews can help to ensure similar concrete solutions are used in multiple domains ^[4].

With packages, you can release your ROS 2 work and allow others to build and use it easily.

Beware, ROS architecture does not distinguish logical components from deployment units. You should use a ROS package to simultaneously define a design abstraction and an operational deployment object ^[5].

Your UML component diagrams and deployment diagrams will almost contain the same elements ^[6].

Bounded Domain API

Three aspects define the interface of each bounded domain in the context of ROS-2.

First, we have the messages published on the topics owned by the domain. These messages define the asynchronous interface of the bounded domain ^[7]. The messages for a domain are part of the related ROS-2 package and are defined in a description file included in the package. Each message and all its fields are syntactically well-documented using Doxygen.

Syntactically means at least the following:

What is the type of the field?
Is the field mandatory or optional?
Has the field a default value, and is it specified in the message description?
What is the range of admissible values if not encoded in as a custom type?
If strings are used, try to define the pattern of legal values. Evaluate the usage of bounded strings to improve interface quality.
Are application relevant values defined as constants? An example would be the pattern for ISO 8601 date format.

Additional documentation can provide semantic information. Examples of messages are always helpful to new developers.

Some teams add a semantic meaning to topics. For example, they use a specific topic to publish recovery instructions for a set of components. The topic is specific to the recovery process and also defines the involved parties. Such conventions should be documented.

Second, we have the services provided by the domain. Services are remote procedure calls and should be documented with the same conventions as a method. Each service is documented using Doxygen.

Third the domain sends events and messages to other domains. Here we define the dependencies to other bounded domains. This aspect can either be documented or inferred with the ROS-2 tools.

A component diagram is provided for each package. The diagram contains all topics and services the domain exposes. Additional class or sequence diagrams can be provided to explain specific aspects of the domain. We recommend documenting the state machine implemented in each node with statechart diagrams ^[8].

A system-wide component view is also provided ^[9].

Event-based changes: DDD requires that relevant state changes are communicated to interesting parties as events [3, 5]. You should never propagate system state changes through services. A domain cannot be aware of all parties interested in state changes.
Node as Parallel Processing Unit: ROS-2 uses node as an independent process unit. A node should be small and offer cohesive services. We recommend having as many as possible single threaded nodes to profit from the ROS-2 provided solutions ^[10].

Nodes never share state or computed values with another node. Each node has a local copy of all data needed to fulfill its responsibilities. State changes or updated data shall be sent as messages to interested parties.
Asynchronous vs. Synchronous Communication: DDD and ROS-2 heavily emphasize the importance of a distributed asynchronous system. Synchronous communication should be avoided as much as possible if you follow the principles of domain-driven design and of the robot framework.
Layered Architecture: Bigger packages could profit from a layered architecture to reduce coupling inside a module. Smaller robots and machines seldom have very complex domains and need no or minimal layering. Try to have a layer for the hardware access, a layer for the logical sensor or actuator abstraction, and a layer for the application logic.
Domain Documentation: The domain is documented using UML as promoted in DDD. Finite State Machine should be used to document complex nodes. Stateless nodes logically do not need such finite state machines. These nodes implement filtering or data processing algorithms.

The engine has semantic layers of functionalities.

The lowest layer defines the hardware access and the physical sensors and actuators.

A device groups a set of related sensors and actuators to provide a clear abstraction with single responsibility. An example would be a tractor unit with motors, movement sensors, actuators to avoid slipping on the path.

A tractor manager coordinates multiple tractor units to implement more complex operations such as a traveling a path with an expected speed.

A tractor handler implements more complex functions such as executing a planned route and handling obstacles.

An optional user interface displays information about the handler route planning, tractor manager status, and state information of tractors.

An event-based system supports observability on all layers. A layered message architecture restricts sending commands only to the next underlying layer.

Entities and Identity

Topic names are the identity mechanism of all internal abstractions [1] [2].

The architect shall define naming conventions reflecting the ubiquitous language. Do not use technical designations related to ROS-2 internals.

The architecture of the application is implemented in the source code. A programmer is also a software designer and architect [5]. Every programming act is also a design act. It can be a good or bad design, but it is a design act.

A similar approach is used to identify application-specific entities. Current examples are alarms.

You could use the same approach if you need session or transaction identifiers.

Asynchronous applications seldom need this kind of identifier. The essence of event-based communication is fire and forget.

Factory and Repository

Embedded devices are often statically configured to avoid memory allocation problems. Therefore, we do need to implement any repository to retrieve and construct objects.

Factories are implemented in the code using factory patterns. Most often, a regular factory method is enough to create an aggregate.

Configuration parameters are currently the only identified configurable values. ROS-2 provides the parameter server as a standardized approach to configure, store, and retrieve configuration values.

A parameter server is a shared, multi-variable dictionary that is accessible via network APIs. Nodes use this server to store and retrieve parameters at runtime.

As it is not designed for high-performance, it is best used for static, non-binary data such as configuration parameters. It is meant to be globally viewable so that tools can easily inspect the configuration state of the system and modify if necessary.

Avoid changing node parameter values during runtime. Devices should be configured when starting-up. ROS nodes should get their required parameter values when they are launched.

Value Objects

All messages sent to nodes are value-immutable objects [3]. No entities can ever be sent as part of a message. You can send the external identifier of an entity as a field in the message.

Good Practices

ROS-2 Senior Expert: A senior expert aware of the architecture tradeoffs of the framework and good practices must be available to guide architectural decisions. Expertise of object-oriented models, domain-driven design, UML and C4 Model approaches are required. This person shall have a good grasp of the programming languages used to implement the application. Modern C++ and Python technology stacks are the ones used in ROS-2 framework [6, 7, 8, 9] ^[11]. Being in an architecture role, he needs an agile software architecture training and a reasonable understanding of domain-driven design method.
Favor C++: C++ provides tremendously better performance for heavy lifting algorithms. Static typed code is often easier to maintain. Prefer modern C++ such as C++ 23. Errors are often caught at compile time.

If you are not constrained by performance or maintainability requirements, feel free to use Python.
Technical Excellence: Technical excellence as one of twelve 12 Agile Manifesto Principles of the Agile Manifesto. Your development team shall pursue technical excellence in all used technology. Static and dynamic checkers help to measure progress.

Developers shall also be trained as designers. They should know how to document their domain-driven design solution. They should understand how to describe conceptual ideas with C4 Model, UML and write decisions with Architecture Design Records ADR.
Living documentation: The documentation shall be published as living documentation accessible and searchable to all interested parties. The team should integrate documentation generation as part of the continuous delivery pipeline. The architecture method is domain-driven design, being the industrial standard approach for software design. The structure and artifacts are based on the arc42 approach and associated templates.+

If the documentation is not published daily automatically through a continuous delivery pipeline, do not be surprised if nobody cares about it. The team will not update or use the documentation artifacts because they have no advantage if they explore it.
Finite State Machine: Finite state machines are documented as UML statecharts [10]. You should avoid composite states. Hierarchical finite state machines are fine but should be implemented with the help of a statechart library. Hardcoded solutions are error-prone due to history states and parallel execution of composite states. ROS-2 provides two such libraries.

Flat Finite State Machine can directly be implemented in code as a double nested switch statements.
Asynchronous Communication: ROS-2 nodes are independent execution units. The framework will allocate operating system threads based on the overall configuration and the pending requests. If your architecture follows the ROS-2 recommendations and favors message-passing communication, you would avoid most realtime problems.

Nodes as actors, message-passing communication, and finite state machine is a well-documented approach to implement communication and distributed solutions.

Finite state machines and statecharts are part of the UML notation [8] [9].

Finite state machines should be deterministic. The set of all relevant events must be identified and documented.

At most, one transition can be fired when a specific event is processed. Use guard conditions to restrict which transition could be fired when multiple efferent transitions have the same triggering event. A guard is a predicate used to decide if a transition can be fired. If your transition guards are using timeouts, you need a global time reference with the expected resolution.

A state in a finite state machine must have a duration. The actor implementing the finite state machine is in a state and waits for an input meaning a message, or a timeout event.

A transition has a trigger event, an optional guard, and an optional action. The guard is a predicate to decide if the transition can be fired or not. A transition has semantically no duration. The action of a transition must be processed in a brief time. Try to limit the parameter list of an action to the owner of the finite state machine and the triggering event.

Entry and exit actions are executed when the state is entered or exited.

State activities are active as long as the state is active. Try to avoid activities in control systems. An activity needs to be implemented as a separate thread and must be abruptly stopped when leaving the state.

References

[1] E. Evans, Domain-driven design. Addison-Wesley, 2004 [Online]. Available: https://www.amazon.com/dp/0321125215

[2] V. Vernon, Implementing Domain driven Design. Addison-Wesley Professional, 2012 [Online]. Available: https://www.amazon.com/dp/B00BCLEBN8

[3] V. Vernon, Domain-Driven Design Distilled. Addison-Wesley Professional, 2016 [Online]. Available: https://www.amazon.com/dp/B01JJSGE5S/

[4] N. Ford, R. Parsons, and P. Kua, Building Evolutionary Architectures: Automated Software Governance, Second. O’Reilly Media, 2023 [Online]. Available: https://www.amazon.com/dp/B0BN4T1P27

[5] V. Vernon, Reactive Messaging Patterns with the Actor Model : Applications and Integration in Scala and Akka. Addison-Wesley Professional, 2015 [Online]. Available: https://www.amazon.com/dp/B011S8YC5G

[6] B. Stroustrup, Tour of C++, Third. Pearson Education, Limited Edition, 2021, p. 320 [Online]. Available: https://www.amazon.com/dp/B0B8S35JWV

[7] J. Davidson and K. Gregory, Beautiful C++. Pearson Education, Limited, 2021978-0136875673, p. 352 [Online]. Available: https://www.amazon.com/dp/B09HTH1X38

[8] S. Meyers, Effective Modern C: 42 Specific Ways to Improve Your Use of C11 and C++14. O’Reilly Media, p. 334 [Online]. Available: https://www.amazon.com/dp/B00PGCMGDQ

[9] R. Grimm, C++ Core Guidelines. Pearson Education, Limited, 2021 [Online]. Available: https://www.amazon.com/dp/B09NSGQ4JK

[10] M. Fowler, UML Distilled, Third. Addison-Wesley Professional, 2003 [Online]. Available: https://www.amazon.com/dp/B07H4WN84Z

1. I strongly recommend using English for the ubiquitous language. Digital product development is an international activity. It is worth the effort to document in a language all engineers understand.

2. arc42 documents the ubiquitous language term in chapter 13.

3. arc42 defines explicit the cross-cutting concepts as chapter 8 of their architecture documentation structure.

4. When the cross-cutting aspect is well understood, you could provide a reference implementation as a library.

5. The UML component diagram and deployment diagram contain the same items with the same identifier.

6. arc42 documents the component diagrams of a system in chapter 5 and the deployment diagrams in chapter 7.

7. This should be the primary interface of any domain because ROS-2 is a message-passing middleware.

8. plantUML supports all these UML diagram types. Define the diagrams and add them to the arc42 documentation. Link them to the detailed API specification documented with Doxygen.

9. C4 Model provides the structure and guidance how to create such views. The approach is similar to the old RUP approach with the 4 +1 Architectural View Model.

10. As stated in ROS-2 documentation, it is complex and error-prone to avoid synchronization troubles such as deadlocks, lifelocks, priority inversion, or starvation when writing multithreaded nodes. Further information can be found in the ROS-2 documentation about calling groups.

11. Java is partially supported. Client libraries to access communication functions are available.