Programming Distributed Systems with YAMI4
7.2 Message Broker
In one of the previous chapters, devoted to common concepts of messaging with YAMI4, considerable emphasis was placed on defining YAMI4 as a peer-to-peer messaging system, where direct connections are the foundation for all communication scenarios. One of the advantages of YAMI4 is its simplicity and the ability to set up communication channels without relying on additional infrastructure components like message brokers. This ability is further underlined by the fact that the high-level YAMI4 libraries offer a basic implementation of publish-subscribe messaging, where data producers can be to a large extent isolated from the administrative difficulties of managing the set of data subscribers.
Still, even though a separate message broker is not an obligatory part of a distributed system built with YAMI4, there are several valid reasons to consider such a component:
- Load management - In many distributed systems, especially those related to control and monitoring, data producers are frequently embedded devices of small computational capacity. Even though it might be possible to run a fully functional server with publish-subscribe data flow on such devices, the hardware limitations can become visible when the number of data subscribers becomes very large, as the server software is then responsible for generating large outgoing message traffic. For this reason it makes sense to move the heavy message processing out of the actual data producer to more capable hardware that has enough capacity to serve a large number of subscribers. In other words, the message broker can help to reduce the load on the data producer by decoupling the data source from message delivery services.
- Data flow concentration - In larger distributed systems with many data producers the total number of communication channels can quickly grow as new client programs are added to the system. In the worst case, with N producers and M clients, every client connects to every producer, which leads to NxM communication patterns that are difficult to diagnose and maintain. A separate message broker can help to concentrate data flow into a smaller number of logical channels and therefore to reduce the total number of network connections in the system.
- Hot-swap of critical components - In direct peer-to-peer communication a terminating process will cause the channel to close and thus interrupt the message flow. In some cases the reliability and robustness of the whole system can be greatly improved by introducing the concept of "hot-swap", where individual components can be added, removed or replaced without affecting the whole. A separate message broker can play the role of a "message bus" to which other nodes are connected - in such a configuration a failure of any individual component does not need to be visible to other components, which contributes to the resilience of the whole distributed system. This, of course, places high demands on the reliability of the broker itself, but if the broker is considered to be a standard service, it need not be updated as frequently as other, more "logic-oriented" components.
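The data flow concentration argument can be made concrete with simple arithmetic. The following is a minimal sketch that assumes the worst case for direct communication (every client connected to every producer) and a single broker to which each node keeps exactly one connection; the function names are illustrative and are not part of YAMI4.

```python
def direct_connections(producers: int, clients: int) -> int:
    """Worst case without a broker: every client connects to every producer."""
    return producers * clients


def brokered_connections(producers: int, clients: int) -> int:
    """With a single broker, each node keeps one connection to the broker."""
    return producers + clients


if __name__ == "__main__":
    for n, m in [(5, 10), (20, 50)]:
        print(f"N={n}, M={m}: direct={direct_connections(n, m)}, "
              f"brokered={brokered_connections(n, m)}")
```

With 20 producers and 50 clients the worst case drops from 1000 direct channels to 70 connections to the broker, and adding one more client adds a single connection instead of one per producer.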
In order to properly address the above usage scenarios, the message broker needs to implement at least two basic commands:
- "Publish" - This command is executed by the data producer whenever a new data value is ready. The new value is sent to the broker together with some meta-information that facilitates routing the message to all interested parties.
- "Subscribe" - This command is executed by a component that is interested in receiving data values based on some condition.
A message broker that implements the above basic operations makes it possible to set up a data flow from producers (publishers) to subscribers.
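The relationship between the two commands can be illustrated with a toy in-memory model. This sketch does not use the YAMI4 libraries and is not how the YAMI4 broker is implemented; the class `ToyBroker`, its method signatures, and the use of a dictionary for the meta-information are all assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a condition inspects the meta-information of a message,
# a handler receives the message on behalf of a subscriber.
Condition = Callable[[Dict[str, Any]], bool]
Handler = Callable[[Dict[str, Any], Any], None]


class ToyBroker:
    """In-memory stand-in for a broker with "publish" and "subscribe"."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Condition, Handler]] = []

    def subscribe(self, condition: Condition, handler: Handler) -> None:
        """Register interest in every message whose meta-information satisfies the condition."""
        self._subscriptions.append((condition, handler))

    def publish(self, meta: Dict[str, Any], value: Any) -> int:
        """Route a new value to all matching subscribers; return the delivery count."""
        delivered = 0
        for condition, handler in self._subscriptions:
            if condition(meta):
                handler(meta, value)
                delivered += 1
        return delivered


broker = ToyBroker()
received: List[Any] = []

# The subscriber is only interested in values coming from "sensor-1".
broker.subscribe(lambda meta: meta.get("source") == "sensor-1",
                 lambda meta, value: received.append(value))

broker.publish({"source": "sensor-1"}, 21.5)   # delivered
broker.publish({"source": "sensor-2"}, 19.0)   # no matching subscriber
```

Note that the publisher never learns who the subscribers are: it only hands the value and its meta-information to the broker, which is the decoupling described in the list of reasons above.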
In the most trivial case the message broker is fully transparent and passes all incoming messages to all subscribers. This is a perfectly valid way of using the broker, but a typical system has well-defined rules for message routing, as not all subscribers need to get all data from all publishers. Message brokers differ widely in their supported routing schemes; the YAMI4 broker offers a routing scheme that can be described as multi-way hierarchic tag matching.
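The precise matching rules of the YAMI4 broker are given in the subsection on tag matching and are not reproduced here. Purely to give an intuition for what "multi-way hierarchic tag matching" could mean, the sketch below assumes that tags are dot-separated hierarchies, that a subscription pattern may end with a `*` component that covers a whole subtree, and that a message carrying several tags is delivered if any of its tags matches any of the subscriber's patterns. All of these rules, and the function names, are assumptions of this example.

```python
from typing import Iterable


def pattern_matches(pattern: str, tag: str) -> bool:
    """Match one pattern against one tag, component by component.

    Assumed rule: a trailing "*" component matches the prefix itself and
    everything below it; otherwise the pattern must equal the tag.
    """
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if pattern_parts[-1] == "*":
        prefix = pattern_parts[:-1]
        return tag_parts[:len(prefix)] == prefix
    return pattern_parts == tag_parts


def subscription_matches(patterns: Iterable[str], tags: Iterable[str]) -> bool:
    """Multi-way matching: any pattern against any tag of the message."""
    tag_list = list(tags)
    return any(pattern_matches(p, t) for p in patterns for t in tag_list)


if __name__ == "__main__":
    message_tags = ["plant.boiler.temperature", "alarms.high"]
    print(subscription_matches(["plant.boiler.*"], message_tags))
    print(subscription_matches(["plant.turbine.*"], message_tags))
```

Matching on whole components rather than on raw string prefixes means that a pattern such as `plant.boil.*` does not accidentally match `plant.boiler.temperature`.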
Another important feature of the broker is the ability to specify its expected behaviour in case of overflow. In a badly dimensioned system publishers can produce new messages faster than the subscribers can consume them - clearly the broker cannot deliver them all if there is a mismatch between processing speeds on the producer and consumer sides. To handle this situation, the message broker makes it possible to specify the intended behaviour for the case of queue overflow - this is explained in a later subsection on configuration parameters.
Last but not least, the YAMI4 message broker was intentionally built with small resource consumption in mind, as this approach allows both lean deployments in constrained environments and customized clustered installations.
Further subsections describe the message routing scheme of the YAMI4 broker, its messaging "API", the strategies for clustered installations and the configuration parameters.
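The overflow problem can be illustrated with a bounded per-subscriber queue. The sketch below is a generic model, not the YAMI4 broker's implementation, and the two policy names `drop_oldest` and `reject_new` are hypothetical; the policies that the broker actually supports are those listed in the subsection on configuration parameters.

```python
from collections import deque
from typing import Any, Deque, List


class BoundedSubscriberQueue:
    """Outgoing queue for one subscriber with a configurable overflow policy."""

    POLICIES = ("drop_oldest", "reject_new")  # hypothetical policy names

    def __init__(self, capacity: int, policy: str) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in self.POLICIES:
            raise ValueError(f"unknown overflow policy: {policy}")
        self._capacity = capacity
        self._policy = policy
        self._items: Deque[Any] = deque()

    def push(self, message: Any) -> bool:
        """Enqueue a message; return False if it was rejected due to overflow."""
        if len(self._items) < self._capacity:
            self._items.append(message)
            return True
        if self._policy == "drop_oldest":
            # Make room by discarding the stalest pending message.
            self._items.popleft()
            self._items.append(message)
            return True
        # "reject_new": keep the pending messages, refuse the new one.
        return False

    def pop(self) -> Any:
        """Remove and return the next message to deliver to the subscriber."""
        return self._items.popleft()

    def pending(self) -> List[Any]:
        """Return a copy of the messages still waiting for delivery."""
        return list(self._items)


fresh = BoundedSubscriberQueue(capacity=2, policy="drop_oldest")
strict = BoundedSubscriberQueue(capacity=2, policy="reject_new")
for value in (1, 2, 3):
    fresh.push(value)
    strict.push(value)
```

After the three pushes the first queue holds the two newest values and the second holds the two oldest. Which behaviour is preferable depends on the data: for periodic measurements the latest value usually matters most, whereas for commands or events silently discarding older messages may be unacceptable.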
7.2.1 Message Routing Based On Tag Matching
7.2.2 Clustered Installations And Forwarding
7.2.3 Publish
7.2.4 Confirmed Publish
7.2.5 Subscribe
7.2.6 Get Statistics
7.2.7 Startup and Configuration