Communication Models in Distributed Systems

3.1 Communication Models in Distributed Systems

Software applications are becoming increasingly distributed. A node in a distributed application must find the right data, know where to send it, and deliver it to the right place at the right time. Simplifying access to this data would enable a whole new class of distributed applications. The challenge, especially in embedded and real-time networks, is to quickly find and disseminate information to many nodes.

Three major communication paradigms have emerged to meet this need: client-server, message passing, and publish-subscribe.

Client-server is fundamentally a many-to-one design that works well for systems with centralized information, such as databases, transaction processing systems, and central file servers. However, if multiple nodes generate information, a client-server architecture requires that all of it be sent to the server for later redistribution to the clients, which makes client-to-client communication inefficient. The central server is a potential bottleneck and a single point of failure. It also adds an unknown delay (and therefore indeterminism) to the system, because the receiving client does not know when it has a message waiting.
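
The relay cost described above can be illustrated with a minimal in-process sketch. It is not a real network server: the `CentralServer` class, its `send` and `poll` methods, and the client names are all hypothetical, chosen only to show that every client-to-client exchange takes two transfers through the server and that the receiver learns of a message only when it asks.

```python
class CentralServer:
    """Hypothetical in-process stand-in for a central server (assumption)."""

    def __init__(self):
        self.mailboxes = {}  # client name -> list of (sender, payload)
        self.hops = 0        # number of transfers the server has handled

    def send(self, sender, recipient, payload):
        # Hop 1: producer -> server. The server stores the message.
        self.mailboxes.setdefault(recipient, []).append((sender, payload))
        self.hops += 1

    def poll(self, client):
        # Hop 2: server -> consumer, and only when the consumer asks.
        pending = self.mailboxes.pop(client, [])
        self.hops += 1
        return pending


server = CentralServer()
server.send("sensor", "display", 21.5)
# Until "display" polls, it does not know that a message is waiting.
messages = server.poll("display")
print(messages, server.hops)
```

One message between two clients costs two transfers, both through the same object. That object is where the bottleneck and the single point of failure sit.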

Message-passing architectures work by implementing queues of messages. Processes can create queues, send messages, and service messages that arrive. This extends the many-to-one client-server design to a more distributed topology. Message passing allows direct peer-to-peer connections, so a simple messaging design makes it much easier to exchange information between many nodes in the system. However, the message-passing architecture does not support a data-centric model. Applications have to find data indirectly by targeting specific sources (e.g., by process ID, "channel," or queue name) on specific nodes. The architecture does not address how an application learns where a process or channel is, or what happens if that process or channel does not exist. The application must determine where to get data, where to send it, and when to perform the transaction. In short, the message-passing architecture models the means of transferring data but has no real model of the data itself.
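
As a rough sketch of the addressing problem described above, the following uses Python's standard `queue` module. The `queues` registry and the `create_queue`, `send`, and `service` helpers are hypothetical names invented for this illustration. They are not part of any particular messaging product.

```python
import queue

# Hypothetical registry of named queues (assumption for illustration).
queues = {}


def create_queue(name):
    """Create a queue that senders can target by name."""
    queues[name] = queue.Queue()
    return queues[name]


def send(name, message):
    # The sender must already know the queue name.
    # A missing queue raises KeyError; nothing discovers it for us.
    queues[name].put(message)


def service(name):
    """Drain and return every message currently waiting in the named queue."""
    q = queues[name]
    received = []
    while not q.empty():
        received.append(q.get_nowait())
    return received


create_queue("logger")
send("logger", "temperature=21.5")
send("logger", "pressure=101.3")
print(service("logger"))

missing = None
try:
    send("display", "temperature=21.5")  # nobody created this queue
except KeyError as err:
    missing = err.args[0]
    print("no such queue:", missing)
```

The sender addresses a destination, not a kind of data. Deciding which queue should receive temperature readings, and what to do when that queue does not exist, is left entirely to the application.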

Publish-subscribe adds a data model to messaging. Publish-subscribe nodes simply "publish" information they have and "subscribe" to data they need. Messages logically pass directly between the communicating nodes. The fundamental communications model implies both discovery (i.e., what data should be sent) and delivery (i.e., when and where to send the data). This design mirrors time-critical information delivery systems in everyday life (e.g., television, radio, magazines, and newspapers). Publish-subscribe systems are good at distributing large quantities of time-critical information quickly, even in the presence of unreliable delivery mechanisms.

Publish-subscribe architectures map well to the real-time communications challenge. Finding the right data is straightforward; nodes just declare their interest once and the system delivers the data. Sending the data at the right time is also natural; publishers send data when the data is available. Publish-subscribe can be efficient because the data flows directly from source to destination without requiring intermediate servers. Multiple sources and destinations are easily defined within the model, making redundancy and fault tolerance natural. Finally, the intent declaration process provides an opportunity to specify per-data-stream Quality of Service (QoS) requirements. Properly implemented, publish-subscribe delivers the right data to the right place at the right time.
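
The declare-once pattern and per-stream QoS can be sketched in a few lines. This is a toy in-process model, not a middleware API: the `Bus` class, the topic names, and the `history_depth` parameter are assumptions made for the example. The `Bus` object stands in only for the matching (discovery) step. In real publish-subscribe middleware, matched nodes would exchange data directly.

```python
from collections import defaultdict, deque


class Bus:
    """Hypothetical matcher between publishers and subscribers (assumption)."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of subscriber buffers

    def subscribe(self, topic, history_depth=1):
        # Declaring interest once also states a QoS request: how many of
        # the most recent samples this subscriber wants kept for it.
        buffer = deque(maxlen=history_depth)
        self._subs[topic].append(buffer)
        return buffer

    def publish(self, topic, sample):
        # The publisher names the data, not the destinations.
        for buffer in self._subs[topic]:
            buffer.append(sample)
        return len(self._subs[topic])  # number of subscribers reached


bus = Bus()
latest = bus.subscribe("temperature", history_depth=1)
trend = bus.subscribe("temperature", history_depth=3)

for reading in (20.0, 20.5, 21.0, 21.5):
    bus.publish("temperature", reading)

print(list(latest))                   # [21.5]
print(list(trend))                    # [20.5, 21.0, 21.5]
print(bus.publish("pressure", 101.3)) # 0: no subscriber, nothing is sent
```

The publisher's code is the same whether zero, one, or many subscribers exist, and each subscriber receives the stream under the depth it requested.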

In summary, client-server middleware is best for centralized data designs and for systems that are naturally service-oriented, such as file servers and transaction systems. Client-server middleware is not the best choice in systems that entail many, often poorly defined, data paths. Message passing, with its "send that there" semantics, maps well to systems with clear, simple dataflow needs. Message-passing middleware is better than client-server middleware at free-form data sharing, but it still requires the application to discover where data resides. Publish-subscribe, by providing both discovery and messaging, implements a data-centric information distribution system. Nodes communicate simply by sending the data they have and asking for the data they need.