High Availability with RTI Connext


I am working for a company which is planning to use RTI Connext for its message queue needs. I have started exploring what RTI Connext has to offer and have some questions.

As there is no broker used in DDS, the data samples are directly sent from the publishing node to the subscribing node through the network stack. Is my understanding here correct?

What techniques can be used to provide high availability at the publishing and subscribing nodes? Is there any method similar to clustering of nodes that can be used? If yes, please share the link to that part of the documentation.

Can running multiple instances of a DDS application on different nodes somehow provide high availability? If so, how exactly will that work? How will the application instances communicate with each other?

Howard

Hi,

How DDS can help you build a distributed system with high availability requirements is better discussed in a meeting, where questions and information can be exchanged interactively. I assume that your project is in contact with an RTI account team who can set up such a meeting to help you efficiently evaluate the pros and cons of using DDS.

If you are not in contact with an RTI account team, let me know where you're located and I can put them in touch with you.

For your specific questions, yes, fundamentally DDS operation does not use or require a broker.  Data is sent directly from applications that publish data to applications that subscribe to data.

So...here's where a meeting is more effective than the forum...what do you mean by "high availability"?  What are your requirements?  Use case?  How do you want to implement high availability?  (+ lots of other questions and followup questions)

Without knowing your answers to the questions above, the comments that I can provide may or may not be relevant to what you want to do...

Generally, a system that can be characterized as "highly available" must be able to continue operation through the failure of any single component, whether that be an application, a node, or some part of the network itself. How a "failure" can be handled depends greatly on the requirements of the system: how long can the system take to adjust to a failure, which failures must the system be resilient to, etc.

The most common approach to providing High Availability in a system is to have redundant components: redundant sensors, actuators, processing elements (applications, computers, networks).

The very nature of the communication design pattern provided by DDS, publish/subscribe, already supports HA requirements.  The data sent by an application can be received by any number of subscribing applications without any a priori constraints.  Likewise, the data received by an application can be sent by any number of publishing applications.

So, having redundant applications that publish or subscribe to data streams is as simple as running multiple instances of that application, on the same or multiple hosts.  The user application does not need to specify to which other applications data must be sent or from which it must be received.  DDS will automatically and dynamically ensure that the data goes where it needs to go, even as applications go down (crash) or come back up (start, restart).
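
To make the decoupling concrete, here is a minimal in-process sketch in Python. It is a toy model, not the Connext API: `Domain`, `Writer`, and the topic name `"Temperature"` are hypothetical names invented for illustration, and `Domain` only stands in for discovery (who is interested in which topic), not for a broker. The point is that writers never name their readers, so redundant instances can come and go without any reconfiguration.

```python
from collections import defaultdict


class Domain:
    """Toy stand-in for discovery: tracks which readers match which topic."""

    def __init__(self):
        self._readers = defaultdict(list)  # topic name -> reader callbacks

    def add_reader(self, topic, callback):
        self._readers[topic].append(callback)

    def remove_reader(self, topic, callback):
        self._readers[topic].remove(callback)

    def matched_readers(self, topic):
        return list(self._readers[topic])


class Writer:
    """Publishes to a topic without knowing who, if anyone, is subscribed."""

    def __init__(self, domain, topic):
        self._domain = domain
        self._topic = topic

    def write(self, sample):
        # Deliver directly to every reader that is matched right now.
        for deliver in self._domain.matched_readers(self._topic):
            deliver(sample)


domain = Domain()
received_a, received_b = [], []
on_a, on_b = received_a.append, received_b.append

domain.add_reader("Temperature", on_a)
domain.add_reader("Temperature", on_b)

writer_1 = Writer(domain, "Temperature")
writer_2 = Writer(domain, "Temperature")  # redundant publisher instance

writer_1.write(21.5)                      # both subscribers receive it
domain.remove_reader("Temperature", on_b) # subscriber B goes down
writer_2.write(21.7)                      # only subscriber A receives it
domain.add_reader("Temperature", on_b)    # subscriber B restarts
writer_1.write(21.9)                      # both subscribers receive it again

print(received_a)
print(received_b)
```

In this sketch, subscriber B never sees the sample written while it was down. Recovering such samples is a separate concern from the redundancy shown here.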

There are a number of specific features in DDS that support building systems with redundancy:

  • Ability to use multiple and different transport networks simultaneously.  Data can be sent over multiple networks built from different physical technologies such as 5G, Ethernet, and Fibre Channel.
  • Ability to specify which data source (aka DataWriter) is the "owner" of the data stream.  All DataReaders (of the same Topic) will receive data from the DataWriter with the highest ownership strength until that DataWriter "stops working", for example because its application crashed or has otherwise been disconnected from the system.  At that point, data from the DataWriter with the next highest strength will be received.  Please look into the DDS OWNERSHIP QoS Policy (EXCLUSIVE kind) and the associated OWNERSHIP_STRENGTH QoS Policy.
  • For systems in which HA is supported by a system service that monitors the health of applications and can kill, restart, or move applications to a different node (there are commercial and open source HA frameworks, and this capability can also be found in container "orchestration" frameworks such as Docker/Kubernetes), DDS offers the DURABILITY QoS policy and the associated Persistence Service.  Together they can store data that was sent while a user application was "down" and automatically forward it to the application as soon as it is restarted.
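
The ownership failover described in the list above can be sketched as a small Python model. This is only an illustration of the arbitration rule, not the Connext API: `WriterState`, `ExclusiveReader`, and the `alive` flag are hypothetical names, and the sketch assumes a single data instance. In a real system the middleware decides that a writer has "stopped working" through mechanisms such as liveliness or deadline monitoring, and it also defines how ties in strength are resolved, neither of which is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class WriterState:
    name: str
    strength: int       # models OWNERSHIP_STRENGTH
    alive: bool = True  # models whether the reader still considers it alive


@dataclass
class ExclusiveReader:
    """Accepts samples only from the strongest writer that is still alive."""

    writers: dict = field(default_factory=dict)
    received: list = field(default_factory=list)

    def match(self, writer):
        self.writers[writer.name] = writer

    def owner(self):
        alive = [w for w in self.writers.values() if w.alive]
        return max(alive, key=lambda w: w.strength, default=None)

    def on_sample(self, writer_name, value):
        current = self.owner()
        if current is not None and current.name == writer_name:
            self.received.append((writer_name, value))
        # Samples from non-owning writers are silently dropped.


reader = ExclusiveReader()
primary = WriterState("primary", strength=10)
backup = WriterState("backup", strength=5)
reader.match(primary)
reader.match(backup)

reader.on_sample("primary", 1)  # accepted: primary owns the stream
reader.on_sample("backup", 1)   # dropped: backup is weaker

primary.alive = False           # primary crashes or is disconnected
reader.on_sample("backup", 2)   # accepted: backup takes over

primary.alive = True            # primary comes back
reader.on_sample("backup", 3)   # dropped: primary owns the stream again
reader.on_sample("primary", 3)  # accepted

print(reader.received)
```

Both writers publish the whole time, and the selection happens on the reader side. That is why no coordination between the redundant publishing applications is needed in this model.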

There are a variety of other DDS features that are useful for building HA systems, but it would be more efficient to explore them in a face-to-face setting where questions can be asked and answered.


Hey Howard. Thanks for the comments. They were helpful.

It would be great if you could put me in touch with an RTI account team. A meeting would be a far better way to discuss the requirements and their solutions. We are located in Bangalore, India. Do you need my contact info?

Howard

Please send your info to my email address "howard@rti.com", and I'll get the correct folks to contact you.