Configuring DDS to mitigate performance degradation due to packet loss

If you are using Reliable communication, you may see performance degradation caused by packet loss and the resending of data sample fragments. This is more likely when your samples are larger than 1500 bytes (the typical Ethernet MTU).

In general, a DDS application using a transport such as UDPv4 splits data samples into submessages so they can be transferred over the network. Those submessages can then be fragmented again by the IP stack. You can find the details of how DDS data samples are fragmented in this article on our blog.

If your network is lossy (due to congestion, noise, etc.) and many data fragments are lost, DDS will need to repair them. This is even more dramatic when you are sending large data, because the live data stream and the reliability (repair) traffic compete for the same socket receive buffer, which can cause even more drops.

The following considerations can help reduce fragmentation and mitigate its effects on your DDS communication:

  1. Use strongly typed data instead of loosely typed data. If you are using XML or JSON-like types (not strongly typed ones), you will send more data per sample, because each sample carries the tags that characterize your information. Defining your type’s fields enables DDS to optimize the communication by sending a compact binary representation of the data (see the first sketch after this list).

  2. Configure DDS to send packets smaller than the MTU: see this KB article on how to configure Connext DDS to split data samples into MTU-sized packets (a QoS snippet illustrating this appears after this list).

  3. Configure batching. You can modify max_data_bytes and max_samples in the BatchQosPolicy to constrain the batch size within the MTU (see here for the current release or here for release 5.0.0); a sample configuration appears after this list. Take into account that max_data_bytes does not count the per-sample metadata (up to 52 bytes for keyed topics and 20 bytes for unkeyed topics - see the System Resource Considerations in the BatchQosPolicy section of the User's Manual, here for the current release or here for release 5.0.0). In a latency-constrained scenario, you may want to disable batching completely.

  4. Large data

    1. Use the Asynchronous Publisher QoS (described here for the current release or here for release 5.2.3). This creates an additional thread in charge of not only sending new data but also repairing lost samples in reliable communication. Its behavior is governed by a FlowController (described here for the current release or here for release 5.2.3), which prevents bursts from overwhelming the network. See an example of how to use an asynchronous publisher here, and a minimal QoS snippet after this list.

    2. If possible, configure your network to use jumbo frames. This raises the MTU to 9000 bytes and improves performance by reducing the number of IP fragments per sample. Make sure the switches and routers along the path are able to handle that MTU size.

    3. Choose the correct Reliability for your application. Strict reliability (using the KEEP_ALL History QoS) may not be necessary in most cases, and it can let unresponsive or non-progressing DataReaders block your communication. You can change the history to KEEP_LAST while keeping Reliable reliability, or configure DDS to mark non-progressing DataReaders as inactive (described here for the current release or here for release 5.2.3); see the corresponding snippet after this list.

    4. Configure the memory used for fragmented samples on the reader side with the DataReaderResourceLimits QosPolicy's max_fragments_per_sample, max_fragmented_samples, and max_fragmented_samples_per_remote_writer. You can also control the memory usage on the writer side when sending large, variable-length data by using the pool_buffer_max_size property (described here for the current release or here for release 5.2.3). A snippet covering both sides appears after this list.
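
The sketches below illustrate the items above as they would typically appear in an RTI Connext XML configuration; the names and values are placeholders, so adapt them to your own system and release.

For item 1, this is a minimal type definition in the XML type representation (the equivalent IDL works the same way). The SensorReading type and its members are hypothetical; compare it with shipping the same information as a free-form JSON string, where every sample repeats the tag names on the wire.

    <types>
      <!-- Strongly typed: field names live in the type definition, not in every
           sample, and samples are serialized as compact binary (CDR). -->
      <struct name="SensorReading">
        <member name="sensor_id" type="int32" key="true"/>
        <member name="timestamp" type="int64"/>
        <member name="temperature" type="float32"/>
      </struct>

      <!-- Loosely typed alternative: a single string carrying a JSON document.
           Every sample repeats the tag names, so it is larger on the wire and
           more likely to be fragmented. -->
      <struct name="SensorReadingJson">
        <member name="json_payload" type="string" stringMaxLength="2048"/>
      </struct>
    </types>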
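
For item 2, one common approach (detailed in the referenced KB article) is to lower the transport's message_size_max so RTPS messages fit in a single Ethernet frame. This sketch assumes the built-in UDPv4 transport and leaves roughly 100 bytes of headroom for the IP/UDP headers; the library and profile names are placeholders, and the file is loaded like any other QoS profile (for example, USER_QOS_PROFILES.xml).

    <?xml version="1.0"?>
    <dds>
      <qos_library name="PacketLossMitigationLib">
        <qos_profile name="MtuFriendlyProfile" is_default_qos="true">
          <participant_qos>
            <property>
              <value>
                <!-- Keep RTPS messages below the 1500-byte Ethernet MTU so the
                     IP stack does not have to fragment them again. -->
                <element>
                  <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
                  <value>1400</value>
                </element>
              </value>
            </property>
          </participant_qos>
        </qos_profile>
      </qos_library>
    </dds>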
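
For item 3, this DataWriter fragment (which would go inside the same qos_profile) enables batching and caps the batch below the MTU; the numbers are only an example and should leave room for the per-sample metadata mentioned above.

    <datawriter_qos>
      <batch>
        <enable>true</enable>
        <!-- max_data_bytes does not include the per-sample metadata
             (up to 52 bytes per sample for keyed topics). -->
        <max_data_bytes>1024</max_data_bytes>
        <max_samples>10</max_samples>
      </batch>
    </datawriter_qos>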
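
For item 4.1, this sketch switches a DataWriter to asynchronous publishing and pairs it with one of the built-in flow controllers; a custom token-bucket flow controller can also be defined through participant properties, but that is beyond this example.

    <datawriter_qos>
      <publish_mode>
        <!-- A separate publication thread sends new data and repairs lost
             samples, paced by the flow controller named below. -->
        <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        <flow_controller_name>DDS_FIXED_RATE_FLOW_CONTROLLER_NAME</flow_controller_name>
      </publish_mode>
    </datawriter_qos>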
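
For item 4.3, this sketch configures Reliable communication with KEEP_LAST history (the depth is a placeholder) and, as one way to keep non-progressing DataReaders from blocking the writer, sets the inactivate_nonprogressing_readers field of the reliable writer protocol; check the linked documentation for the exact setting in your release.

    <datawriter_qos>
      <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
      </reliability>
      <history>
        <!-- KEEP_LAST relaxes strict reliability: old samples can be
             replaced instead of blocking the writer. -->
        <kind>KEEP_LAST_HISTORY_QOS</kind>
        <depth>20</depth>
      </history>
      <protocol>
        <rtps_reliable_writer>
          <!-- Stop waiting on readers that do not acknowledge samples,
               so they cannot block the communication indefinitely. -->
          <inactivate_nonprogressing_readers>true</inactivate_nonprogressing_readers>
        </rtps_reliable_writer>
      </protocol>
    </datawriter_qos>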
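
Finally, for item 4.4, the reader-side limits below bound the memory used to reassemble fragmented samples (the values are placeholders), and the writer-side entry shows pool_buffer_max_size under the fully qualified property name we believe it uses (dds.data_writer.history.memory_manager.fast_pool.pool_buffer_max_size); verify the property name and semantics against the linked documentation for your release.

    <datareader_qos>
      <reader_resource_limits>
        <!-- Bound the memory used to reassemble fragmented samples. -->
        <max_fragments_per_sample>512</max_fragments_per_sample>
        <max_fragmented_samples>1024</max_fragmented_samples>
        <max_fragmented_samples_per_remote_writer>256</max_fragmented_samples_per_remote_writer>
      </reader_resource_limits>
    </datareader_qos>

    <datawriter_qos>
      <property>
        <value>
          <!-- Samples larger than this size (in bytes) use buffers that are
               released after use instead of growing the writer's pool. -->
          <element>
            <name>dds.data_writer.history.memory_manager.fast_pool.pool_buffer_max_size</name>
            <value>100000</value>
          </element>
        </value>
      </property>
    </datawriter_qos>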

Even though multiple layers may end up “chopping” your samples, DDS gives you the tools to control fragmentation and reduce the performance impact of lost fragments.