34.3 Large Data Fragmentation

There are two types of fragmentation: IP-level fragmentation and DDS-level fragmentation.

IP-level fragmentation occurs when the payload handed down from the transport layer (typically UDP or TCP) exceeds the maximum payload size that fits in a link frame (also known as the link maximum transmission unit, or link MTU). If the network is an Ethernet network, the link MTU is the maximum size of an Ethernet frame. When the receiving NIC gets IP fragments, it stores them in a buffer until all of the fragments have arrived; the fragments are then reassembled into the original UDP datagram or TCP segment, and the message is delivered to the application layer.

If you try to send a DDS sample whose size is larger than the MTU and you have not set up DDS-level fragmentation, you will see IP-level fragmentation. IP-level fragmentation is known to be fragile and can lead to communication issues if your system is not configured properly. For example, if your application relies on the transport to fragment the data and one fragment is lost, all of the fragments need to be resent to repair the missing fragment, whereas if you use Reliable reliability (see 47.21 RELIABILITY QosPolicy), Connext can repair a single lost DDS fragment.

The following diagrams show the differences between IP-level fragmentation and DDS-level fragmentation. RTPS, UDP, and IP headers are not shown in the diagrams, for simplification purposes.

[Diagrams: IP-level fragmentation vs. DDS-level fragmentation]

The main advantages of letting DDS do the fragmentation instead of letting the IP layer do it are as follows:

  • IP packets containing DATA_FRAG messages (DDS fragments) are automatically provided from the NIC’s buffer to the DDS application without having to wait for reassembly. This helps prevent overflow of the NIC’s buffer due to many fragments.
  • The middleware handles fragmentation and reassembly of fragments. As a result, when the 47.21 RELIABILITY QosPolicy is set to Reliable, if an IP packet containing a DATA_FRAG is not received, Connext's reliable protocol will try to repair the missing DATA_FRAG instead of the entire DDS sample. This can reduce network traffic in reliable communication scenarios. It is highly recommended to use Reliable reliability in combination with fragmentation; otherwise, a single lost fragment will cause the entire sample to be dropped, leading to excessive sample losses.

The main cost of using DDS-level fragmentation is that having Connext handle fragmentation may introduce a performance degradation compared to an ideal case where there are no IP-level fragmentation issues. However, if there are IP-level fragmentation issues in your system, DDS-level fragmentation is a good way to avoid them. There are many different types of IP-level fragmentation issues, including, but not limited to, mismatched MTU sizes across your network path, OS-specific implementation limitations, and hardware that simply does not allow IP fragment forwarding.

Note: Batching does not currently support DDS-level fragmentation (also known as RTPS fragmentation), so if you use batching, you cannot take advantage of Connext-level fragmentation. This means that your batch size must be set to a value smaller than the minimum transport MTU across all the installed Connext transports. (You configure the MTU by setting message_size_max in the transport properties. See the next section, 34.3.1 Avoiding IP-Level Fragmentation.)

You can configure the batch size for user data using either the max_data_bytes or max_samples QoS values in the 47.2 BATCH QosPolicy (DDS Extension). In either case, take into account that each sample in a batch carries some metadata overhead, which can be as large as 120 bytes per sample depending on which DDS features you use. A common value is 40 bytes of metadata per sample for keyed topics and 12 bytes per sample for unkeyed topics.
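
For illustration, the following is a minimal sketch of a DataWriter QoS configuration that enables batching while keeping each batch small enough to fit in a 1472-byte UDP payload. The 1344-byte limit is only an example value chosen to leave headroom for per-sample metadata and RTPS headers; it is not a recommended setting.

<datawriter_qos>
    <batch>
        <enable>true</enable>
        <!-- Keep the collected batch below the smallest message_size_max across
             the installed transports, leaving room for per-sample metadata -->
        <max_data_bytes>1344</max_data_bytes>
        <max_samples>LENGTH_UNLIMITED</max_samples>
    </batch>
</datawriter_qos>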

34.3.1 Avoiding IP-Level Fragmentation

IP-level fragmentation can be avoided if the DDS payload, plus the UDP and IP headers, fits within the Ethernet MTU. The most common Ethernet MTU size is 1500 bytes (although this size should not be assumed, since there are many cases in which it is set to a value other than 1500). The maximum UDP payload that fits in a 1500-byte Ethernet MTU is 1472 bytes: out of the 1500 bytes, 20 bytes are used by the IP header and 8 more by the UDP header. On Linux systems, you can find your NIC’s MTU with the following command:

> ifconfig

On Windows systems, this command shows the MTU for your NICs:

> netsh interface ipv4 show subinterface

Connext provides a property, message_size_max, to set the maximum size of an RTPS packet. See 51.6 Setting Builtin Transport Properties with the PropertyQosPolicy for information on how to set transport properties. Samples whose serialized size is larger than message_size_max are fragmented by DDS. Therefore, setting this property to a value no greater than the maximum UDP payload that fits in the Ethernet MTU (that is, 1472 bytes or less in the common case) makes DDS fragment the data packets so that each RTPS message fits in a single Ethernet frame. These DDS fragments are referred to as DATA_FRAG messages.

Note: MTU sizes are not necessarily uniform across an entire network path from source to destination. In these cases, it is important to understand the MTU sizes throughout your network and to set the DDS message_size_max to a value smaller than the smallest payload that fits in the MTU size in your network. TCP avoids IP-level fragmentation and automatically detects MTU sizes across a network path through a process called Path MTU Discovery. If you’re using UDP, then it is currently up to you to know and understand the MTU sizes in your network if you want to avoid IP-level fragmentation.

More granular control over DDS-level fragment management is available through fields such as max_fragments_per_sample (see 48.2 DATA_READER_RESOURCE_LIMITS QosPolicy (DDS Extension)).
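
For instance, the following is a minimal sketch of limiting the number of fragments a DataReader will manage per sample; the value 512 is purely illustrative.

<datareader_qos>
    <reader_resource_limits>
        <!-- Maximum number of fragments the DataReader will manage for a single sample -->
        <max_fragments_per_sample>512</max_fragments_per_sample>
    </reader_resource_limits>
</datareader_qos>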

While Connext supports unbounded types and data fragmentation, there are practical serialization limits for any given sample. These limits are described in 17.10 Data Sample Serialization Limits.

Note: Features that are targeted at applications that handle large data, like the FlatData language binding and Zero Copy over shared memory features (see Chapter 34 Sending Large Data), have no effect on how data is fragmented by DDS.

Connext provides a builtin XML snippet that you can use to configure the middleware to avoid IP fragmentation. The snippet name is Transport.UDP.AvoidIPFragmentation, and you can use it as follows:

<qos_profile name="AvoidIPFragmentation">
    <base_name>
        <element>Transport.UDP.AvoidIPFragmentation</element>
    </base_name>
</qos_profile>

The snippet does two things:

  • Sets message_size_max to 1400 bytes for the builtin UDPv4, UDPv6, and UDPv4 WAN transports, so that each RTPS message (plus the UDP and IP headers) fits within a typical 1500-byte Ethernet MTU.
  • Sets the publish mode of user DataWriters, as well as of the builtin discovery and service-request DataWriters, to ASYNCHRONOUS_PUBLISH_MODE_QOS, which is required to reliably send samples larger than message_size_max.

The following XML shows what the Transport.UDP.AvoidIPFragmentation snippet sets, for reference:

<qos_profile name="Transport.UDP.AvoidIPFragmentation">
    <domain_participant_qos>
        <discovery_config>
            <publication_writer_publish_mode>
                <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
            </publication_writer_publish_mode>
            <subscription_writer_publish_mode>
                <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
            </subscription_writer_publish_mode>
            <secure_volatile_writer_publish_mode>
                <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
            </secure_volatile_writer_publish_mode>
            <service_request_writer_publish_mode>
                <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
            </service_request_writer_publish_mode>
        </discovery_config>
        <transport_builtin>
            <udpv4>
                <message_size_max>1400</message_size_max>
            </udpv4>
            <udpv6>
                <message_size_max>1400</message_size_max>
            </udpv6>
            <udpv4_wan>
                <message_size_max>1400</message_size_max>
            </udpv4_wan>
        </transport_builtin>
    </domain_participant_qos>  
 
    <datawriter_qos>
        <publish_mode>
            <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        </publish_mode>
    </datawriter_qos>
</qos_profile>

(See 50.2.3.3 QoS Profile Composition for more information on QoS snippets in XML files.)

34.3.2 Reliable Reliability

If you use Best Effort reliability (see 47.21 RELIABILITY QosPolicy), Connext will not try to recover any lost DDS-level fragments, so if any fragment is lost, the DataReader discards the entire sample. Depending on its size, a sample may consist of many fragments, which makes it more likely that at least one fragment (and therefore the entire sample) will be lost. By setting the 47.21 RELIABILITY QosPolicy to Reliable, if a fragment is lost, Connext will try to recover it. This is why it is usually recommended to use Reliable reliability if you are using DDS-level fragmentation.

For more information, see the 47.21 RELIABILITY QosPolicy.

34.3.3 Asynchronous Publishing

DDS-level fragmentation requires asynchronous publication if you are using the Reliable setting of the 47.21 RELIABILITY QosPolicy. Sending reliable samples larger than the transport's message_size_max requires asynchronous publication so that the fragmentation process can take place outside of the context of the thread that wrote the sample.

If you're using Best Effort reliability, samples larger than message_size_max will still be fragmented; however, this configuration (Best Effort plus fragmentation) is not recommended because you're more likely to drop samples. The error "COMMENDSrWriterService_on_Submessage:!write resend. Reliable large data requires asynchronous write" indicates that a serialized sample is larger than the transport's message_size_max while the 47.21 RELIABILITY QosPolicy is set to RELIABLE_RELIABILITY_QOS and asynchronous publishing is not enabled.

To fragment DDS packets while using Reliable reliability, set kind in the 47.20 PUBLISH_MODE QosPolicy (DDS Extension) to ASYNCHRONOUS_PUBLISH_MODE_QOS. With these settings, Connext will use a separate thread to send the fragments, relieving your application thread of the fragmentation and sending work. For more information about the asynchronous publisher, see 46.1 ASYNCHRONOUS_PUBLISHER QosPolicy (DDS Extension).

It may also be necessary to set the builtin PublicationBuiltinTopicData and SubscriptionBuiltinTopicData DataWriters’ publish mode to be asynchronous. This is done through the 44.3 DISCOVERY_CONFIG QosPolicy (DDS Extension) (see details in 34.3.5 Example). The most common cause of a large PublicationBuiltinTopicData or SubscriptionBuiltinTopicData sample is the serialized TypeCode or TypeObject, but you may also be sending a lot of properties (via the 47.19 PROPERTY QosPolicy (DDS Extension)) or have a large ContentFilteredTopic filter expression, among other variably sized fields, which could be leading to larger sample sizes. It may also be the case that the samples are not particularly large, but if you have set the message_size_max to be a small value to force DDS-level fragmentation, the samples sent by the builtin DataWriters may exceed this size and require fragmentation.

For more information on TypeObjects, see the following:

34.3.4 Flow Controllers

The asynchronous publish mode requires a FlowController. If no FlowController is defined, the default FlowController will be used. With the default FlowController, the DATA_FRAGs will be written as fast as the DataWriter can write them, which might overload the network or the DataReaders. See 34.4 FlowControllers (DDS Extension).
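
As an illustration, the following sketch creates a custom token-bucket FlowController through the PropertyQosPolicy and assigns it to a DataWriter. The name MyFlowController and all numeric values are placeholders; the available token-bucket properties and how to tune them are described in 34.4 FlowControllers (DDS Extension).

<domain_participant_qos>
    <property>
        <value>
            <!-- Hypothetical token-bucket FlowController: 32 tokens of 1024 bytes
                 replenished every 10 ms (roughly 32 KB per period) -->
            <element>
                <name>dds.flow_controller.token_bucket.MyFlowController.token_bucket.max_tokens</name>
                <value>32</value>
            </element>
            <element>
                <name>dds.flow_controller.token_bucket.MyFlowController.token_bucket.tokens_added_per_period</name>
                <value>32</value>
            </element>
            <element>
                <name>dds.flow_controller.token_bucket.MyFlowController.token_bucket.bytes_per_token</name>
                <value>1024</value>
            </element>
            <element>
                <name>dds.flow_controller.token_bucket.MyFlowController.token_bucket.period.sec</name>
                <value>0</value>
            </element>
            <element>
                <name>dds.flow_controller.token_bucket.MyFlowController.token_bucket.period.nanosec</name>
                <value>10000000</value>
            </element>
        </value>
    </property>
</domain_participant_qos>

<datawriter_qos>
    <publish_mode>
        <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        <!-- Refer to the FlowController by the same name used in the properties above -->
        <flow_controller_name>dds.flow_controller.token_bucket.MyFlowController</flow_controller_name>
    </publish_mode>
</datawriter_qos>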

An example of how to set the DataWriter to be asynchronous is shown below.

34.3.5 Example

The following example shows the QoS settings that do the following:

  • Set the DataWriter to be asynchronous.
  • Set the builtin DataWriters to be asynchronous.
  • Set the 47.21 RELIABILITY QosPolicy to Reliable on the DataWriter and DataReader. DataWriters are configured as reliable by default, so this is technically not required. DataReaders are configured for best-effort communication by default, so enabling reliability on the DataReader is required for the DataWriter and DataReader to communicate reliably with each other. See 47.21 RELIABILITY QosPolicy.
  • Disable the shared memory transport (since our discussion thus far has focused on IP transports and the relationship between IP-layer fragmentation and DDS-layer fragmentation, not shared memory fragmentation).
  • Set the maximum payload size for RTPS packets by configuring message_size_max.
<!-- Set the DataWriter to be asynchronous and reliable -->
<datawriter_qos>
    <publish_mode>
        <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        <flow_controller_name>DEFAULT_FLOW_CONTROLLER_NAME</flow_controller_name>
    </publish_mode>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
</datawriter_qos>
 
<!-- Set the DataReader to be reliable -->
<datareader_qos>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
</datareader_qos>
<domain_participant_qos>
    <transport_builtin>
        <mask>UDPv4</mask>
    </transport_builtin>

    <!-- Set the builtin DataWriters to be asynchronous if the TypeCode/TypeObject
    or other configuration parameters are larger than the MTU -->
    <discovery_config>
        <publication_writer_publish_mode>
            <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        </publication_writer_publish_mode>
        <subscription_writer_publish_mode>
            <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        </subscription_writer_publish_mode>
    </discovery_config>

    <!-- Set this property to something lower than the MTU.
    For this example, the MTU is 1500 bytes -->
    <property>
        <value>
            <element>
                <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
                <value>1450</value>
            </element>
        </value>
    </property>
</domain_participant_qos>

34.3.6 Fragmentation Statistics

You can monitor fragmented (DATA_FRAG) messages via the 31.6.3 DATA_WRITER_PROTOCOL_STATUS and 40.7.3 DATA_READER_PROTOCOL_STATUS statuses, which are also visible through RTI Monitor (see Chapter 59 RTI Monitoring Library).