1.1.2.1. Batching

The Batch QoS Policy can be used to decrease the amount of communication overhead associated with the transmission and (in the case of reliable communication) acknowledgment of small DDS samples, in order to increase throughput. It does this by collecting multiple user data DDS samples to be sent in a single network packet, to take advantage of the efficiency of sending larger packets.

Batching dramatically increases effective throughput for small DDS samples, whose throughput is typically limited by CPU capacity rather than by network bandwidth. Sending many small DDS samples in a single large packet increases network utilization and therefore throughput in terms of DDS samples per second.
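As a rough illustration of why batching helps most for small samples, the sketch below estimates how many samples fit in one batch. It assumes a batch carries only sample payloads; real batches also carry per-sample metadata, so the actual count is lower. The function name `samples_per_batch` is hypothetical and is not part of RTI Perftest.

```shell
#!/usr/bin/env bash

# Upper bound on the number of samples that fit in one batch.
# Assumption: per-sample metadata is ignored, so real batches hold fewer.
# Prints 0 when a single sample is larger than the batch.
samples_per_batch() {
    local batch_size=$1 sample_size=$2
    echo $(( batch_size / sample_size ))
}

samples_per_batch 8192 32      # prints 256
samples_per_batch 8192 1024    # prints 8
samples_per_batch 8192 63000   # prints 0
```

With an 8192-byte batch, a 32-byte sample shares its network packet with up to a few hundred others, while a sample of 8192 bytes or more gains nothing from batching.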

For more information, see BATCH QosPolicy, in the RTI Connext DDS Core Libraries User’s Manual.

The tests below show how much batching can improve throughput. They compare the RTI Perftest default configuration for throughput, which enables batching (with a batch size of 8192 bytes), against a configuration where batching is disabled.

Note

We chose 8192 bytes for the batch size because it is the largest power of two below 9000 bytes, the maximum MTU that can be configured for a regular NIC. This way the RTPS packet does not need to be fragmented.

This configuration alone cannot guarantee that the packet will not be fragmented: if the MTU of the NIC is set to a lower value (such as 1500 bytes, the default on many NICs), the packet will still be fragmented.
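The relationship between batch size and MTU can be checked with a small sketch like the one below. It counts only the IPv4 header (20 bytes, no options) and the UDP header (8 bytes); the RTPS header and submessage overhead are not included, so treat the result as optimistic. The function name `fits_in_mtu` is hypothetical.

```shell
#!/usr/bin/env bash

# IPv4 header without options (20 bytes) plus UDP header (8 bytes).
# Assumption: RTPS header and submessage overhead are not counted here.
IP_UDP_OVERHEAD=28

# Succeeds (exit status 0) when a payload of the given size fits in a
# single IP packet for the given MTU.
fits_in_mtu() {
    local payload_bytes=$1 mtu=$2
    [ $(( payload_bytes + IP_UDP_OVERHEAD )) -le "$mtu" ]
}

for mtu in 1500 9000; do
    if fits_in_mtu 8192 "$mtu"; then
        echo "MTU $mtu: an 8192-byte batch fits in one IP packet"
    else
        echo "MTU $mtu: an 8192-byte batch will be fragmented"
    fi
done
```

On Linux, the MTU currently configured for an interface can be read from `/sys/class/net/<interface>/mtu` and passed as the second argument.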

Note

Since some samples are delayed until the batch is ready, you may observe increased latency. The maximum delay, as well as maximum bytes or maximum number of samples, can be configured. See BATCH QosPolicy, in the RTI Connext DDS Core Libraries User’s Manual.

Unkeyed, UDPv4 10Gbps Network, C++98

The graph below shows the expected throughput for 1-to-1 communication between two Linux nodes in a 10Gbps network, with and without batching (with a batch size of 8192 bytes). The numbers cover both best-effort and strict-reliable scenarios.

Detailed Statistics

These tables contain the raw numbers reported by RTI Perftest. The numbers are the exact output, with no further processing.

Best Effort, Batching

Sample Size (Bytes)	Total Samples	Avg Samples/s	Avg Mbps	Lost Samples	Lost Samples (%)
32	99848960	5033936	1288.7	151040	0.15
64	99973760	4668791	2390.4	26240	0.03
128	100000000	4111291	4210.0	0	0.00
256	100000000	3340956	6842.3	0	0.00
512	69812480	2335490	9575.2	0	0.00
1024	35708504	1190222	9750.3	0	0.00
8192	4508508	150280	9848.8	0	0.00
63000	590273	19675	9916.5	2	0.00

Best Effort, No Batching

Sample Size (Bytes)	Total Samples	Avg Samples/s	Avg Mbps	Lost Samples
32	11733830	390842	100.1	0
64	11811834	393434	201.4	0
128	11609904	386713	396.0	0
256	11662605	388464	795.6	0
512	11542759	384478	1574.8	0
1024	11349140	378025	3096.8	0
2048	10952612	364820	5977.2	0
4096	8886128	296075	9701.8	0
8192	4508641	150280	9848.8	0
16384	2263520	75448	9889.2	71
32768	1134080	37801	9909.5	17
63000	590282	19675	9916.5	9

Reliable, Batching

Sample Size (Bytes)	Total Samples	Avg Samples/s	Avg Mbps
32	100000000	4776439	1222.8
64	100000000	4333897	2219.0
128	100000000	3470061	3553.3
256	82417984	2744986	5621.7
512	56613633	1885618	7723.5
1024	35265288	1174554	9621.9
8192	4506938	150221	9844.9
63000	590289	19674	9916.0
100000	371720	12388	9911.1
500000	68000	2265	9060.3
1048576	29076	968	8124.9
1548576	22044	734	9098.6
4194304	7793	259	8702.6
10485760	2911	96	8129.6

Reliable, No Batching

Sample Size (Bytes)	Total Samples	Avg Samples/s	Avg Mbps
32	7625956	253971	65.0
64	7654812	254939	130.5
128	7482362	249196	255.2
256	7588068	252715	517.6
512	7457710	248365	1017.3
1024	7428494	247380	2026.5
2048	7216087	240299	3937.1
4096	6731026	224156	7345.2
8192	4506759	150221	9844.9
16384	2263121	75433	9887.3
32768	1134019	37798	9908.6
63000	590277	19674	9916.0
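The Avg Mbps column in the tables above is consistent with samples per second × sample size in bytes × 8 bits / 10^6. The sketch below reproduces two of the 32-byte rows and computes the batching speedup for that size. The helper names `mbps` and `speedup` are hypothetical, and the formula is inferred from the table values rather than taken from the RTI Perftest sources.

```shell
#!/usr/bin/env bash

# Throughput in megabits per second (10^6 bits) from a sample rate
# and a sample size in bytes.
mbps() {
    awk -v rate="$1" -v size="$2" 'BEGIN { printf "%.1f\n", rate * size * 8 / 1e6 }'
}

# Ratio between two sample rates (batching vs. no batching).
speedup() {
    awk -v with="$1" -v without="$2" 'BEGIN { printf "%.1f\n", with / without }'
}

mbps 5033936 32            # best effort, batching:    prints 1288.7
mbps 390842 32             # best effort, no batching: prints 100.1
speedup 5033936 390842     # best effort, 32 bytes:    prints 12.9
speedup 4776439 253971     # reliable, 32 bytes:       prints 18.8
```

For 32-byte samples, batching delivers roughly 13 times more samples per second in best-effort mode and roughly 19 times more in reliable mode. At 8192 bytes and above, the two configurations report the same rates.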

Perftest Scripts

To produce these tests, we executed RTI Perftest for C++98. The exact commands used can be found here:

Publisher Side

sudo /set_thr_mode.sh

echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export exec_time=30
export nic=172.16.0.1
export pub_string="-pub \
        -transport UDPv4 \
        -nic $nic \
        -noPrint \
        -batchSize 0\
        -noOutputHeaders \
        -exec $exec_time \
        -noXML"

mkdir -p $output_folder

echo ">> UNKEYED BE N Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_be_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -best $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> $my_file;
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

echo ">> UNKEYED REL N Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_rel_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> $my_file;
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

Subscriber Side

sudo /set_thr_mode.sh

echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export nic=172.16.0.2
export sub_string="-sub \
        -transport UDPv4 \
        -nic $nic \
        -noPrint \
        -noOutputHeaders \
        -noXML"

mkdir -p $output_folder

echo ">> UNKEYED BE"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_be_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -best -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> $my_file;
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;


echo ">> UNKEYED REL"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_rel_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> $my_file;
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

Test Hardware

The following hardware was used to perform these tests:

Linux Nodes

Processor: Intel® Xeon® E-2186G 3.8GHz, 12M cache, 6C/12T, turbo (95W)
RAM: 16GB 2666MT/s DDR4 ECC UDIMM
NIC 1: Intel X550 Dual Port 10GbE BASE-T Adapter, PCIe Full Height
NIC 2: Intel Ethernet I350 Dual Port 1GbE BASE-T Adapter, PCIe Low Profile
OS: Ubuntu 18.04 -- gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Switch

Dell Networking S4048T-ON, 48x 10GBASE-T and 6x 40GbE QSFP+ ports, IO to PSU air, 2x AC PSU, OS9