1.1.2.1. Batching
The BATCH QosPolicy can be used to increase throughput by reducing the communication overhead associated with sending small DDS samples and, for reliable communication, acknowledging them. It does this by collecting multiple user data DDS samples into a single network packet, taking advantage of the efficiency of sending larger packets.
Batching can dramatically increase effective throughput for small DDS samples. For small samples, throughput is typically limited by CPU capacity rather than by network bandwidth, because the per-packet processing cost is paid for every sample. Sending many small DDS samples in a single large packet spreads that cost over the whole batch, which increases network utilization and thus throughput in DDS samples per second.
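To make the packet-count argument concrete, the following sketch estimates how many network packets are needed for a given number of samples, with and without batching. It is a simplified model under stated assumptions: it ignores the per-sample metadata and packet headers that a real batch carries, so the samples-per-batch figure is an upper bound. The function names are hypothetical and are not part of RTI Connext DDS or RTI Perftest.

```bash
#!/bin/bash
# Simplified model: network packets needed to carry N samples, with and
# without batching. Assumption: per-sample metadata and packet headers are
# ignored, so samples_per_batch is an upper bound. Names are hypothetical.

# samples_per_batch <sample_size_bytes> <batch_size_bytes>
samples_per_batch() {
  local size=$1 batch=$2
  local n=$(( batch / size ))
  # A sample larger than the batch is counted as one packet on its own.
  if (( n < 1 )); then n=1; fi
  echo "$n"
}

# packets_needed <num_samples> <sample_size_bytes> <batch_size_bytes>
# A batch size of 0 models batching disabled: one packet per sample.
packets_needed() {
  local samples=$1 size=$2 batch=$3
  if (( batch == 0 )); then
    echo "$samples"
    return
  fi
  local per
  per=$(samples_per_batch "$size" "$batch")
  # Round up: the last batch may be only partially filled.
  echo $(( (samples + per - 1) / per ))
}

echo "no batching:       $(packets_needed 1000000 32 0) packets"
echo "8192-byte batches: $(packets_needed 1000000 32 8192) packets"
```

With 32-byte samples and an 8192-byte batch, the model packs up to 256 samples per packet, so one million samples need 3,907 packets instead of one million. In practice fewer samples fit per batch, but this reduction in per-packet work is what drives the throughput gains for small samples.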
For more information, see BATCH QosPolicy in the RTI Connext DDS Core Libraries User’s Manual.
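The tests below show how much batching can help. They compare the default RTI Perftest throughput configuration, which enables batching with a batch size of 8192 bytes, against a configuration with batching disabled. For example, with 32-byte samples, best-effort throughput rises from 375,290 to 4,916,862 samples/s (about 13 times higher), and strictly reliable throughput rises from 249,154 to 4,686,195 samples/s (about 19 times higher).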
Note
We chose 8192 bytes for the batch size because it is the largest power of two below 9000 bytes, the maximum MTU (jumbo frames) that can typically be configured on a regular NIC. This way, the RTPS packet does not need to be fragmented.
This setting alone does not guarantee that the packet will not be fragmented: if the NIC MTU is set to a lower value, such as the 1500-byte default of many NICs, the packet will be fragmented.
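The fragmentation argument can be checked with a rough calculation. The sketch below, assuming a 20-byte IPv4 header without options and an 8-byte UDP header, counts how many IPv4 fragments one UDP datagram needs for a given MTU. The RTPS overhead added on top of the 8192-byte batch is a hypothetical placeholder; the real value depends on the RTPS header and submessages in the packet.

```bash
#!/bin/bash
# Estimate the number of IPv4 fragments needed to carry one UDP datagram.
# Assumptions: 20-byte IPv4 header (no options) and 8-byte UDP header.

# ipv4_fragments <udp_payload_bytes> <mtu_bytes>
ipv4_fragments() {
  local udp_payload=$1 mtu=$2
  local ip_payload=$(( udp_payload + 8 ))   # UDP header + RTPS message
  # Every fragment except the last must carry a multiple of 8 bytes.
  local per_fragment=$(( (mtu - 20) / 8 * 8 ))
  echo $(( (ip_payload + per_fragment - 1) / per_fragment ))
}

# Hypothetical RTPS overhead on top of an 8192-byte batch; the real value
# depends on the RTPS header and submessage layout.
rtps_overhead=64
rtps_message=$(( 8192 + rtps_overhead ))

echo "MTU 9000: $(ipv4_fragments "$rtps_message" 9000) fragment(s)"
echo "MTU 1500: $(ipv4_fragments "$rtps_message" 1500) fragment(s)"
```

With the placeholder overhead, the datagram fits in a single frame at a 9000-byte MTU but needs six fragments at the common 1500-byte MTU.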
Note
Because samples may be held until the batch is ready to be sent, you may observe increased latency. The maximum flush delay, as well as the maximum number of bytes or samples per batch, can be configured. See BATCH QosPolicy in the RTI Connext DDS Core Libraries User’s Manual.
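The following sketch models that trade-off under simplifying assumptions: a batch is sent as soon as it reaches its byte limit, its sample limit, or its maximum flush delay, whichever comes first, and per-sample metadata is ignored. It estimates how long the first sample written into an empty batch waits before being sent. The function name and the 10 ms flush delay in the usage lines are hypothetical illustrations, not Connext defaults.

```bash
#!/bin/bash
# Simplified model: extra wait for the first sample written into an empty
# batch. Assumption: the batch is sent when it is full (bytes or samples) or
# when the flush delay expires, whichever comes first; per-sample metadata is
# ignored. The function name is hypothetical.

# first_sample_wait_us <sample_bytes> <max_batch_bytes> <max_batch_samples>
#                      <write_rate_samples_per_s> <max_flush_delay_us>
first_sample_wait_us() {
  local size=$1 max_bytes=$2 max_samples=$3 rate=$4 flush_us=$5
  local n=$(( max_bytes / size ))            # samples that fit by size
  if (( n > max_samples )); then n=$max_samples; fi
  if (( n < 1 )); then n=1; fi
  # Time for the remaining n - 1 samples to arrive (integer microseconds).
  local fill_us=$(( (n - 1) * 1000000 / rate ))
  if (( fill_us < flush_us )); then echo "$fill_us"; else echo "$flush_us"; fi
}

# 32-byte samples, 8192-byte batches, effectively no sample limit,
# hypothetical 10 ms flush delay, at two different write rates.
echo "1000 samples/s:    $(first_sample_wait_us 32 8192 1000000 1000 10000) us"
echo "4916862 samples/s: $(first_sample_wait_us 32 8192 1000000 4916862 10000) us"
```

At 1000 samples/s, filling a 256-sample batch would take 255 ms, so the flush delay (10 ms here) bounds the wait. At the roughly 4.9 million samples/s measured in the best-effort batching test, the batch fills in about 51 µs, so the added latency is small at high write rates.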
Unkeyed, UDPv4 10Gbps Network, C++98
The graph below shows the expected throughput for 1-to-1 communication between two Linux nodes over a 10Gbps network, with and without batching (batch size of 8192 bytes). Results are shown for both best-effort and strictly reliable scenarios.
Detailed Statistics
The tables below contain the raw numbers reported by RTI Perftest, exactly as output, with no further processing.
Best Effort, Batching
| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 98447835 | 4916862 | 1258.7 | 608768 | 0.61 |
| 64 | 91799845 | 4585163 | 2347.6 | 57216 | 0.06 |
| 128 | 80906785 | 4038802 | 4135.7 | 0 | 0.00 |
| 256 | 66036513 | 3298344 | 6755.0 | 19936 | 0.03 |
| 512 | 46765762 | 2336180 | 9569.0 | 0 | 0.00 |
| 1024 | 23812202 | 1190253 | 9750.6 | 0 | 0.00 |
| 2048 | 11949288 | 597398 | 9787.8 | 0 | 0.00 |
| 4096 | 4876518 | 243815 | 9889.3 | 1109040 | 18.53 |
| 8192 | 3005854 | 150283 | 9849.0 | 0 | 0.00 |
| 16384 | 1509117 | 75450 | 9889.5 | 79 | 0.01 |
| 32768 | 756061 | 37802 | 9909.8 | 16 | 0.00 |
| 63000 | 393526 | 19676 | 9916.8 | 8 | 0.00 |
Best Effort, No Batching
| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 7512988 | 375290 | 96.1 | 0 | 0.00 |
| 64 | 7501457 | 374714 | 191.9 | 0 | 0.00 |
| 128 | 7497866 | 374534 | 383.5 | 0 | 0.00 |
| 256 | 7154053 | 357364 | 781.9 | 0 | 0.00 |
| 512 | 7133771 | 356350 | 1559.6 | 0 | 0.00 |
| 1024 | 7044914 | 351913 | 3082.9 | 0 | 0.00 |
| 2048 | 6888612 | 344104 | 5937.8 | 0 | 0.00 |
| 4096 | 5927297 | 296076 | 9701.8 | 0 | 0.00 |
| 8192 | 3005854 | 150283 | 9849.0 | 0 | 0.00 |
| 16384 | 1509117 | 75450 | 9889.5 | 79 | 0.01 |
| 32768 | 756061 | 37802 | 9909.8 | 16 | 0.00 |
| 63000 | 393526 | 19676 | 9916.8 | 8 | 0.00 |
Reliable, Batching
| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 93771579 | 4686195 | 1199.7 | 0 | 0.00 |
| 64 | 84316173 | 4209912 | 2155.5 | 0 | 0.00 |
| 128 | 71793365 | 3584812 | 3670.8 | 0 | 0.00 |
| 256 | 54521333 | 2722685 | 5576.1 | 0 | 0.00 |
| 512 | 38241843 | 1909759 | 7822.4 | 0 | 0.00 |
| 1024 | 22874688 | 1142600 | 9360.2 | 0 | 0.00 |
| 2048 | 11953124 | 597062 | 9782.3 | 0 | 0.00 |
| 4096 | 5988600 | 299098 | 9800.9 | 0 | 0.00 |
| 8192 | 3004950 | 150226 | 9845.3 | 0 | 0.00 |
| 16384 | 1508850 | 75439 | 9888.0 | 0 | 0.00 |
| 32768 | 756049 | 37799 | 9908.9 | 0 | 0.00 |
| 63000 | 393534 | 19675 | 9916.4 | 0 | 0.00 |
| 100000 | 247815 | 12389 | 9911.5 | 0 | 0.00 |
| 500000 | 47611 | 2380 | 9523.5 | 0 | 0.00 |
| 1048576 | 16285 | 1012 | 9440.4 | 0 | 0.00 |
| 1548576 | 15096 | 755 | 9356.2 | 0 | 0.00 |
| 4194304 | 5598 | 280 | 9395.4 | 0 | 0.00 |
| 10485760 | 2055 | 103 | 8695.8 | 0 | 0.00 |
Reliable, No Batching
| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 4989963 | 249154 | 63.8 | 0 | 0.00 |
| 64 | 4938398 | 246654 | 126.3 | 0 | 0.00 |
| 128 | 4863735 | 242922 | 248.8 | 0 | 0.00 |
| 256 | 4892469 | 244359 | 500.4 | 0 | 0.00 |
| 512 | 4894170 | 244442 | 1001.2 | 0 | 0.00 |
| 1024 | 4876949 | 243583 | 1995.4 | 0 | 0.00 |
| 2048 | 4703617 | 234928 | 3849.1 | 0 | 0.00 |
| 4096 | 4450197 | 222272 | 7283.4 | 0 | 0.00 |
| 8192 | 3004950 | 150226 | 9845.3 | 0 | 0.00 |
| 16384 | 1508850 | 75439 | 9888.0 | 0 | 0.00 |
| 32768 | 756049 | 37799 | 9908.9 | 0 | 0.00 |
| 63000 | 393534 | 19675 | 9916.4 | 0 | 0.00 |
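The columns in these tables appear to be related as shown in the sketch below. The formulas are inferred from the numbers rather than taken from the Perftest source: Avg Mbps matches the payload bit rate (Avg Samples/s × sample size × 8 / 10^6), and Lost Samples (%) matches lost / (total + lost) × 100. They reproduce most rows to the printed precision, so treat them as a sanity check when reading raw Perftest output, not as Perftest's exact computation. The function names are hypothetical.

```bash
#!/bin/bash
# Relationships that appear to hold between the table columns:
#   Avg Mbps         = Avg Samples/s * Sample Size (bytes) * 8 / 10^6  (payload only)
#   Lost Samples (%) = Lost / (Total + Lost) * 100
# awk is used because bash arithmetic is integer-only.

# avg_mbps <samples_per_sec> <sample_size_bytes>
avg_mbps() {
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%.1f\n", r * s * 8 / 1e6 }'
}

# lost_pct <total_samples> <lost_samples>
lost_pct() {
  awk -v t="$1" -v l="$2" 'BEGIN { printf "%.2f\n", (t + l) ? 100 * l / (t + l) : 0 }'
}

# Best Effort, Batching, 32-byte row
avg_mbps 4916862 32        # prints 1258.7
lost_pct 98447835 608768   # prints 0.61
```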
Perftest Scripts
To produce these results, we ran RTI Perftest for C++98. The exact commands used are shown below.
Publisher Side
```bash
sudo /set_thr_mode.sh

# Arguments: $1 = Perftest executable, $2 = output folder for the CSV files
echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export exec_time=30
export nic=172.16.0.1
export pub_string="-pub \
    -transport UDPv4 \
    -nic $nic \
    -noPrint \
    -batchSize 0 \
    -noOutputHeaders \
    -exec $exec_time \
    -noXML"

mkdir -p $output_folder

# Best effort, batching disabled: each data length runs three times, pinned to CPU 0
echo ">> UNKEYED BE N Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_be_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
            $executable -best $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> $my_file;
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

# Reliable (no -best flag), batching disabled
echo ">> UNKEYED REL N Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_rel_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
            $executable $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> $my_file;
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;
```
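The publisher script above covers only the runs with batching disabled (`-batchSize 0`). As a hypothetical sketch, assuming the batching runs differ only in the batch size (8192 bytes, the default described earlier), the following dry run prints the corresponding publisher commands instead of executing them. The `./perftest_cpp` path is a placeholder. The subscriber commands would not need to change, since batching is configured on the writer side. The Reliable, Batching table also includes sizes above 63000 bytes, which this sketch does not generate.

```bash
#!/bin/bash
# Hypothetical dry run: print, rather than execute, the publisher commands for
# the batching-enabled runs. Assumption: they differ from the published scripts
# only in -batchSize (8192 bytes instead of 0).

# print_batching_commands <perftest_executable> <nic_address>
print_batching_commands() {
  local executable=$1 nic=$2 exec_time=30 batch_size=8192
  local mode qos DATALEN
  for mode in best rel; do
    # The scripts add -best for best effort; without it the run is reliable.
    if [ "$mode" = best ]; then qos="-best "; else qos=""; fi
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
      echo "taskset -c 0 $executable ${qos}-pub -transport UDPv4 -nic $nic" \
           "-noPrint -noOutputHeaders -exec $exec_time -noXML" \
           "-datalen $DATALEN -batchSize $batch_size"
    done
  done
}

# ./perftest_cpp is a hypothetical path; pass the real executable as $1 instead.
print_batching_commands "${1:-./perftest_cpp}" 172.16.0.1
```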
Subscriber Side
```bash
sudo /set_thr_mode.sh

# Arguments: $1 = Perftest executable, $2 = output folder for the CSV files
echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export nic=172.16.0.2
export sub_string="-sub \
    -transport UDPv4 \
    -nic $nic \
    -noPrint \
    -noOutputHeaders \
    -noXML"

mkdir -p $output_folder

# Best effort, mirroring the publisher's best-effort runs (three repetitions)
echo ">> UNKEYED BE"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_be_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
            $executable -best -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> $my_file;
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

# Reliable (no -best flag), mirroring the publisher's reliable runs
echo ">> UNKEYED REL"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_rel_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
            $executable -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> $my_file;
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;
```
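Each script runs every data length three times (`index` 1 2 3) and appends all runs to the same CSV file. If you want one figure per sample size, a hypothetical post-processing step could average the Avg Mbps column across runs, as sketched below. It assumes each data line is comma-separated, with the sample size in field 1 and Avg Mbps in field 4, matching the column order of the tables above; the actual Perftest CSV layout may differ, so check it before relying on this. The tables in this section are raw output, not such averages.

```bash
#!/bin/bash
# Hypothetical post-processing: average Avg Mbps per sample size across the
# repeated runs appended to one CSV file.
# Assumption: comma-separated lines, sample size in field 1, Avg Mbps in
# field 4; lines whose first field is not an integer (e.g. headers) are skipped.

# average_mbps <csv_file>  -> prints "size,avg_mbps,runs" per sample size
average_mbps() {
  awk -F',' '
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      size = $1 + 0
      if (!(size in sum)) order[++count] = size   # keep first-seen order
      sum[size] += $4
      runs[size]++
    }
    END {
      for (i = 1; i <= count; i++) {
        s = order[i]
        printf "%d,%.1f,%d\n", s, sum[s] / runs[s], runs[s]
      }
    }' "$1"
}

# Usage: pass a CSV produced by the scripts, e.g. thr_udpv4_sub_unkeyed_be_noBatch.csv
csv=${1:-}
if [ -n "$csv" ]; then average_mbps "$csv"; fi
```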
Test Hardware
The following hardware was used to perform these tests:
Linux Nodes
Processor: Intel® Xeon® E-2186G 3.8GHz, 12M cache, 6C/12T, turbo (95W)
RAM: 16GB 2666MT/s DDR4 ECC UDIMM
NIC 1: Intel X550 Dual Port 10GbE BASE-T Adapter, PCIe Full Height
NIC 2: Intel Ethernet I350 Dual Port 1GbE BASE-T Adapter, PCIe Low Profile
OS: Ubuntu 18.04; compiler: gcc 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
Switch
Dell Networking S4048T-ON, 48x 10GBASE-T and 6x 40GbE QSFP+ ports, IO to PSU air, 2x AC PSU, OS9