1.1.2.1. Batching

The Batch QoS Policy can be used to decrease the amount of communication overhead associated with the transmission and (in the case of reliable communication) acknowledgment of small DDS samples, in order to increase throughput. It does this by collecting multiple user data DDS samples to be sent in a single network packet, to take advantage of the efficiency of sending larger packets.

Batching can increase effective throughput dramatically for small DDS samples. Throughput for small samples is typically limited by CPU capacity rather than by network bandwidth, because the fixed cost of sending and processing each packet dominates. Sending many small DDS samples in a single large packet spreads that cost across all of them, which increases network utilization and thus throughput in terms of DDS samples per second.
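To see why small samples benefit most, the sketch below counts how many samples of each size fit in one 8192-byte batch, that is, how many single-sample packets one batch replaces. It ignores per-sample serialization and metadata overhead, so real batches hold somewhat fewer samples. It also assumes that a sample at least as large as the batch is sent by itself. The helper name samples_per_batch is hypothetical.

```shell
# Rough count of samples per 8192-byte batch.
# Assumptions: per-sample overhead is ignored, and a sample at least as
# large as the batch is sent on its own (one sample per packet).
batch_size=8192

samples_per_batch() {
    local n=$(( batch_size / $1 ))   # whole samples of size $1 that fit
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

for size in 32 256 1024 4096 8192 16384; do
    echo "$size-byte samples: $(samples_per_batch "$size") per packet"
done
```

At 32 bytes, one packet can carry up to 256 samples. From 8192 bytes upward, batching no longer combines samples.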

For more information, see BATCH QosPolicy, in the RTI Connext DDS Core Libraries User’s Manual.

The tests below show how much batching can improve throughput. They compare the RTI Perftest default configuration for throughput, which enables batching with a batch size of 8192 bytes, with a configuration in which batching is disabled.

Note

We chose 8192 bytes as the batch size because it is the largest power of two below 9000 bytes, the maximum MTU that can be configured for a regular NIC. This way, the RTPS packet does not need to be fragmented.

This batch size alone does not guarantee that packets will not be fragmented: if the NIC's MTU is set to a lower value, such as the 1500-byte default on many NICs, the packet will be fragmented.
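The arithmetic in this note can be sketched as follows. The helper names are hypothetical. The fragment count assumes IPv4 without options (20-byte header), an 8-byte UDP header, and, as a simplification, no RTPS header overhead, so the counts are lower bounds.

```shell
# Sketch of the MTU arithmetic behind the 8192-byte batch size.
# Assumptions: 20-byte IPv4 header (no options), 8-byte UDP header,
# RTPS header overhead ignored, so fragment counts are lower bounds.

# Largest power of two that does not exceed $1.
largest_pow2_at_most() {
    local limit=$1 p=1
    while [ $(( p * 2 )) -le "$limit" ]; do
        p=$(( p * 2 ))
    done
    echo "$p"
}

# IPv4 packets needed for one UDP datagram carrying $2 bytes at MTU $1.
ipv4_fragments() {
    local mtu=$1 payload=$2
    local ip_payload=$(( payload + 8 ))          # UDP header + data
    if [ "$ip_payload" -le $(( mtu - 20 )) ]; then
        echo 1                                   # fits: no fragmentation
        return
    fi
    # Non-final fragments carry a multiple of 8 bytes of IP payload.
    local per_frag=$(( (mtu - 20) / 8 * 8 ))
    echo $(( (ip_payload + per_frag - 1) / per_frag ))
}

batch=$(largest_pow2_at_most 9000)
echo "batch size: $batch bytes"                              # 8192
echo "MTU 9000: $(ipv4_fragments 9000 "$batch") packet(s)"   # 1
echo "MTU 1500: $(ipv4_fragments 1500 "$batch") packet(s)"   # 6
```

With a 9000-byte MTU the batch fits in one packet. With the common 1500-byte default, the same batch is split into at least six IP fragments.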

Note

Because some samples are held until the batch is ready, you may observe increased latency. The maximum flush delay, the maximum number of bytes, and the maximum number of samples per batch can all be configured. See BATCH QosPolicy, in the RTI Connext DDS Core Libraries User’s Manual.
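The added latency can be estimated with simple arithmetic. The sketch below assumes a hypothetical application that writes fixed-size samples at a constant rate, with each batch flushed only when it is full (no flush-delay limit), and it ignores per-sample overhead. With 32-byte samples, an 8192-byte batch holds 256 samples, so the first sample waits for 255 more writes. The helper name fill_wait_us is hypothetical.

```shell
# Worst-case wait (microseconds) for the first sample of a batch that is
# flushed only when full. Hypothetical inputs; per-sample overhead and any
# maximum flush delay are ignored.
fill_wait_us() {
    local batch=$1 sample=$2 rate=$3          # bytes, bytes, samples/s
    local per_batch=$(( batch / sample ))
    if [ "$per_batch" -lt 1 ]; then per_batch=1; fi
    # The first sample waits for the remaining (per_batch - 1) writes.
    echo $(( (per_batch - 1) * 1000000 / rate ))
}

echo "$(fill_wait_us 8192 32 1000) us at 1000 samples/s"       # 255000
echo "$(fill_wait_us 8192 32 100000) us at 100000 samples/s"   # 2550
```

At 1,000 samples/s the first sample would wait about 255 ms. This is the case that a bounded flush delay is meant to cap.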

Unkeyed, UDPv4 10Gbps Network, C++98

The graph below shows the expected throughput for one-to-one communication between two Linux nodes on a 10 Gbps network, with and without batching (batch size of 8192 bytes). Results are shown for both best-effort and strictly reliable scenarios.
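Each point in this comparison differs only in a few publisher flags. The sketch below prints, but does not run, the four publisher commands for one sample size: best effort or reliable, each with and without -batchSize 0. The flags come from the scripts in this section. The PERFTEST path and the pub_cmd helper are hypothetical placeholders. Each publisher run also needs a matching subscriber run (-sub, plus -best for best effort), as in the subscriber script.

```shell
# Print (not run) the publisher commands for one sample size.
# PERFTEST is a placeholder path; flags are those used in this section.
PERFTEST=${PERFTEST:-./perftest_cpp}
NIC=${NIC:-172.16.0.1}

pub_cmd() {
    local datalen=$1 batching=$2 qos=$3     # qos: "best" or "rel"
    local cmd="taskset -c 0 $PERFTEST -pub -transport UDPv4 -nic $NIC -noPrint -noXML -exec 30 -datalen $datalen"
    if [ "$qos" = "best" ]; then cmd="$cmd -best"; fi
    # Without -batchSize 0, Perftest uses its default throughput batching.
    if [ "$batching" = "off" ]; then cmd="$cmd -batchSize 0"; fi
    echo "$cmd"
}

for qos in best rel; do
    pub_cmd 32 on "$qos"
    pub_cmd 32 off "$qos"
done
```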

Detailed Statistics

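These tables contain the raw numbers reported by RTI Perftest, exactly as output, with no further processing. For sample sizes of 8192 bytes and above, the Batching and No Batching rows are identical: a sample that large fills the 8192-byte batch on its own, so batching has no effect.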
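The derived columns appear to follow two relations. Avg Mbps is approximately Avg Samples/s × sample size × 8 / 10^6, and Lost Samples (%) is approximately lost / (received + lost) × 100. These relations are inferred from the numbers, not taken from Perftest documentation. The awk sketch below checks them against the 32- and 64-byte rows of the Best Effort, Batching table. Avg Samples/s is itself rounded, so the Mbps relation is approximate in general. The helper name derived_columns is hypothetical.

```shell
# Recompute the derived columns of one throughput row.
# Relations inferred from the tables (assumption):
#   Avg Mbps         = Avg Samples/s * size * 8 / 1000000
#   Lost Samples (%) = lost / (received + lost) * 100
derived_columns() {
    # $1 = sample size (bytes), $2 = total samples received,
    # $3 = avg samples/s, $4 = lost samples
    awk -v size="$1" -v total="$2" -v rate="$3" -v lost="$4" 'BEGIN {
        mbps = rate * size * 8 / 1000000
        loss = 100 * lost / (total + lost)
        printf "%.1f %.2f\n", mbps, loss
    }'
}

derived_columns 32 98447835 4916862 608768   # 1258.7 0.61
derived_columns 64 91799845 4585163 57216    # 2347.6 0.06
```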

  • Best Effort, Batching

Sample Size (Bytes)  Total Samples  Avg Samples/s  Avg Mbps  Lost Samples  Lost Samples (%)
32                   98447835       4916862        1258.7    608768        0.61
64                   91799845       4585163        2347.6    57216         0.06
128                  80906785       4038802        4135.7    0             0.00
256                  66036513       3298344        6755.0    19936         0.03
512                  46765762       2336180        9569.0    0             0.00
1024                 23812202       1190253        9750.6    0             0.00
2048                 11949288       597398         9787.8    0             0.00
4096                 4876518        243815         9889.3    1109040       18.53
8192                 3005854        150283         9849.0    0             0.00
16384                1509117        75450          9889.5    79            0.01
32768                756061         37802          9909.8    16            0.00
63000                393526         19676          9916.8    8             0.00

  • Best Effort, No Batching

Sample Size (Bytes)  Total Samples  Avg Samples/s  Avg Mbps  Lost Samples  Lost Samples (%)
32                   7512988        375290         96.1      0             0.00
64                   7501457        374714         191.9     0             0.00
128                  7497866        374534         383.5     0             0.00
256                  7154053        357364         781.9     0             0.00
512                  7133771        356350         1559.6    0             0.00
1024                 7044914        351913         3082.9    0             0.00
2048                 6888612        344104         5937.8    0             0.00
4096                 5927297        296076         9701.8    0             0.00
8192                 3005854        150283         9849.0    0             0.00
16384                1509117        75450          9889.5    79            0.01
32768                756061         37802          9909.8    16            0.00
63000                393526         19676          9916.8    8             0.00

  • Reliable, Batching

Sample Size (Bytes)  Total Samples  Avg Samples/s  Avg Mbps  Lost Samples  Lost Samples (%)
32                   93771579       4686195        1199.7    0             0.00
64                   84316173       4209912        2155.5    0             0.00
128                  71793365       3584812        3670.8    0             0.00
256                  54521333       2722685        5576.1    0             0.00
512                  38241843       1909759        7822.4    0             0.00
1024                 22874688       1142600        9360.2    0             0.00
2048                 11953124       597062         9782.3    0             0.00
4096                 5988600        299098         9800.9    0             0.00
8192                 3004950        150226         9845.3    0             0.00
16384                1508850        75439          9888.0    0             0.00
32768                756049         37799          9908.9    0             0.00
63000                393534         19675          9916.4    0             0.00
100000               247815         12389          9911.5    0             0.00
500000               47611          2380           9523.5    0             0.00
1048576              16285          1012           9440.4    0             0.00
1548576              15096          755            9356.2    0             0.00
4194304              5598           280            9395.4    0             0.00
10485760             2055           103            8695.8    0             0.00

  • Reliable, No Batching

Sample Size (Bytes)  Total Samples  Avg Samples/s  Avg Mbps  Lost Samples  Lost Samples (%)
32                   4989963        249154         63.8      0             0.00
64                   4938398        246654         126.3     0             0.00
128                  4863735        242922         248.8     0             0.00
256                  4892469        244359         500.4     0             0.00
512                  4894170        244442         1001.2    0             0.00
1024                 4876949        243583         1995.4    0             0.00
2048                 4703617        234928         3849.1    0             0.00
4096                 4450197        222272         7283.4    0             0.00
8192                 3004950        150226         9845.3    0             0.00
16384                1508850        75439          9888.0    0             0.00
32768                756049         37799          9908.9    0             0.00
63000                393534         19675          9916.4    0             0.00
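One way to read these tables is the ratio of Avg Samples/s with batching to Avg Samples/s without batching at the same sample size. The sketch below computes that ratio for a few rows, with the values copied from the tables above. The helper name speedup is hypothetical.

```shell
# Ratio of Avg Samples/s with batching to Avg Samples/s without batching,
# using values copied from the tables above.
speedup() {
    awk -v b="$1" -v nb="$2" 'BEGIN { printf "%.1f\n", b / nb }'
}

echo "Best effort, 32 B:   $(speedup 4916862 375290)x"   # 13.1
echo "Best effort, 512 B:  $(speedup 2336180 356350)x"   # 6.6
echo "Best effort, 4096 B: $(speedup 243815 296076)x"    # 0.8
echo "Reliable, 32 B:      $(speedup 4686195 249154)x"   # 18.8
echo "Reliable, 1024 B:    $(speedup 1142600 243583)x"   # 4.7
echo "Reliable, 4096 B:    $(speedup 299098 222272)x"    # 1.3
```

The gain shrinks as the sample size approaches the 8192-byte batch size. The best-effort 4096-byte ratio falls below 1 because that batching run recorded 18.53% lost samples.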


Perftest Scripts

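To produce these results, we ran RTI Perftest for C++98 with the scripts below. Each script takes the Perftest executable as its first argument and the output folder as its second. As shown, the publisher script disables batching with -batchSize 0.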

Publisher Side

# Arguments: $1 = Perftest executable, $2 = output folder
sudo /set_thr_mode.sh

echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export exec_time=30
export nic=172.16.0.1
# Common publisher options (-batchSize 0 disables batching)
export pub_string="-pub \
        -transport UDPv4 \
        -nic $nic \
        -noPrint \
        -batchSize 0 \
        -noOutputHeaders \
        -exec $exec_time \
        -noXML"

mkdir -p "$output_folder"

echo ">> UNKEYED BE No Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_be_noBatch.csv
touch "$my_file"
export extra_args=""
# Three repetitions of the sample-size sweep, pinned to CPU core 0
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -best $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        # $command is unquoted on purpose so it splits into arguments
        $command >> "$my_file";
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

echo ">> UNKEYED REL No Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_rel_noBatch.csv
touch "$my_file"
export extra_args=""
# Same sweep without -best, so the default reliable QoS is used
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> "$my_file";
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

Subscriber Side

# Arguments: $1 = Perftest executable, $2 = output folder
sudo /set_thr_mode.sh

echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export nic=172.16.0.2
# Common subscriber options
export sub_string="-sub \
        -transport UDPv4 \
        -nic $nic \
        -noPrint \
        -noOutputHeaders \
        -noXML"

mkdir -p "$output_folder"

echo ">> UNKEYED BE"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_be_noBatch.csv
touch "$my_file"
export extra_args=""
# Same sweep as the publisher; -best selects best-effort mode
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -best -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        # $command is unquoted on purpose so it splits into arguments
        $command >> "$my_file";
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

echo ">> UNKEYED REL"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_rel_noBatch.csv
touch "$my_file"
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> "$my_file";
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

Test Hardware

The following hardware was used to perform these tests:

Linux Nodes

Processor: Intel® Xeon® E-2186G 3.8GHz, 12M cache, 6C/12T, turbo (95W)
RAM: 16GB 2666MT/s DDR4 ECC UDIMM
NIC 1: Intel X550 Dual Port 10GbE BASE-T Adapter, PCIe Full Height
NIC 2: Intel Ethernet I350 Dual Port 1GbE BASE-T Adapter, PCIe Low Profile
OS: Ubuntu 18.04 -- gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Switch

Dell Networking S4048T-ON, 48x 10GBASE-T and 6x 40GbE QSFP+ ports, IO to PSU air, 2x AC PSU, OS9