1.1.2.1. Batching

The Batch QoS Policy can be used to decrease the amount of communication overhead associated with the transmission and (in the case of reliable communication) acknowledgment of small DDS samples, in order to increase throughput. It does this by collecting multiple user data DDS samples to be sent in a single network packet, to take advantage of the efficiency of sending larger packets.

Batching dramatically increases effective throughput for small DDS samples: in the tests below, by more than an order of magnitude for 32-byte samples. Throughput for small DDS samples is typically limited by CPU capacity, not by network bandwidth. Batching many small DDS samples into a single large packet increases network utilization and therefore throughput in terms of DDS samples per second.

For more information, see BATCH QosPolicy, in the RTI Connext DDS Core Libraries User’s Manual.

The tests below show how powerful the use of batching can be. In these tests, we compare the RTI Perftest default configuration for throughput, which enables batching (with a batch size of 8192 bytes), and a configuration where batching is disabled.
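As a rough illustration of the two configurations being compared, the sketch below builds one publisher command line with batching enabled and one with batching disabled. The executable path `./perftest_cpp` and the helper name `build_pub_command` are assumptions made for this example; the flags `-pub`, `-best`, `-datalen` and `-batchSize` are the ones used in the scripts later on this page.

```shell
#!/bin/sh
# Hypothetical executable path; substitute your own RTI Perftest binary.
executable="./perftest_cpp"

# Build a best-effort publisher command line for a given sample size and
# batch size (both in bytes). A batch size of 0 disables batching, as in
# the no-batching scripts used for these tests.
build_pub_command() {
    datalen=$1
    batch_size=$2
    echo "$executable -pub -best -datalen $datalen -batchSize $batch_size"
}

build_pub_command 32 8192   # batching enabled, 8192-byte batches
build_pub_command 32 0      # batching disabled
```

The function only prints the command so that it can be inspected before being run.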

Note

We chose 8192 bytes for the batch size because it is the largest power of two below 9000 bytes, the maximum MTU that can typically be configured on a regular NIC. This way, the RTPS packet does not need to be fragmented.

This setting alone does not guarantee that the packet is never fragmented: if the MTU of the NIC is set to a lower value (such as 1500 bytes, the default on many NICs), the packet will still be fragmented.
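The effect of the MTU can be estimated with a little arithmetic. The sketch below is a simplification: it treats the batch size as the UDP payload, assumes a 20-byte IPv4 header without options and an 8-byte UDP header, and ignores the RTPS headers that are added on top of the batch. The function name `ip_fragments` is introduced here for illustration only.

```shell
#!/bin/sh
# Estimate how many IPv4 packets a UDP payload occupies for a given MTU.
# Assumptions: 20-byte IPv4 header (no options), 8-byte UDP header,
# RTPS header overhead ignored.
ip_fragments() {
    payload=$1
    mtu=$2
    datagram=$((payload + 8))       # UDP header + payload
    max_data=$((mtu - 20))          # IP payload that fits in one packet
    if [ "$datagram" -le "$max_data" ]; then
        echo 1                      # fits without fragmentation
        return
    fi
    # Every fragment except the last carries a multiple of 8 bytes.
    aligned=$((max_data / 8 * 8))
    remaining=$((datagram - max_data))
    echo $((1 + (remaining + aligned - 1) / aligned))
}

ip_fragments 8192 9000   # jumbo frames
ip_fragments 8192 1500   # common default MTU
```

Under these assumptions, an 8192-byte batch travels in a single packet with a 9000-byte MTU and is split into 6 fragments with a 1500-byte MTU. On Linux, the MTU currently configured for an interface is listed by `ip link show`.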

Note

Since some samples are delayed until the batch is ready, you may observe increased latency. The maximum delay, as well as maximum bytes or maximum number of samples, can be configured. See BATCH QosPolicy, in the RTI Connext DDS Core Libraries User’s Manual.
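To get a feel for the added latency, the sketch below estimates how long the first sample in a batch waits when the batch is flushed only once it is full and samples are written at a steady rate. This is a simplified model: it ignores per-sample metadata inside the batch and any configured maximum delay, and the function names are introduced here for illustration only.

```shell
#!/bin/sh
# Number of samples that fit in one batch, ignoring per-sample metadata.
samples_per_batch() {
    n=$(( $1 / $2 ))                # batch size / sample size, in bytes
    [ "$n" -lt 1 ] && n=1           # a batch holds at least one sample
    echo "$n"
}

# Wait (in microseconds) of the first sample in a batch, assuming the batch
# is flushed when the last sample that fits is written and samples are
# written at a steady rate (samples per second).
first_sample_wait_us() {
    batch_size=$1
    sample_size=$2
    rate=$3
    count=$(samples_per_batch "$batch_size" "$sample_size")
    echo $(( (count - 1) * 1000000 / rate ))
}

samples_per_batch 8192 32           # samples per 8192-byte batch
first_sample_wait_us 8192 32 1000   # writer publishing 1000 samples/s
```

With 32-byte samples, an 8192-byte batch holds 256 samples under this model, so a writer publishing 1000 samples per second would hold the first sample for about 255 ms. At the multi-million samples-per-second rates measured in the throughput tests, the same batch fills in well under a millisecond.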

Unkeyed, UDPv4 10Gbps Network, C++98

The graph below shows the expected throughput for 1-to-1 communication between two Linux nodes on a 10Gbps network, with and without batching (batch size of 8192 bytes). Results are given for both best-effort and strict reliable communication.

Detailed Statistics

The tables below contain the raw numbers reported by RTI Perftest. These numbers are the exact output, with no further processing.

  • Best Effort, Batching

| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 99848960 | 5033936 | 1288.7 | 151040 | 0.15 |
| 64 | 99973760 | 4668791 | 2390.4 | 26240 | 0.03 |
| 128 | 100000000 | 4111291 | 4210.0 | 0 | 0.00 |
| 256 | 100000000 | 3340956 | 6842.3 | 0 | 0.00 |
| 512 | 69812480 | 2335490 | 9575.2 | 0 | 0.00 |
| 1024 | 35708504 | 1190222 | 9750.3 | 0 | 0.00 |
| 8192 | 4508508 | 150280 | 9848.8 | 0 | 0.00 |
| 63000 | 590273 | 19675 | 9916.5 | 2 | 0.00 |

  • Best Effort, No Batching

| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 11733830 | 390842 | 100.1 | 0 | 0.00 |
| 64 | 11811834 | 393434 | 201.4 | 0 | 0.00 |
| 128 | 11609904 | 386713 | 396.0 | 0 | 0.00 |
| 256 | 11662605 | 388464 | 795.6 | 0 | 0.00 |
| 512 | 11542759 | 384478 | 1574.8 | 0 | 0.00 |
| 1024 | 11349140 | 378025 | 3096.8 | 0 | 0.00 |
| 2048 | 10952612 | 364820 | 5977.2 | 0 | 0.00 |
| 4096 | 8886128 | 296075 | 9701.8 | 0 | 0.00 |
| 8192 | 4508641 | 150280 | 9848.8 | 0 | 0.00 |
| 16384 | 2263520 | 75448 | 9889.2 | 71 | 0.00 |
| 32768 | 1134080 | 37801 | 9909.5 | 17 | 0.00 |
| 63000 | 590282 | 19675 | 9916.5 | 9 | 0.00 |

  • Reliable, Batching

| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 100000000 | 4776439 | 1222.8 | 0 | 0.00 |
| 64 | 100000000 | 4333897 | 2219.0 | 0 | 0.00 |
| 128 | 100000000 | 3470061 | 3553.3 | 0 | 0.00 |
| 256 | 82417984 | 2744986 | 5621.7 | 0 | 0.00 |
| 512 | 56613633 | 1885618 | 7723.5 | 0 | 0.00 |
| 1024 | 35265288 | 1174554 | 9621.9 | 0 | 0.00 |
| 8192 | 4506938 | 150221 | 9844.9 | 0 | 0.00 |
| 63000 | 590289 | 19674 | 9916.0 | 0 | 0.00 |
| 100000 | 371720 | 12388 | 9911.1 | 0 | 0.00 |
| 500000 | 68000 | 2265 | 9060.3 | 0 | 0.00 |
| 1048576 | 29076 | 968 | 8124.9 | 0 | 0.00 |
| 1548576 | 22044 | 734 | 9098.6 | 0 | 0.00 |
| 4194304 | 7793 | 259 | 8702.6 | 0 | 0.00 |
| 10485760 | 2911 | 96 | 8129.6 | 0 | 0.00 |

  • Reliable, No Batching

| Sample Size (Bytes) | Total Samples | Avg Samples/s | Avg Mbps | Lost Samples | Lost Samples (%) |
|---|---|---|---|---|---|
| 32 | 7625956 | 253971 | 65.0 | 0 | 0.00 |
| 64 | 7654812 | 254939 | 130.5 | 0 | 0.00 |
| 128 | 7482362 | 249196 | 255.2 | 0 | 0.00 |
| 256 | 7588068 | 252715 | 517.6 | 0 | 0.00 |
| 512 | 7457710 | 248365 | 1017.3 | 0 | 0.00 |
| 1024 | 7428494 | 247380 | 2026.5 | 0 | 0.00 |
| 2048 | 7216087 | 240299 | 3937.1 | 0 | 0.00 |
| 4096 | 6731026 | 224156 | 7345.2 | 0 | 0.00 |
| 8192 | 4506759 | 150221 | 9844.9 | 0 | 0.00 |
| 16384 | 2263121 | 75433 | 9887.3 | 0 | 0.00 |
| 32768 | 1134019 | 37798 | 9908.6 | 0 | 0.00 |
| 63000 | 590277 | 19674 | 9916.0 | 0 | 0.00 |
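The columns in the tables above can be related to each other with simple arithmetic. The sketch below assumes that Avg Mbps equals samples per second times sample size times 8, divided by 10^6 (which appears consistent with the reported rows), and computes the throughput ratio between a batching and a no-batching run. The function names are introduced here for illustration; the input figures are copied from the 32-byte rows.

```shell
#!/bin/sh
# Average megabits per second from samples per second and sample size (bytes).
avg_mbps() {
    awk -v sps="$1" -v bytes="$2" \
        'BEGIN { printf "%.1f\n", sps * bytes * 8 / 1000000 }'
}

# Ratio between the sample rates of a batching and a no-batching run.
speedup() {
    awk -v b="$1" -v nb="$2" 'BEGIN { printf "%.1f\n", b / nb }'
}

avg_mbps 5033936 32      # best effort, batching, 32-byte samples
speedup 5033936 390842   # best effort, 32 bytes: batching vs. no batching
speedup 4776439 253971   # reliable, 32 bytes: batching vs. no batching
```

For 32-byte samples this gives 1288.7 Mbps, matching the table, and a speedup of roughly 12.9x for best effort and 18.8x for reliable. At 8192 bytes and above, both configurations reach nearly the same throughput, close to the 10Gbps line rate.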


Perftest Scripts

To produce these results, we ran RTI Perftest for C++98. The exact commands used are shown below:

Publisher Side

sudo /set_thr_mode.sh

echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export exec_time=30
export nic=172.16.0.1
export pub_string="-pub \
        -transport UDPv4 \
        -nic $nic \
        -noPrint \
        -batchSize 0\
        -noOutputHeaders \
        -exec $exec_time \
        -noXML"

mkdir -p $output_folder

echo ">> UNKEYED BE N Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_be_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -best $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> $my_file;
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

echo ">> UNKEYED REL N Batching"
export my_file=$output_folder/thr_udpv4_pub_unkeyed_rel_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable $pub_string -datalen $DATALEN $extra_args -batchSize 0"
        echo $command
        $command >> $my_file;
        sleep 3;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

Subscriber Side

sudo /set_thr_mode.sh

echo EXECUTABLE IS $1
export executable=$1

echo OUTPUT PATH IS $2
export output_folder=$2

export nic=172.16.0.2
export sub_string="-sub \
        -transport UDPv4 \
        -nic $nic \
        -noPrint \
        -noOutputHeaders \
        -noXML"

mkdir -p $output_folder

echo ">> UNKEYED BE"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_be_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -best -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> $my_file;
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;


echo ">> UNKEYED REL"
export my_file=$output_folder/thr_udpv4_sub_unkeyed_rel_noBatch.csv
touch $my_file
export extra_args=""
for index in 1 2 3; do
    for DATALEN in 32 64 128 256 512 1024 2048 4096 8192 16384 32768 63000; do
        export command="taskset -c 0 \
        $executable -datalen $DATALEN $sub_string $extra_args"
        echo $command ---- $index
        $command >> $my_file;
        sleep 10;
        export extra_args=" -noOutputHeaders "
    done
done
sleep 5;

Test Hardware

The following hardware was used to perform these tests:

Linux Nodes

Processor: Intel® Xeon® E-2186G 3.8GHz, 12M cache, 6C/12T, turbo (95W)
RAM: 16GB 2666MT/s DDR4 ECC UDIMM
NIC 1: Intel X550 Dual Port 10GbE BASE-T Adapter, PCIe Full Height
NIC 2: Intel Ethernet I350 Dual Port 1GbE BASE-T Adapter, PCIe Low Profile
OS: Ubuntu 18.04 -- gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Switch

Dell Networking S4048T-ON, 48x 10GBASE-T and 6x 40GbE QSFP+ ports, IO to PSU air, 2x AC PSU, OS9