1.1.2.2. FlatData and Zero Copy

RTI FlatData™ language binding and Zero Copy transfer over shared memory are two features of RTI Connext DDS Professional that reduce latency by cutting the number of times a sample is copied.

With the FlatData language binding, the in-memory representation of a sample matches its wire representation, so serialization and deserialization cost nothing: you can access the serialized data directly, without deserializing it first. By removing the serialization and deserialization copies, FlatData reduces the number of copies of a sample from four to two for both the SHMEM and UDP transports.
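The idea of working directly on the wire representation can be sketched in a few lines. This is only a conceptual illustration, not the FlatData API or its wire format: the 28-byte layout, the names SAMPLE and VALUE_OFFSET, and the helper functions are all assumptions made up for this example.

```python
import struct

# Hypothetical fixed layout used only for illustration (not the FlatData
# wire format): little-endian int32 id, float64 value, 16-byte payload.
SAMPLE = struct.Struct("<id16s")
VALUE_OFFSET = 4  # the float64 field starts right after the 4-byte id


def deserialize(wire):
    """Conventional path: copy every field out of the wire buffer."""
    return SAMPLE.unpack(wire)


def read_value(wire):
    """Flat path: read one field directly at its offset in the wire bytes."""
    return struct.unpack_from("<d", wire, VALUE_OFFSET)[0]


def write_value(wire, value):
    """Flat path: update one field in place; the buffer stays ready to send."""
    struct.pack_into("<d", wire, VALUE_OFFSET, value)


wire = bytearray(SAMPLE.size)               # 28-byte buffer in wire layout
SAMPLE.pack_into(wire, 0, 7, 2.5, b"hello")
write_value(wire, 3.5)
print(read_value(wire))                     # 3.5
```

Because the buffer is always in its wire layout, there is no separate "serialize before send" or "deserialize after receive" step to pay for.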

Zero Copy transfer over shared memory reduces the number of copies to zero for communication within the same host. Instead of copying the serialized sample content through the shared memory (SHMEM) built-in transport, this feature uses that transport to send 16-byte references to samples that reside in a SHMEM segment owned by the DataWriter.
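A rough sketch of sending a reference instead of the sample content follows. It is not how Connext implements Zero Copy: the Segment class is a plain in-process bytearray standing in for a shared memory segment, and the split of the 16-byte reference into an offset and a length is an assumption (only the 16-byte size comes from the description above).

```python
import struct

# Assumed reference layout for illustration: two unsigned 64-bit integers
# (offset, length). Only the 16-byte total size comes from the text.
REFERENCE = struct.Struct("<QQ")


class Segment:
    """Stand-in for a writer-owned shared memory segment (a bytearray here)."""

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.next_free = 0

    def write_sample(self, payload):
        """Place a sample in the segment and return a 16-byte reference.

        A real writer would build the sample directly inside the segment;
        the slice assignment below only populates this toy example.
        """
        offset = self.next_free
        end = offset + len(payload)
        if end > len(self.buffer):
            raise MemoryError("segment full")
        self.buffer[offset:end] = payload
        self.next_free = end
        return REFERENCE.pack(offset, len(payload))

    def resolve(self, reference):
        """Return a view of the sample; slicing a memoryview copies no bytes."""
        offset, length = REFERENCE.unpack(reference)
        return memoryview(self.buffer)[offset:offset + length]


segment = Segment(2 * 1024 * 1024)
small_ref = segment.write_sample(b"a" * 64)
large_ref = segment.write_sample(b"b" * 1048576)
print(len(small_ref), len(large_ref))   # 16 16
```

What travels through the transport has the same size whether the sample is 64 bytes or 1 MB, so the transfer cost no longer depends on the sample size.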

For more information, see Sending Large Data, in the RTI Connext DDS Core Libraries User’s Manual.

The following tests measure how much each of these two features improves latency over shared memory (SHMEM).

FlatData, Unkeyed, Reliable, Shared Memory, C++98

The graph below shows the one-way latency without load between a Publisher and a Subscriber running within a single node in three different cases:

  • Using SHMEM

  • Using SHMEM + FlatData

  • Using SHMEM + FlatData + ZeroCopy

Note

We report the median (50th percentile) rather than the average because it is more stable and is not skewed by spurious outliers. The average and other percentile values are also calculated; see the Detailed Statistics section below.
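The effect of a single outlier on the average can be seen in a small sketch. The sample data is made up, and the nearest-rank percentile method is an assumption; RTI Perftest may compute its percentiles differently.

```python
import math
import statistics


def percentile(ordered, p):
    """Nearest-rank percentile of an already sorted list (assumed method)."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_us):
    """Return average, median, 99th percentile, and maximum in microseconds."""
    ordered = sorted(latencies_us)
    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


# Made-up data: 98 samples at 11 us, one at 12 us, and one 146 us outlier.
samples = [11] * 98 + [12, 146]
print(summarize(samples))
```

With this data the single 146 μs sample pulls the average up to 12.36 μs, while the median stays at 11 μs.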

Detailed Statistics

The following tables contain the raw numbers presented by RTI Perftest. These numbers are the exact output with no further processing.

  • Shared Memory (Reliable)

Sample Size (Bytes)  Avg (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
32                   11        1.2       9         37        11        14        18        21           35
64                   11        1.1       9         39        11        14        16        22           35
128                  11        1.3       9         36        11        14        18        21           35
256                  11        1.2       10        38        11        14        16        22           37
512                  11        1.2       9         47        11        13        18        21           41
1024                 11        1.3       10        146       11        14        18        22           42
8192                 12        1.4       11        45        12        14        19        24           45
63000                22        1.7       19        94        21        22        31        69           94
100000               31        2.0       27        166       30        31        40        105          166
500000               129       39.5      89        709       104       178       218       528          709
1048576              357       85.6      184       1452      368       378       673       1015         1452
1548576              571       112.0     306       2132      587       686       843       1508         2132
4194304              1684      314.3     1154      5797      1866      1938      1978      5797         5797
10485760             3923      384.8     3849      15913     3900      3930      3962      15913        15913

  • Shared Memory + FlatData (Reliable)

Sample Size (Bytes)  Avg (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
64                   13        1.3       11        44        12        16        17        24           37
128                  13        1.4       11        39        12        16        17        24           39
256                  13        1.4       11        43        12        16        17        24           43
512                  13        1.4       11        39        12        16        17        24           39
1024                 13        1.5       11        43        12        16        17        24           43
8192                 14        1.3       12        39        13        17        17        24           39
63000                20        1.2       18        95        20        21        24        59           95
100000               24        1.4       22        156       24        24        30        85           156
500000               68        7.6       65        625       67        68        75        361          625
1048576              128       19.9      123       1296      127       131       138       770          1296
1548576              185       30.4      178       1861      183       188       196       1422         1861
4194304              652       99.8      629       4870      647       658       671       4017         4870
10485760             2247      326.1     2153      12579     2162      2268      2288      12579        12579

  • Shared Memory + FlatData + ZeroCopy (Reliable)

Sample Size (Bytes)  Avg (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
64                   14        0.7       13        37        14        15        16        23           37
128                  14        0.7       13        39        14        15        16        23           39
256                  14        0.8       14        179       14        15        16        23           179
512                  14        0.7       14        34        14        15        16        23           34
1024                 14        0.7       13        38        14        15        16        23           38
8192                 14        0.8       13        42        14        15        16        24           42
63000                14        0.7       13        39        14        15        16        23           39
100000               15        0.5       13        37        15        15        17        23           37
500000               15        0.7       13        235       15        15        18        24           235
1048576              15        0.7       13        41        15        16        18        25           41
1548576              15        0.7       13        41        15        16        18        25           41
4194304              15        0.7       13        41        15        16        18        25           41
10485760             15        1.6       13        1289      15        16        18        25           1289
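As a reading aid for the three tables, the short sketch below turns a few of the median (50%) values into speedup factors relative to plain SHMEM. The numbers are copied from the tables above; the names MEDIAN_US and speedup are made up for this example, and the selection of sample sizes is arbitrary.

```python
# Median (50%) one-way latency in microseconds, copied from the tables:
# sample size in bytes -> (SHMEM, SHMEM + FlatData, SHMEM + FlatData + ZeroCopy)
MEDIAN_US = {
    64: (11, 12, 14),
    63000: (21, 20, 14),
    1048576: (368, 127, 15),
    10485760: (3900, 2162, 15),
}


def speedup(size):
    """Return (FlatData, FlatData + ZeroCopy) speedup over plain SHMEM."""
    shmem, flat, zero_copy = MEDIAN_US[size]
    return shmem / flat, shmem / zero_copy


for size in sorted(MEDIAN_US):
    flat_x, zero_copy_x = speedup(size)
    print(f"{size:>9} bytes: FlatData {flat_x:.1f}x, ZeroCopy {zero_copy_x:.1f}x")
```

A factor below 1 means the plain SHMEM median was lower, which is the case for the 64-byte samples; the benefit of both features shows up as the sample size grows.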


Test Hardware

The following hardware was used to perform these tests:

Linux Nodes

Processor: Intel® Xeon® E-2186G 3.8GHz, 12M cache, 6C/12T, turbo (95W)
RAM: 16GB 2666MT/s DDR4 ECC UDIMM
NIC 1: Intel X550 Dual Port 10GbE BASE-T Adapter, PCIe Full Height
NIC 2: Intel Ethernet I350 Dual Port 1GbE BASE-T Adapter, PCIe Low Profile
OS: Ubuntu 18.04 -- gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Switch

Dell Networking S4048T-ON, 48x 10GBASE-T and 6x 40GbE QSFP+ ports, IO to PSU air, 2x AC PSU, OS9