1.1.2.2. FlatData and Zero Copy

The RTI FlatData™ language binding and Zero Copy transfer over shared memory are two features of RTI Connext DDS Professional that improve performance by reducing latency, particularly for large samples.

With the FlatData language binding, the in-memory representation of a sample matches its wire representation, so the cost of serialization and deserialization drops to zero: you can access the serialized data directly without first deserializing it. For both the SHMEM and UDP transports, FlatData reduces the number of copies of a sample from four to two by eliminating the serialization and deserialization copies.

Zero Copy transfer over shared memory reduces the number of copies to zero for communication within the same host. Instead of copying the serialized sample content and sending it through the shared memory (SHMEM) built-in transport, this feature uses the SHMEM transport to send a 16-byte reference to the sample, which resides in a SHMEM segment owned by the DataWriter.
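The copy counts described above can be summarized with a small model. The following C++ sketch is illustrative only: `Config`, `copies_per_sample`, `bytes_copied`, and `transport_bytes` are hypothetical names, not part of the Connext API, and the model assumes that every copy moves the whole sample and that the serialized size equals the sample size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the copy counts described above.
// None of these names belong to the RTI Connext API.
enum class Config { Shmem, ShmemFlatData, ShmemFlatDataZeroCopy };

// Size of the reference that Zero Copy sends instead of the sample content.
constexpr std::uint64_t kReferenceSize = 16;

// Full copies of the sample made on the path from DataWriter to DataReader.
int copies_per_sample(Config config) {
    switch (config) {
        case Config::Shmem:
            return 4;  // includes the serialization and deserialization copies
        case Config::ShmemFlatData:
            return 2;  // FlatData removes serialization and deserialization
        case Config::ShmemFlatDataZeroCopy:
            return 0;  // only a reference travels through SHMEM
    }
    assert(false && "unknown Config value");
    return 0;
}

// Total payload bytes copied per sample (assumption: each copy moves the
// whole sample; encapsulation headers are ignored).
std::uint64_t bytes_copied(Config config, std::uint64_t sample_size) {
    return static_cast<std::uint64_t>(copies_per_sample(config)) * sample_size;
}

// Bytes carried by the SHMEM transport for one sample (assumption: without
// Zero Copy, the serialized size equals the sample size).
std::uint64_t transport_bytes(Config config, std::uint64_t sample_size) {
    return config == Config::ShmemFlatDataZeroCopy ? kReferenceSize : sample_size;
}
```

Under this model, a 1048576-byte sample costs 4194304 bytes of copying without FlatData, 2097152 bytes with FlatData, and none with FlatData + Zero Copy, where only the 16-byte reference crosses the transport whatever the sample size. A size-independent transfer cost is consistent with the nearly flat Zero Copy latencies in the tables below.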

For more information, see "Sending Large Data" in the RTI Connext DDS Core Libraries User’s Manual.

The following tests compare the latency improvements that these two features provide over plain shared memory (SHMEM).

FlatData, Unkeyed, Reliable, Shared Memory, C++98

The graph below shows the one-way latency, measured without load, between a Publisher and a Subscriber running on a single node in three configurations:

  • Using SHMEM

  • Using SHMEM + FlatData

  • Using SHMEM + FlatData + ZeroCopy

Note

We use the median (50th percentile) instead of the average because it gives a more stable measurement that is not skewed by spurious outliers. We also calculate the average and other percentiles; these values appear in the Detailed Statistics section below.
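The following C++ sketch illustrates why the median is preferred: it computes the average and nearest-rank percentiles over a handful of illustrative latency values (not Perftest measurements). The nearest-rank definition is an assumption, since RTI Perftest may compute percentiles differently, and `average_us` and `percentile_us` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of latency samples, in microseconds.
double average_us(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Nearest-rank percentile for p in (0, 100]. This is an assumed definition,
// not necessarily the one RTI Perftest uses.
double percentile_us(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    // Smallest 1-based rank such that at least p% of samples are <= it.
    const auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(samples.size()) / 100.0));
    return samples[rank - 1];
}
```

With the illustrative values `{10, 10, 11, 10, 11, 10, 12, 10, 11, 300}`, a single 300 μs outlier pulls the average up to 39.5 μs, while the median stays at 10 μs.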

Detailed Statistics

The following tables contain the raw numbers reported by RTI Perftest, exactly as output, with no further processing.

  • Shared Memory (Reliable)

Sample Size (Bytes)  Avg (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
-------------------  --------  --------  --------  --------  --------  --------  --------  -----------  -------------
                 32        11       2.0         9        42        10        14        19           34             42
                 64        11       2.2         9        42        10        15        18           34             42
                128        11       2.0         9        39        10        15        18           33             39
                256        11       2.0         9        48        10        14        18           33             48
                512        11       2.0         9        42        11        12        19           35             42
               1024        11       2.0         9        44        11        15        19           34             44
               2048        12       2.1         9       103        11        15        19           35            103
               4096        12       2.1        10        53        11        13        20           35             53
               8192        12       1.8        10        46        12        12        20           34             46
              16384        14       2.3        12        88        13        17        24           36             88
              32768        16       2.1        14       307        16        18        25           43            307
              63000        22       2.2        18       103        21        23        32           69            103
             100000        35       6.3        27       190        30        42        57          107            190
             500000       114      31.3        84       713       104       174       205          555            713
            1048576       420     108.0       192      1433       367       552       580         1160           1433
            1548576       637     169.8       311      2081       597       862       912         1722           2081
            4194304      1925     412.6      1171      5540      1901      2445      2535         5540           5540
           10485760      4716     752.7      4123     15317      3900      5494      5542        15317          15317

  • Shared Memory + FlatData (Reliable)

Sample Size (Bytes)  Avg (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
-------------------  --------  --------  --------  --------  --------  --------  --------  -----------  -------------
                 64        13       1.5        12        54        12        16        18           25             54
                128        13       1.4        12        79        12        15        18           24             79
                256        13       1.1        12        44        12        15        18           23             44
                512        14       1.0        12        56        12        15        16           24             56
               1024        14       1.7        12       140        13        17        18           26            140
               2048        14       1.1        13        44        14        16        17           24             44
               4096        14       1.2        12        53        14        16        17           25             53
               8192        14       1.0        13        53        13        16        17           26             53
              16384        16       1.3        14       103        15        18        20           31            103
              32768        17       1.0        16       106        17        18        20           42            106
              63000        22       1.1        19       111        21        22        24           63            111
             100000        25       1.7        23       165        25        26        31           87            165
             500000        71      10.3        63       648        68        71        79          402            648
            1048576       129      25.8       122      1312       128       132       139         1013           1312
            1548576       189      39.4       180      1890       187       193       200         1492           1890
            4194304       661     125.7       608      5043       657       678       695         5010           5043
           10485760      2402     422.3      2234     12896      2179      2480      2562        12896          12896

  • Shared Memory + FlatData + ZeroCopy (Reliable)

Sample Size (Bytes)  Avg (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
-------------------  --------  --------  --------  --------  --------  --------  --------  -----------  -------------
                 64        15       1.1        13        97        14        17        18           24             97
                128        15       1.2        13        95        14        17        18           25             95
                256        14       0.9        13        56        14        16        17           24             56
                512        14       1.0        13        45        14        16        17           24             45
               1024        14       1.0        13        52        14        16        17           24             52
               2048        14       0.9        13        99        14        16        17           24             99
               4096        14       1.0        13        96        14        16        17           24             96
               8192        15       1.0        13        37        14        17        17           20             37
              16384        14       1.0        13        38        14        16        18           25             38
              32768        14       0.9        13        37        14        16        17           23             37
              63000        14       0.9        13        90        14        16        17           24             90
             100000        15       0.4        14        54        14        15        17           24             54
             500000        15       0.9        14        54        14        15        18           27             54
            1048576        15       0.9        13        56        14        16        18           28             56
            1548576        15       1.0        14        54        14        16        18           28             54
            4194304        15       0.8        14        52        14        16        18           25             52
           10485760        16       1.1        14        92        14        17        19           28             92
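The relative effect of each feature can be computed from the 50% (μs) columns of the three tables above. The C++ sketch below copies the medians for a few sample sizes and divides the SHMEM median by the median obtained with each feature; `MedianRow`, `kMedians`, `speedup`, and `print_speedups` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Median (50%) one-way latencies in microseconds, copied from the tables above.
struct MedianRow {
    std::uint64_t sample_size_bytes;
    double shmem_us;
    double flatdata_us;
    double zerocopy_us;  // SHMEM + FlatData + ZeroCopy
};

const MedianRow kMedians[] = {
    {64, 10, 12, 14},
    {63000, 21, 21, 14},
    {1048576, 367, 128, 14},
    {10485760, 3900, 2179, 14},
};

// Factor by which the candidate median is lower than the baseline median
// (values below 1.0 mean the candidate is slower).
double speedup(double baseline_us, double candidate_us) {
    return baseline_us / candidate_us;
}

// Prints one line per sample size with both speedups relative to SHMEM.
void print_speedups() {
    for (const MedianRow& row : kMedians) {
        std::printf("%10llu B  FlatData x%.2f  FlatData+ZeroCopy x%.2f\n",
                    static_cast<unsigned long long>(row.sample_size_bytes),
                    speedup(row.shmem_us, row.flatdata_us),
                    speedup(row.shmem_us, row.zerocopy_us));
    }
}
```

By this measure, the features add a few microseconds for small samples (at 64 bytes the median rises from 10 μs to 12 μs and 14 μs), whereas at 1048576 bytes FlatData lowers the median by a factor of about 2.9 (367 μs to 128 μs) and FlatData + Zero Copy by a factor of about 26 (367 μs to 14 μs).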


Test Hardware

The following hardware was used to perform these tests:

Linux Nodes

Processor: Intel® Xeon® E-2186G 3.8GHz, 12M cache, 6C/12T, turbo (95W)
RAM: 16GB 2666MT/s DDR4 ECC UDIMM
NIC 1: Intel X550 Dual Port 10GbE BASE-T Adapter, PCIe Full Height
NIC 2: Intel Ethernet I350 Dual Port 1GbE BASE-T Adapter, PCIe Low Profile
OS: Ubuntu 18.04 -- gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Switch

Dell Networking S4048T-ON, 48x 10GBASE-T and 6x 40GbE QSFP+ ports, IO to PSU air, 2x AC PSU, OS9