3.1.1.5. FlatData and Zero Copy

RTI FlatData™ language binding and Zero Copy transfer over shared memory are two RTI Connext Professional features that reduce latency by cutting the number of times each sample is copied.

With FlatData language binding, the in-memory representation of a sample matches the wire representation, reducing the cost of serialization/deserialization to zero. You can directly access the serialized data without deserializing it first. FlatData language binding reduces the number of copies of a sample from four to two for both SHMEM and UDP transports, by removing the serialization and deserialization copies.
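The idea of an in-memory representation that matches the wire representation can be sketched in plain C++. This is a minimal illustration, not the FlatData API: the three-field layout, the offsets, and the function names (`build_sample`, `read_u32`, `payload_view`) are all hypothetical, and the sketch assumes writer and reader share the same byte order. It only shows that fields can be read straight from the wire buffer without first deserializing the whole sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed layout: [uint32 id][uint32 length][payload bytes].
// Illustration only; this is NOT the real FlatData/XCDR2 encoding.
static const std::size_t kIdOffset = 0;
static const std::size_t kLengthOffset = 4;
static const std::size_t kPayloadOffset = 8;

// Build the sample directly in its wire layout, so there is no separate
// serialization pass (and no extra copy) before it is handed to a transport.
std::vector<unsigned char> build_sample(std::uint32_t id,
                                        const unsigned char* payload,
                                        std::uint32_t length)
{
    std::vector<unsigned char> wire(kPayloadOffset + length);
    std::memcpy(wire.data() + kIdOffset, &id, sizeof id);
    std::memcpy(wire.data() + kLengthOffset, &length, sizeof length);
    if (length > 0) {
        std::memcpy(wire.data() + kPayloadOffset, payload, length);
    }
    return wire;
}

// Read one 32-bit field in place; only the 4 bytes requested are touched.
std::uint32_t read_u32(const std::vector<unsigned char>& wire, std::size_t offset)
{
    assert(offset + sizeof(std::uint32_t) <= wire.size());
    std::uint32_t value;
    std::memcpy(&value, wire.data() + offset, sizeof value);
    return value;
}

// Return a pointer into the wire buffer; the payload itself is never copied.
const unsigned char* payload_view(const std::vector<unsigned char>& wire)
{
    assert(wire.size() >= kPayloadOffset);
    return wire.data() + kPayloadOffset;
}
```

A reader holding the received buffer would call `read_u32(wire, kIdOffset)` or `payload_view(wire)` directly, which is the step a conventional binding replaces with a full deserialization into a separate object.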

Zero Copy transfer over shared memory reduces the number of copies to zero for communication within the same host. Instead of copying the serialized sample content through the shared memory (SHMEM) built-in transport, the DataWriter uses that transport to send a 16-byte reference to the sample, which resides in a SHMEM segment that the DataWriter owns.
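The reference-passing mechanism can be sketched as follows. This is a simplified model under stated assumptions, not Connext code: an ordinary `std::vector` stands in for the shared memory segment, a `std::deque` stands in for the transport, and the fields of the hypothetical `SampleRef` struct are invented for illustration (the source only states that the reference is 16 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical 16-byte reference; the field choice is an assumption,
// not the actual format used by Connext.
struct SampleRef {
    std::uint64_t offset;    // where the sample starts inside the segment
    std::uint32_t length;    // sample size in bytes
    std::uint32_t sequence;  // writer-assigned sequence number
};
static_assert(sizeof(SampleRef) == 16, "reference is expected to be 16 bytes");

// Stand-in for a shared memory segment owned by the writer.
class Segment {
public:
    explicit Segment(std::size_t size) : bytes_(size), next_(0) {}

    // Reserve `length` bytes and return their offset, so the writer can
    // fill the sample in place instead of copying it in later.
    std::uint64_t loan(std::uint32_t length) {
        assert(next_ + length <= bytes_.size());
        std::uint64_t offset = next_;
        next_ += length;
        return offset;
    }

    unsigned char* at(std::uint64_t offset) { return bytes_.data() + offset; }
    const unsigned char* at(std::uint64_t offset) const { return bytes_.data() + offset; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t next_;
};

// Stand-in for the transport: only 16-byte references travel through it.
typedef std::deque<SampleRef> RefQueue;

// Writer side: fill the loaned memory in place, then send the reference.
SampleRef publish(Segment& segment, RefQueue& queue, std::uint32_t length,
                  unsigned char fill, std::uint32_t sequence)
{
    std::uint64_t offset = segment.loan(length);
    std::memset(segment.at(offset), fill, length);
    SampleRef ref = {offset, length, sequence};
    queue.push_back(ref);
    return ref;
}

// Reader side: resolve the reference to a pointer into the same segment.
const unsigned char* take(const Segment& segment, RefQueue& queue, SampleRef& ref_out)
{
    assert(!queue.empty());
    ref_out = queue.front();
    queue.pop_front();
    return segment.at(ref_out.offset);
}
```

In this model the amount of data that crosses the queue is 16 bytes regardless of the sample size, which is consistent with the nearly flat latency in the Zero Copy results.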

For more information, see Sending Large Data, in the RTI Connext Core Libraries User’s Manual.

The following tests compare the performance improvement that each of these two features provides over shared memory (SHMEM).

FlatData, Unkeyed, Reliable, Shared Memory, C++98

The graph below shows the one-way latency without load between a Publisher and a Subscriber running within a single node in three different cases:

  • Using SHMEM

  • Using SHMEM + FlatData

  • Using SHMEM + FlatData + ZeroCopy

Note

We use the median (50th percentile) instead of the average because it gives a more stable measurement that is not skewed by spurious outliers. We also calculate the average and other percentile values, which are listed in the Detailed Statistics section below.
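The effect of a single outlier on the average, compared with the median, can be seen in a short sketch. The nearest-rank percentile definition used here is an assumption made for illustration; the source does not state how RTI Perftest computes its percentiles. The function names are likewise hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the latency samples.
double average(const std::vector<double>& samples)
{
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Nearest-rank percentile for p in (0, 100]: the smallest sample such that
// at least p percent of all samples are less than or equal to it.
// Takes the vector by value so the caller's data is not reordered.
double percentile(std::vector<double> samples, double p)
{
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    std::size_t rank =
        static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
    if (rank < 1) {
        rank = 1;
    }
    return samples[rank - 1];
}
```

For the five latencies 10, 10, 11, 11, and 1150 μs, `percentile(samples, 50.0)` returns 11 while `average(samples)` returns 238.4, so one spurious spike dominates the average but leaves the median unchanged. Under this definition a very high percentile such as 99.9999% returns the maximum unless the run contains at least a million samples.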

Detailed Statistics

The following tables contain the raw numbers presented by RTI Perftest. These numbers are the exact output with no further processing.

  • Shared Memory (Reliable)

Sample Size (Bytes)  Ave (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
                 32        11       1.1         9        52        10        13        14           22             52
                 64        11       1.2         9        52        10        13        14           21             52
                128        10       1.1         9        56        10        12        14           21             56
                256        11       1.3         9       124        10        13        18           21            124
                512        11       1.3         9        56        10        13        18           21             56
               1024        11       1.1         9        54        10        13        14           22             54
               2048        11       1.2         9        53        11        13        18           21             53
               4096        11       1.1        10        51        11        13        18           22             51
               8192        12       1.2        10        57        11        13        18           22             57
              16384        13       0.9        12        39        12        14        16           24             39
              32768        15       0.9        14        62        15        15        19           36             62
              63000        20       1.4        18        71        19        20        28           55             71
             100000        28       2.5        25       122        27        28        42           82            122
             500000        93      14.0        81       515        91        95       143          367            515
            1048576       276      58.6       162      1056       288       297       485          734           1056
            1548576       415      77.2       249      1540       434       446       595         1090           1540
            4194304      1332     148.1       882      4157      1359      1383      1396         4157           4157
           10485760      2610     387.3      2374     10798      2548      2581      3643        10798          10798

  • Shared Memory + FlatData (Reliable)

Sample Size (Bytes)  Ave (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
                 64        12       1.8        11      1150        12        15        16           20           1150
                128        13       1.4        11       167        12        15        16           24            167
                256        13       1.4        11       169        12        15        16           24            169
                512        13       1.4        11       173        12        15        16           24            173
               1024        14       1.1        12        52        13        16        17           24             52
               2048        14       1.1        12        52        13        16        17           24             52
               4096        14       1.1        13        55        13        16        17           24             55
               8192        14       1.2        13       174        13        16        16           24            174
              16384        15       1.2        13       182        14        16        17           26            182
              32768        17       1.1        15       171        16        18        19           37            171
              63000        20       1.1        18        73        20        21        23           52             73
             100000        23       1.5        21       216        22        23        27           71            216
             500000        60       6.7        57       468        59        60        67          286            468
            1048576       109      16.9       105       953       108       111       118          718            953
            1548576       156      26.1       150      1395       154       159       164         1068           1395
            4194304       439      77.9       421      3745       436       443       451         2930           3745
           10485760      1269     246.7      1220      9383      1258      1280      1293         9383           9383

  • Shared Memory + FlatData + ZeroCopy (Reliable)

Sample Size (Bytes)  Ave (μs)  Std (μs)  Min (μs)  Max (μs)  50% (μs)  90% (μs)  99% (μs)  99.99% (μs)  99.9999% (μs)
                 64        15       1.1        13       162        14        16        17           25            162
                128        15       0.9        14        33        14        15        17           24             33
                256        15       1.1        13       190        14        16        17           26            190
                512        15       1.3        14       530        14        16        17           26            530
               1024        15       7.0        13      3802        14        16        17           46           3802
               2048        15       1.1        14       177        14        15        17           26            177
               4096        15       1.1        13       186        14        16        17           25            186
               8192        15       1.1        13       194        14        16        17           25            194
              16384        15       1.1        13       170        14        15        17           26            170
              32768        15       2.6        14      1840        14        16        17           25           1840
              63000        15       0.8        14        54        15        16        18           25             54
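One way to read the tables is to divide the baseline median by the median for the feature under test. The sketch below does this for three sample sizes, using the 50% values copied from the Shared Memory and Shared Memory + FlatData tables. The struct, array, and function names are hypothetical helpers for this illustration and are not part of RTI Perftest.

```cpp
#include <cassert>
#include <cstddef>

// Median (50%) one-way latencies in microseconds, copied from the
// Shared Memory and Shared Memory + FlatData tables above.
struct MedianRow {
    unsigned long size_bytes;
    double shmem_us;
    double flatdata_us;
};

static const MedianRow kMedians[] = {
    {63000UL, 19.0, 20.0},
    {1048576UL, 288.0, 108.0},
    {10485760UL, 2548.0, 1258.0},
};
static const std::size_t kMedianCount = sizeof(kMedians) / sizeof(kMedians[0]);

// Ratio of baseline latency to improved latency: values above 1 mean the
// feature is faster than the baseline, values below 1 mean it is slower.
double speedup(double baseline_us, double improved_us)
{
    assert(improved_us > 0.0);
    return baseline_us / improved_us;
}
```

With these inputs, FlatData is about 2.7 times faster than plain SHMEM at 1048576 bytes (288 / 108) and about 2.0 times faster at 10485760 bytes (2548 / 1258), while at 63000 bytes it is marginally slower (19 / 20). At that same 63000-byte size the Zero Copy median is 15 μs against 19 μs for plain SHMEM, a ratio of roughly 1.27.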


Test Hardware

The following hardware was used to perform these tests:

Linux Nodes

Dell R340 Servers (13 Units)
Processor: Intel Xeon E-2278G (3.4-5GHz, 8c/16t, 16MB cache, 2 memory channels @2666MHz)
RAM: 4x 16GB 2666MHz DIMM (64GB RAM)
HD: 480GB SATA SSD
NIC 1: Intel 710 dual port 10Gbps SFP
OS: Ubuntu 20.04 -- gcc 9.3.0

Switch

Dell 2048 -- 10Gbps switch (10Gbps and 1Gbps interfaces)