9.1. Latency Benchmarks
Latency measurements are provided for two environments:
- Xeon – End-to-end (one-way) latency measured on high-performance Xeon machines on a dedicated network using the RTI Connext DDS Performance Test tool.
- Raspberry Pi – Round-trip latency measured on stock Raspberry Pis on a large, non-dedicated network.
9.1.1. Xeon
The end-to-end latency is measured between two identical machines using the test configuration below and running the RTI Connext DDS Performance Test tool.
The test environment consists of:
- x86_64 CentOS Linux release 7.1.1503
- RTI Perftest 3.0
- Switch configuration: D-Link DXS-3350 SR
  - 176 Gbps switching capacity
  - Dual 10-Gig stacking ports and optional 10-Gig uplinks
  - Stacks up to 8 units per stack
  - 4 MB packet buffer
  - 48 x 10/100/1000BASE-T ports
- Machine:
  - Intel I350 Gigabit NIC
  - Intel Core i7 CPU:
    - 12 MB cache
    - 6 cores (12 threads)
    - 3.33 GHz CPU speed
  - 12 GB memory
The latency is measured by sending one PING sample and waiting for the Echoer to return a PONG sample. The sender records the round-trip time until the PONG sample arrives and divides it by 2. The test is repeated multiple times for each payload size. The reported values are therefore end-to-end (one-way) latencies, not round-trip times.
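The PING/PONG procedure above can be sketched in Python with plain UDP sockets on the loopback interface. This is a minimal illustration under stated assumptions, not how RTI Perftest is implemented: it uses no DDS at all, the names `run_echoer` and `measure_one_way_latency_us` are hypothetical, and, like the procedure described here, it assumes the outbound and return paths take equal time, so that half the round trip approximates the one-way latency.

```python
import socket
import threading
import time


def run_echoer(sock: socket.socket, count: int) -> None:
    """Hypothetical Echoer: return each received PING datagram to its sender as the PONG."""
    for _ in range(count):
        data, addr = sock.recvfrom(65535)
        sock.sendto(data, addr)


def measure_one_way_latency_us(payload_size: int, iterations: int = 100) -> list:
    """Hypothetical pinger: one-way latency estimated as half of each round trip, in us."""
    echo_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    echo_sock.bind(("127.0.0.1", 0))  # let the OS choose a free port
    echo_addr = echo_sock.getsockname()
    echoer = threading.Thread(target=run_echoer, args=(echo_sock, iterations), daemon=True)
    echoer.start()

    ping_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(payload_size)  # zero-filled payload of the requested size
    samples_us = []
    try:
        for _ in range(iterations):
            start_ns = time.perf_counter_ns()
            ping_sock.sendto(payload, echo_addr)  # send one PING
            ping_sock.recvfrom(65535)  # block until the PONG returns
            round_trip_ns = time.perf_counter_ns() - start_ns
            samples_us.append(round_trip_ns / 2 / 1000)  # halve, then ns -> us
    finally:
        echoer.join(timeout=1.0)
        ping_sock.close()
        echo_sock.close()
    return samples_us


if __name__ == "__main__":
    samples = measure_one_way_latency_us(payload_size=32, iterations=1000)
    print(f"min {min(samples):.1f} us, max {max(samples):.1f} us")
```

Loopback numbers from this sketch only exercise the host's network stack; they are not comparable to the switched 1 Gbps results in the tables below.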
Interpretation of the measurements (all latency values are reported in microseconds):
- Bytes - The size of the DDS sample payload (UDP overhead is _not_ included) in bytes.
- Ave - Average (mean) latency
- Std - Standard deviation of the latency
- Min - The minimum latency
- Max - The maximum latency
- 50% - The 50th percentile latency
- 90% - The 90th percentile latency
- 99% - The 99th percentile latency
- 99.99% - The 99.99th percentile latency
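The statistics listed above can be computed from raw latency samples as in the following Python sketch. It is an illustration under assumptions, not Perftest's code: the nearest-rank percentile method and the population standard deviation are choices made here for illustration, and `summarize` and `percentile_nearest_rank` are hypothetical names.

```python
import math
import statistics


def percentile_nearest_rank(sorted_samples: list, pct: float) -> float:
    """Nearest-rank percentile (assumed method): smallest sample with >= pct% of samples at or below it."""
    n = len(sorted_samples)
    # round() guards against float noise such as 9999.000000000002 before ceil().
    rank = math.ceil(round(pct * n / 100, 9))
    return sorted_samples[min(max(rank, 1), n) - 1]


def summarize(samples_us: list) -> dict:
    """Compute the statistics reported in the Xeon tables from raw latency samples (us)."""
    if not samples_us:
        raise ValueError("need at least one sample")
    s = sorted(samples_us)
    summary = {
        "Ave": statistics.fmean(s),
        "Std": statistics.pstdev(s),  # population std dev; Perftest's choice is an assumption
        "Min": s[0],
        "Max": s[-1],
    }
    for pct in (50, 90, 99, 99.99):
        summary[f"{pct}%"] = percentile_nearest_rank(s, pct)
    return summary


print(summarize([float(x) for x in range(1, 11)]))
```

With nearest-rank percentiles, the 99.99% value equals Max for any run with fewer than 10,000 samples, so that column only carries extra information when each payload size is repeated at least 10,000 times.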
9.1.1.1. C++ Best Effort Keyed 1 Gbps
Bytes | Ave (us) | Std (us) | Min (us) | Max (us) | 50% (us) | 90% (us) | 99% (us) | 99.99% (us) |
---|---|---|---|---|---|---|---|---|
32 | 29 | 0.6 | 27 | 91 | 29 | 29 | 30 | 33 |
64 | 28 | 0.7 | 27 | 328 | 28 | 29 | 30 | 34 |
128 | 30 | 0.6 | 28 | 52 | 30 | 30 | 31 | 35 |
256 | 33 | 0.6 | 31 | 285 | 33 | 33 | 35 | 38 |
1024 | 47 | 0.7 | 46 | 338 | 47 | 47 | 49 | 53 |
4096 | 80 | 0.6 | 79 | 272 | 80 | 81 | 82 | 86 |
8192 | 117 | 0.7 | 116 | 302 | 117 | 118 | 119 | 123 |
63000 | 609 | 1.0 | 606 | 630 | 608 | 610 | 611 | 624 |
9.1.1.2. C++ Best Effort Unkeyed 1 Gbps
Bytes | Ave (us) | Std (us) | Min (us) | Max (us) | 50% (us) | 90% (us) | 99% (us) | 99.99% (us) |
---|---|---|---|---|---|---|---|---|
32 | 27 | 0.4 | 26 | 88 | 27 | 28 | 29 | 32 |
64 | 28 | 0.5 | 27 | 285 | 28 | 28 | 30 | 33 |
128 | 29 | 0.6 | 28 | 328 | 29 | 30 | 31 | 35 |
256 | 32 | 0.5 | 31 | 333 | 32 | 32 | 34 | 37 |
1024 | 46 | 0.8 | 45 | 345 | 46 | 47 | 48 | 52 |
4096 | 79 | 0.9 | 78 | 349 | 79 | 80 | 81 | 86 |
8192 | 116 | 0.9 | 115 | 335 | 116 | 117 | 119 | 123 |
63000 | 608 | 1.0 | 606 | 635 | 608 | 610 | 611 | 624 |
9.1.1.3. C++ Reliable Keyed 1 Gbps
Bytes | Ave (us) | Std (us) | Min (us) | Max (us) | 50% (us) | 90% (us) | 99% (us) | 99.99% (us) |
---|---|---|---|---|---|---|---|---|
32 | 32 | 1.7 | 29 | 322 | 31 | 35 | 37 | 40 |
64 | 32 | 2.1 | 30 | 90 | 31 | 36 | 38 | 43 |
128 | 34 | 2.0 | 31 | 617 | 33 | 36 | 40 | 43 |
256 | 37 | 1.6 | 35 | 333 | 37 | 39 | 42 | 46 |
1024 | 52 | 1.6 | 50 | 414 | 52 | 54 | 57 | 61 |
4096 | 84 | 1.1 | 82 | 360 | 84 | 85 | 88 | 93 |
8192 | 122 | 1.9 | 120 | 604 | 121 | 123 | 126 | 131 |
63000 | 613 | 2.7 | 610 | 976 | 613 | 615 | 618 | 635 |
9.1.1.4. C++ Reliable Unkeyed 1 Gbps
Bytes | Ave (us) | Std (us) | Min (us) | Max (us) | 50% (us) | 90% (us) | 99% (us) | 99.99% (us) |
---|---|---|---|---|---|---|---|---|
32 | 31 | 1.9 | 29 | 575 | 31 | 34 | 37 | 40 |
64 | 32 | 1.7 | 29 | 75 | 31 | 35 | 37 | 41 |
128 | 33 | 2.1 | 31 | 591 | 32 | 37 | 39 | 45 |
256 | 37 | 1.7 | 34 | 336 | 36 | 38 | 42 | 45 |
1024 | 51 | 1.5 | 48 | 328 | 51 | 53 | 56 | 60 |
4096 | 84 | 1.5 | 82 | 357 | 84 | 86 | 89 | 94 |
8192 | 121 | 1.4 | 119 | 412 | 121 | 123 | 126 | 130 |
63000 | 614 | 1.6 | 611 | 665 | 614 | 616 | 619 | 634 |
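To estimate what reliability costs in this setup, the Best Effort averages can be subtracted from the Reliable averages. The short Python sketch below does that using the Ave (us) columns copied from the four Xeon tables; the helper name `reliability_overhead_us` is hypothetical, and because the tables report averages rounded to whole microseconds, each difference carries roughly ±1 us of rounding error.

```python
SIZES = [32, 64, 128, 256, 1024, 4096, 8192, 63000]

# "Ave (us)" columns copied from the four Xeon tables in this section.
AVE_US = {
    ("best_effort", "keyed"): [29, 28, 30, 33, 47, 80, 117, 609],
    ("best_effort", "unkeyed"): [27, 28, 29, 32, 46, 79, 116, 608],
    ("reliable", "keyed"): [32, 32, 34, 37, 52, 84, 122, 613],
    ("reliable", "unkeyed"): [31, 32, 33, 37, 51, 84, 121, 614],
}


def reliability_overhead_us(keying: str) -> dict:
    """Average latency of Reliable minus Best Effort, per payload size (us)."""
    reliable = AVE_US[("reliable", keying)]
    best_effort = AVE_US[("best_effort", keying)]
    return {size: r - b for size, r, b in zip(SIZES, reliable, best_effort)}


for keying in ("keyed", "unkeyed"):
    print(keying, reliability_overhead_us(keying))
```

For these measurements the difference stays between 3 and 6 us at every payload size from 32 to 63000 bytes, so the added average latency for Reliable delivery looks roughly constant rather than proportional to payload size.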
9.1.2. Raspberry Pi
The round-trip latencies are measured between two identical machines using the latency application available in the Connext DDS Micro example directory.
The test environment consists of:
- 2 x Raspberry Pi Model B+ with ARMv7 and 1 GB of memory
- Linux 4.14
- 1 Gbps network
Note that these tests run on stock Raspberry Pis without any performance tuning. In addition, these Raspberry Pis are part of a larger network used for scalability testing. As a result, the latency numbers provided here have a wider spread than those from the dedicated Xeon test environment.
The latency is measured by sending one PING sample and waiting for the Echoer to return a PONG sample. The sender then records the time it took to receive the PONG sample. The test is repeated multiple times for each payload size. Unlike the Xeon results, these values are round-trip latencies and are not divided by 2.
Interpretation of the measurements (all latency values are reported in microseconds):
- Bytes - The size of the DDS sample payload in bytes (UDP overhead is _not_ included)
- Mean - Average latency
- Min - The minimum latency
- 50% - The 50th percentile latency
- 90% - The 90th percentile latency
- 99% - The 99th percentile latency
- 99.99% - The 99.99th percentile latency
9.1.2.1. Round-trip Latency
Bytes | Mean (us) | Min (us) | 50% (us) | 90% (us) | 99% (us) | 99.99% (us) |
---|---|---|---|---|---|---|
16 | 1032.37 | 864.63 | 1010 | 1090 | 1370 | 6680 |
32 | 1045.11 | 910.63 | 1020 | 1090 | 1370 | 10500 |
64 | 1052.94 | 882.63 | 1030 | 1110 | 1380 | 9420 |
128 | 1096.95 | 915.63 | 1070 | 1150 | 1470 | 12000 |
256 | 1157.19 | 992.63 | 1130 | 1200 | 1470 | 9150 |
512 | 1294.60 | 1141.63 | 1260 | 1360 | 1660 | 7670 |
1024 | 1555.94 | 1401.63 | 1520 | 1640 | 1960 | 6440 |
2048 | 1964.19 | 1712.63 | 1930 | 2040 | 2380 | 13000 |
4096 | 2408.46 | 2109.63 | 2360 | 2500 | 2840 | 11200 |
8192 | 3181.26 | 2933.63 | 3120 | 3300 | 3660 | 12800 |
16384 | 4612.76 | 4337.63 | 4540 | 4700 | 5170 | 15000 |
32768 | 7762.30 | 7274.64 | 7740 | 7950 | 8420 | 20000 |
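One way to quantify the wider spread noted for the Raspberry Pi environment is the ratio of the 99.99th percentile to the median (50%). Because the ratio is unitless, it is unaffected by the Xeon numbers being halved while the Raspberry Pi numbers are full round trips, although the two environments still differ in hardware, network, and test application. The Python sketch below computes the ratio for a few payload sizes copied from the Xeon C++ Best Effort Unkeyed table and the Raspberry Pi table; `TAIL_US` and `tail_ratio` are hypothetical names.

```python
# (50%, 99.99%) latencies in us, copied from the tables in this section.
TAIL_US = {
    "Xeon C++ Best Effort Unkeyed": {32: (27, 32), 1024: (46, 52), 8192: (116, 123)},
    "Raspberry Pi round trip": {32: (1020, 10500), 1024: (1520, 6440), 8192: (3120, 12800)},
}


def tail_ratio(p50: float, p99_99: float) -> float:
    """How many times larger the 99.99th percentile is than the median."""
    return p99_99 / p50


for name, rows in TAIL_US.items():
    for size, (p50, p99_99) in rows.items():
        print(f"{name}, {size} bytes: 99.99%/50% = {tail_ratio(p50, p99_99):.1f}")
```

For the rows shown, the Xeon ratios stay between about 1.1 and 1.2, while the Raspberry Pi ratios range from about 4 to 10.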