Performance of Connext 5.3.1 on BeagleBone Black (Perftest Benchmark)


Hello,

I am running RTI Connext 5.3.1 on a cluster of BeagleBone Black (BBB) devices for a project. To benchmark the baseline maximum throughput for different data sample sizes, we used RTI Perftest 2.3.2. I have some questions about the observed performance results and would be very grateful if you could help me understand the reasons behind them.

Test Setup: We run the perftest publisher on one BBB device and the perftest subscriber on another BBB device. For a given data sample size, the publisher sends data as fast as it can (we did not use either sleep or spin) to the subscriber for 5 minutes. Each test was repeated 3 times and the plotted results are the average values across these 3 runs (error bars denote the standard deviation). The test was performed under both the default reliable QoS settings and best-effort QoS settings (with the -bestEffort command-line parameter).

Reliable Test Configuration: 

Publisher command line parameters: ./perftest_cpp -pub -cpu -noPrintIntervals -nic eth0 -transport UDPv4 -dataLen <dataLength>  -batchSize 0 -executionTime 300 

Subscriber command line parameters: ./perftest_cpp -sub -cpu -noPrintIntervals -nic eth0 -transport UDPv4 

Best Effort Test Configuration: 

Publisher command line parameters:  ./perftest_cpp -pub -cpu -noPrintIntervals -nic eth0 -transport UDPv4 -dataLen <dataLength>  -batchSize 0 -executionTime 300 -bestEffort 

Subscriber command line parameters: ./perftest_cpp -sub -cpu -noPrintIntervals -nic eth0 -transport UDPv4  -bestEffort 

Questions about observed results:

1. The perftest publisher does not use sleep/spin and sends data as fast as it can, yet we are not able to saturate the CPU. The attached graph cpu_pub.png shows the CPU utilization of perftest on the publisher side, and cpu_sub.png shows the CPU utilization on the subscriber side. I would like to understand which resource is the bottleneck that throttles the publisher and thereby limits the maximum observed throughput (graph: throughput_pks). This behavior is observed even in the bestEffort configuration.

2. Why does the CPU utilization of both the publisher and the subscriber decrease as the dataLen size increases? I understand that the throughput in packets/second decreases because larger messages take longer to send, which may affect CPU utilization, but are there other reasons behind the observed trend?

 

Thank you for your time and help. 

Shweta 

Hi Shweta,
 
Thank you for all the information that you have provided.
Addressing your questions:
 
    1. The reasons why RTI Perftest is not using the whole CPU might be:
            1a. The write loop is executed in a single thread. That thread can make use of 100% of one core of the machine, but not of the others, which would explain why you don't see the whole CPU being used.
            Could you verify whether one of your CPU's cores is at 100% (for example with top or mpstat)?
 
            1b. The bottleneck is the network card (I believe the boards you are running the test on might not have a very capable NIC). When the write() operation is called, the sample is sent through a socket, and that send is a blocking operation.
    
    2. As I just mentioned, the write operation in the writer copies the sample you are about to send and then tries to send() it through a socket. That socket operation is blocking.
 
        For small sizes, the write operation takes less time (both the copy and the send()), so the loop that sends samples runs more often and more CPU is used.
        For large sizes, the write() operation and the data copies take longer, so less CPU is used.
 
        In conclusion, when the middleware writes large data, the time spent blocked in the write() call is longer, so the CPU usage is lower (see the sketch below).
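 
To make points 1a and 2 a bit more concrete, here is a minimal, hypothetical sketch of such a single-threaded blocking send loop. It uses plain UDP sockets rather than Connext and is not the actual RTI Perftest code; the destination address and port are just placeholders. Running it with different payload sizes should reproduce the trend you observed: as the payload grows, more of each iteration is spent inside the blocking sendto() call, so the loop runs fewer times per second and the thread accumulates less CPU time.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main(int argc, char** argv)
{
    // Payload size in bytes, passed on the command line (defaults to 100).
    const size_t payloadSize = (argc > 1) ? std::strtoul(argv[1], nullptr, 10) : 100;
    const int durationSec = 10;

    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(7400);                        // placeholder port
    inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);  // placeholder subscriber address

    std::vector<char> sample(payloadSize, 'x');        // stand-in for the serialized sample
    long sent = 0;

    const auto start = std::chrono::steady_clock::now();
    while (std::chrono::steady_clock::now() - start < std::chrono::seconds(durationSec)) {
        // Blocking send: if the socket's send buffer is full, this thread waits
        // here until the kernel/NIC drains it. Larger payloads keep the thread
        // inside this call for longer, so there are fewer loop iterations
        // (and less application CPU time) per second.
        sendto(sock, sample.data(), sample.size(), 0,
               reinterpret_cast<const sockaddr*>(&dst), sizeof(dst));
        ++sent;
    }

    std::printf("%zu-byte samples: %ld sends in %d s (%.0f samples/s)\n",
                payloadSize, sent, durationSec,
                static_cast<double>(sent) / durationSec);

    close(sock);
    return 0;
}

For example, compiling this with g++ and running it once with a 100-byte payload and once with a 63000-byte payload should show both fewer samples per second and a lower CPU share for the larger size, even though the loop itself never sleeps.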
 
Does this make sense with what you see?
Please let me know if you have any other questions.
 
Best,
Antonio