Problem with Domain Participant Placement inside a class

Joined: 05/30/2022
Posts: 4

Hi all.

I need your help. I'm trying to create the DomainParticipant inside a class. I have declared the member variables in the header file like this:

dds::core::QosProvider qosProvider;
dds::domain::DomainParticipant shmParticipant;
dds::domain::DomainParticipant rtkOutParticipant;
 

After that, I initialize these members to nullptr in the class constructor like this:

FMEADDS::FMEADDS(): qosProvider(nullptr), shmParticipant(nullptr), rtkOutParticipant(nullptr)
 
I do the assignment inside my initialization() function, like this:
 
shmParticipant = dds::domain::DomainParticipant(0, qosProvider.participant_qos("::SHM_LOC_Profile::SHM_LOC_Participant"));
rtkOutParticipant = dds::domain::DomainParticipant(rtk_domain_id, qosProvider.participant_qos("::SHM_LOC_Profile::SHM_LOC_Participant"));
 
After that, I use the shmParticipant and rtkOutParticipant members to create the DataReader and DataWriter in another function (let's call it subscriber()).
 
Unfortunately, the DataWriter lags by about 5 seconds: it only actually publishes the message after 5 seconds.
 
If I create the DomainParticipant inside the same function instead, there is no lag; it publishes the data as soon as the program is launched.
 
So, when the DomainParticipant is not created in the same function as the DataWriter/DataReader, the DataWriter/DataReader only starts to publish/receive messages after a delay of about 5 seconds. This happens when the DataWriter/DataReader uses a DomainParticipant stored as a private member of the class. When the DomainParticipant is created in the same function as the DataWriter/DataReader, data is published/received immediately after launch.
 
Does anyone know how I can fix this delay? Or is there a better way to create the DomainParticipant outside the main function? Can I create the DomainParticipant far from where it is used, or should I create it right before creating the DataWriter/DataReader?
 
Thank you. I appreciate any answer here.
 
I have also attached the code.
Attachment: The Code (5.61 KB)
Howard
Joined: 11/29/2012
Posts: 673

Sorry, I'm unclear about the situation that you've described.

When you say "data writer actually publish the message after 5 seconds", do you mean that the first call to DataWriter::write() happens 5 seconds after the DataWriter was created?

That would imply that the application was blocked in some function call after creation and before the DataWriter::write() was called.

OR do you mean that the DataReader that is supposed to receive the data doesn't receive the data for 5 seconds after the DataWriter has begun to send data (after the first time that the DataWriter::write is called).  How many times has the DataWriter::write() sent data before the DataReader starts receiving data?

Also, I assume that there's another application involved?  In the code that you included, the DataWriter sends data to 1 topic, but the DataReader subscribes to a different topic.  If you're expecting some sort of round trip/ping pong effect, then there must be another application that subscribes to the first topic and publishes the second topic?

Where are you measuring the delay?  In the same application that sent the data (round trip required), or in the other application that is supposed to receive the data from the writer (one way)?

If there is another application, do you know the timing for the other application to receive the data and send data back?

Also, what QOS settings are you using for the Topics?

Finally, your code:

        waitset.dispatch(dds::core::Duration(0.5));

I assume you want the dispatch() to wait for up to 0.5 seconds before it times out?

If so, that code is wrong.  dds::core::Duration() does not take a floating point number representing the timeout.  The constructor takes 2 parameters, both integers: one is seconds and the other nanoseconds.  The nanoseconds parameter defaults to 0.  If you use 0.5 for the seconds parameter, I think that this will be changed to 0 by the compiler (usually with a warning), so the timeout that is set is 0 and the dispatch() call returns immediately. The code then immediately calls dispatch() again, in a non-blocking loop, using up the full processing power of a core.
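To illustrate the truncation described above, here is a minimal, self-contained sketch. SketchDuration, truncated_from and from_fractional_seconds are hypothetical names made up for this illustration; they only mimic a (seconds, nanoseconds) constructor and are not the Connext API. It assumes non-negative inputs.

```cpp
#include <cstdint>

// Hypothetical stand-in for a (seconds, nanoseconds) duration type. It only
// mirrors the two-integer shape described above; it is NOT dds::core::Duration.
struct SketchDuration {
    std::int32_t sec;
    std::uint32_t nanosec;
    constexpr explicit SketchDuration(std::int32_t s, std::uint32_t ns = 0)
        : sec(s), nanosec(ns) {}
};

// What a double -> int32 conversion does to the seconds argument:
// the fractional part is discarded, so 0.5 becomes 0 seconds.
constexpr SketchDuration truncated_from(double seconds) {
    return SketchDuration(static_cast<std::int32_t>(seconds));
}

// Split fractional seconds into whole seconds plus nanoseconds instead
// (assumes seconds >= 0).
constexpr SketchDuration from_fractional_seconds(double seconds) {
    return SketchDuration(
        static_cast<std::int32_t>(seconds),
        static_cast<std::uint32_t>(
            (seconds - static_cast<std::int32_t>(seconds)) * 1e9));
}
```

Here truncated_from(0.5) yields 0 s and 0 ns (a zero timeout), while from_fractional_seconds(0.5) yields 0 s and 500000000 ns. With the real two-parameter constructor, the corresponding fix would presumably be to pass both parts explicitly, e.g. dds::core::Duration(0, 500000000).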

 

Joined: 05/30/2022
Posts: 4

Thank you for your feedback,

Yes, I'll construct the Duration passed to dispatch() with two parameters. Thanks for the correction.

Yes, I'm using two different programs. This is the first program, which publishes RTK data. The second program subscribes to the RTK data.

I measure the delay by printing the timestamp on the receiving node.

I conducted more tests about this issue. This is my pipeline:

1. I create the domain participant during class initialization and save it as a private member.

2. I construct the data writer and data reader from that domain participant. The first sample received by the receiving program is sample number 11 after the data writer/reader is constructed. (This is the delay I noticed; I wonder why the receiving node doesn't receive the data starting from sample number 1.)

3. I reconstruct the data writer and data reader with the same domain participant. In this case, the first sample received by the receiving program is sample number 21 after the data writer/reader is constructed.

I don't know what is happening here. What should I do to fix this?

Joined: 05/30/2022
Posts: 4

I also have another finding. I ran the same code on 2 more PCs: one PC works well, and the other has the same issue. This is the result when the code runs without the issue.

As you can see from the top line, sample number 2 is received. I don't know why it behaves differently on different machines.

Howard
Joined: 11/29/2012
Posts: 673

Hi,

When you're running your tests, are the pub and sub applications on 2 different computers?  Or running both on the same host? 

It seems like you're running on the same machine?  How are you getting the output of the 2 processes to show up in the same shell?

In any case, your test results show that it can take 1 or 2 seconds before your receiving app starts receiving data.  In your last run, it takes less than 0.2 seconds.

Fundamentally, when you start 2 DDS applications, they need to exchange information and complete a discovery process before user data can be successfully sent and received between the two applications.

If you create a participant/datawriter and then immediately send data, it's highly likely that the first data you send will not be received.  When the first data sent is received depends on how quickly the discovery process completes.  It's usually very fast (but does take non-zero time), and for two applications with just a datawriter/datareader each, it should take less than 1 second unless:

1) there is CPU contention (other apps on the computer consuming CPU)

2) network contention (assuming that's between 2 apps on 2 hosts)

3) network issues like not being able to pass network packets sent by DDS in a timely manner, perhaps related to the ability of the network elements (NICs of the hosts, switches, routers) to pass multicast packets (which are used by DDS for discovery by default)
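One common way to avoid losing the first samples is to hold off writing until the writer has matched at least one reader. Below is a minimal sketch of that polling pattern. The wait_for_match helper and its matched_count callback are hypothetical names for this illustration, not part of Connext; with a real DataWriter the callback could wrap writer.publication_matched_status().current_count().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `matched_count` until it reports at least one match or `timeout`
// expires. Returns true if a match was seen, false on timeout.
// `matched_count` is a hypothetical callback supplied by the caller.
inline bool wait_for_match(
        const std::function<int()>& matched_count,
        std::chrono::milliseconds timeout,
        std::chrono::milliseconds poll_interval = std::chrono::milliseconds(10)) {
    assert(poll_interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (matched_count() < 1) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // discovery did not complete in time
        }
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

A publisher could call something like wait_for_match(count_fn, std::chrono::seconds(5)) once after creating the DataWriter and only then start its write loop. This is a sketch under the stated assumptions: a match reported on the writer side does not by itself guarantee delivery of the very first sample with every QoS configuration.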

You can use wireshark to capture the network traffic and then see how long it takes for discovery to complete and how long for user data to start showing up. 

If you search on the community website for "wireshark", you'll see lots of articles/examples that will help you use wireshark to understand the on-the-wire packets sent by DDS.

For your final point that using different machines you get different results...well that's pointing the finger at something other than DDS causing the problem.  What's different about the 2 machines?  Are you using the same binary executable or did you compile them separately for each machine?  Are there OS differences?  How are the machines different?  CPU?  NIC card?  Are they connected to the same network through the same switch?  etc.

 

Joined: 05/30/2022
Posts: 4

Hi Howard, 

Thank you for giving me the idea to look into the networking side.

By the way, I'm using a Docker container right now. I packed both applications inside the same container, so they only exchange data inside Docker.

I tried several things and noticed that if I use the "--net=host" argument when launching the container, the issue exists. When I remove the argument, the issue disappears.

I think the problem is partially solved. But do you know the reason behind this?

Howard
Joined: 11/29/2012
Posts: 673

Sorry, dealing with docker is just adding another layer of abstraction under DDS...and there are plenty of issues that can be introduced or solved with the configuration of the abstract network layer, not to mention the actual network layer.

However, if you only need the two applications to communicate within a Docker container and not have to communicate over a network to apps outside the Docker container, then you can configure Connext DDS to disable the UDPv4 network transport and only use shared memory between the apps in the Docker container.

I would search community.rti.com to find articles and other postings on using DDS with Docker as well as configuring the builtin transports used by Connext.