[Reposted] How to configure a DataReader to receive data in the order it is sent?

Michelle:

Please help; I am in crunch mode for a demo and my DDS app is not behaving as expected.

I have two apps running on a standalone Linux box.  App1 has one publisher with 4 DataWriters, one per topic.  It is supposed to be a real-time app that must write hundreds of messages per topic every 640 ms.  App2 receives all the messages that App1 publishes, with one reader per topic.  I am using UDP unicast for transport, with ASYNCHRONOUS_PUBLISH_MODE_QOS and RELIABILITY_QOS.


The goal is for App1 to generate n messages per topic and send them in order, from topic 1 to topic 4.  I expect App2 to receive the messages in the same order that App1 sent them, all within 640 ms.  Here are the problems I am having:

1.  For a sample run of 50 messages per topic, App1 took only 30 ms to write all 200 messages (50 for topic 1, then the next 50 for topic 2, and so on).  On the receiver side, however, it took far longer to receive all 200 messages (1000-2000 ms).  Looking at timestamped trace statements, I can see that one reader receives a whole batch, then there is a pause before the next reader receives its messages.  Is it the writer that did not flush the messages, or the readers that are slow, and how can I improve performance here?  App2 needs to receive all the messages and act upon them within the 640 ms.

2. App2 does not receive the messages in the order they were sent.  For example, it sometimes receives a message from topic 4 before a message from topic 3.  In other instances, messages within the same topic do not arrive in order.  How do I force the receiver (or writer) to deliver the messages in the order they were sent?


Any help is greatly appreciated.  Thanks.

rip:

Hi Michelle,

Do you have batching enabled? (note: Batching is probably the answer to your problem)

How many participants are in use?

How big are the individual samples that are being written?

Regards,

rip

Michelle:

I tried enabling batching, but with it on, App2 does not get the last few messages App1 sent; it looks like the messages are stuck in the queue, waiting to meet a quota before being sent out.  These are the parameters I used to enable batching:

<batch>
    <enable>true</enable>
    <!-- 30 KB -->
    <max_data_bytes>30720</max_data_bytes>
    <max_samples>LENGTH_UNLIMITED</max_samples>
    <!-- Batches can be flushed to the network based on an elapsed time. -->
    <max_flush_delay>
        <sec>DURATION_INFINITE_SEC</sec>
        <nanosec>DURATION_INFINITE_NSEC</nanosec>
    </max_flush_delay>
</batch>

In each app I have 2 domains, with one participant per domain.

App1's individual samples are about 1-2 KB.  But App1 also receives very large samples, about 52 KB.

Thanks,

Michelle

 

rip:

Batches can be flushed:

1) Manually (aDataWriter.flush(); // see section 6.3.9 in the documentation)

2) Due to bucket size

3) Due to timeout

Because you've set the max_flush_delay to DURATION_INFINITE... a timeout flush isn't going to happen.

Since you've stopped writing... the bucket won't be filled, either.

Since you're not calling .flush()... see where I'm headed? :)  You've actually demonstrated /exactly/ what I'd expect to see in your case.

Generally we recommend that you set max_flush_delay to something finite, but in your specific case (writing a known number of samples), I'd recommend simply calling .flush() on your DataWriter, keeping in mind that for a full application, max_flush_delay may be the better way.
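For illustration, a minimal sketch of that manual flush (traditional C++ API; Foo, its fields, and foo_writer are placeholders for your own type and DataWriter):

    // Write the known burst, then force any partially filled batch onto
    // the wire. Without the flush(), the tail of the burst sits in the
    // batch queue until max_data_bytes or max_flush_delay is reached;
    // with DURATION_INFINITE, the delay never expires.
    for (int i = 0; i < 50; ++i) {
        Foo sample;
        sample.id = i;  // hypothetical field
        DDS_ReturnCode_t retcode = foo_writer->write(sample, DDS_HANDLE_NIL);
        if (retcode != DDS_RETCODE_OK) {
            // handle the write error
        }
    }
    foo_writer->flush();  // flush the partial batch immediately

If you prefer the time-based route instead, a finite max_flush_delay (for example, <sec>0</sec> and <nanosec>100000000</nanosec> for 100 ms) bounds how long a partial batch can wait.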

For the other questions: the number of participants seems correct, and the data size is under the maximum size of a single UDP datagram, so I won't go into what I was looking for there -- Batching is still the answer.

 

sumeet:

Hi Michelle,

Is there a reason you are using the Asynchronous Publisher to publish your data?  The Async Pub is the reason you see such low times to write the data (30 ms for 200 messages): the call to write() occurs in your application thread.  When you don't use the Async Pub, the call to write() does 3 things:

  1. serialize the data
  2. store the data into the DataWriter's queue
  3. send the data through the network (invoking a kernel operation such as sendto()).

After the third step, control returns to the user.  All of the above occurs within the context of the calling thread.  When you use the Async Pub, only steps 1 and 2 occur in the calling thread; step 3 occurs in a separate RTI thread called the asynchronous publishing thread, and there is one such thread per DDS::Publisher.

A FlowController associated with the Async Pub determines when the queued data is actually sent.  The default FlowController uses a scheduling policy called Earliest Deadline First.  This policy should ensure ordering across calls to DataWriter::write(), but only if all DataWriters are using the same asynchronous publishing thread... that is, only if all DataWriters are created from the same DDS::Publisher.

So - are all of your DataWriters created from the same DDS::Publisher?
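For reference, a minimal sketch of that arrangement (traditional C++ API; the topic variables are placeholders):

    // One Publisher shared by all four DataWriters, so all of them are
    // served by the same asynchronous publishing thread.
    DDSPublisher *publisher = participant->create_publisher(
            DDS_PUBLISHER_QOS_DEFAULT, NULL, DDS_STATUS_MASK_NONE);

    DDSDataWriter *writer1 = publisher->create_datawriter(
            topic1, DDS_DATAWRITER_QOS_DEFAULT, NULL, DDS_STATUS_MASK_NONE);
    DDSDataWriter *writer2 = publisher->create_datawriter(
            topic2, DDS_DATAWRITER_QOS_DEFAULT, NULL, DDS_STATUS_MASK_NONE);
    // ... and likewise writer3 and writer4, from the same publisher.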

Second question: have you modified the reliability protocol settings?  Given that you're writing data in a very bursty fashion (and the data will be flushed to the network in a burst), your reliability protocol needs to match the behavior of the data.  It's possible that data is lost and must be repaired, and the round trip for the repair is why it takes so long to receive the full set of data.  For more information, see the knowledge base articles "Which QoS parameters are important to tune for throughput testing?" and "What are 'heartbeats' and how are they used in RTI Connext 4.x and above?".  Also, you'll want to make sure your operating system is tuned for bursty traffic (see "Tune Your OS for Performance").
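As an illustration only (the right values depend on your data rates; the fields below are from the traditional C++ API, and the numbers are assumptions, not recommendations), this is the kind of protocol tuning those articles discuss:

    // Make the reliable writer announce and repair more aggressively, so
    // a lost burst can be repaired well inside the 640 ms window.
    DDS_DataWriterQos writer_qos;
    publisher->get_default_datawriter_qos(writer_qos);
    writer_qos.protocol.rtps_reliable_writer.heartbeat_period.sec = 0;
    writer_qos.protocol.rtps_reliable_writer.heartbeat_period.nanosec =
            10 * 1000 * 1000;   // 10 ms (assumed value)
    writer_qos.protocol.rtps_reliable_writer.fast_heartbeat_period.sec = 0;
    writer_qos.protocol.rtps_reliable_writer.fast_heartbeat_period.nanosec =
            1 * 1000 * 1000;    // 1 ms while repairs are pending (assumed)
    // ... then pass writer_qos to create_datawriter().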

Are you using a listener or a WaitSet to receive the data?  A listener will process the data as it arrives, whereas a WaitSet will let you process the entire data set (see "Use WaitSets, Except When You Need Extreme Performance").  If you use a WaitSet, you can wait (in your own application thread) for data to arrive on all 4 topics and then access the data in the order that you see fit.
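A rough sketch of the WaitSet approach, assuming four already-created DataReaders in an array called readers (error handling mostly omitted):

    // Attach each reader's StatusCondition to one WaitSet and block until
    // at least one reader has data available.
    DDSWaitSet *waitset = new DDSWaitSet();
    for (int i = 0; i < 4; ++i) {
        DDSStatusCondition *cond = readers[i]->get_statuscondition();
        cond->set_enabled_statuses(DDS_DATA_AVAILABLE_STATUS);
        waitset->attach_condition(cond);
    }

    DDSConditionSeq active_conditions;
    DDS_Duration_t timeout = {0, 640000000};  // 640 ms
    if (waitset->wait(active_conditions, timeout) == DDS_RETCODE_OK) {
        // take() from readers[0]..readers[3] in whatever order you like
    }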

The last thing I'll mention: there is a QoS policy called Presentation that governs how data is presented to the subscribing application, and it can explain why you see data out of order, even within the same topic.  The data is published in the exact order that you provide it, and it is received in the exact order that it was published.  However, this QoS policy controls how that data is then presented to the application when the user calls take().  You'll want to set your access_scope to at least DDS_TOPIC_PRESENTATION_QOS and ordered_access to true to ensure data is presented in the exact order it was published (within a single topic).  Note this QoS setting must be applied at both the DDS::Publisher and the DDS::Subscriber level.
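For example, a sketch of setting that on both sides (traditional C++ API):

    // Publisher side
    DDS_PublisherQos pub_qos;
    participant->get_default_publisher_qos(pub_qos);
    pub_qos.presentation.access_scope = DDS_TOPIC_PRESENTATION_QOS;
    pub_qos.presentation.ordered_access = DDS_BOOLEAN_TRUE;
    DDSPublisher *publisher = participant->create_publisher(
            pub_qos, NULL, DDS_STATUS_MASK_NONE);

    // Subscriber side
    DDS_SubscriberQos sub_qos;
    participant->get_default_subscriber_qos(sub_qos);
    sub_qos.presentation.access_scope = DDS_TOPIC_PRESENTATION_QOS;
    sub_qos.presentation.ordered_access = DDS_BOOLEAN_TRUE;
    DDSSubscriber *subscriber = participant->create_subscriber(
            sub_qos, NULL, DDS_STATUS_MASK_NONE);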

You'll see there are two other interesting aspects of the Presentation QoS policy:

  1. A "coherent access", which determines whether data is presented as a group or as individual samples.
  2. A "Group Ordered Access Kind", which orders the data across multiple DataWriters belonging to the same DDS::Publisher.


Hope this helps!

-sumeet