Efficient data agreement

Offline
Last seen: 5 years 2 months ago
Joined: 01/23/2019
Posts: 4

Suppose the following data synchronisation approach: 

  • Topic A publishes a message every 100 ms containing
    - an integer key
    - a related complex data structure V
  • Topic B publishes a message every 1 s containing an integer that references the key of one of the last 20 messages of topic A.
  • 10 different processes listen for topics A and B.
  • In a sense, topic B decides which value V from topic A everybody will use. On receiving a B message, all processes should "consume" the topic A value it references. They use that value V in their own work (e.g. compute something and publish some information).

The "easy" approach would be for every process to:

  • Listen on Topic A with a history of 20
  • Listen on Topic B
  • If B is received, read its desired key and look back in the history for a message that has the desired key.
  • Use that message's value V.
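
The steps above can be sketched in plain Python, independent of any DDS API. This is only an illustration: the class `KeyedHistory` and its methods `on_a` and `on_b` are hypothetical names, and the dictionary index is an assumption made so that the lookup on receipt of B does not have to scan the history linearly.

```python
from collections import OrderedDict


class KeyedHistory:
    """Bounded history of topic-A samples, indexed by their integer key."""

    def __init__(self, depth=20):
        self.depth = depth
        self._samples = OrderedDict()  # key -> V, oldest first

    def on_a(self, key, value):
        """Store a new A sample and evict the oldest once depth is exceeded."""
        self._samples[key] = value
        self._samples.move_to_end(key)
        while len(self._samples) > self.depth:
            self._samples.popitem(last=False)

    def on_b(self, key):
        """Return the V referenced by a B message, or None if it has aged out."""
        return self._samples.get(key)


history = KeyedHistory(depth=20)
for k in range(30):
    history.on_a(k, {"payload": k * 10})

selected = history.on_b(25)   # still within the last 20 samples
expired = history.on_b(5)     # already evicted from the history
```

Each process still holds up to 20 copies of V in this sketch, so it addresses only the search cost, not the memory cost.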

Unfortunately, this approach has some notable performance drawbacks:

  • Each process needs to keep its own history (a memory issue if the size of V is significant)
  • Each process has to search for the right message in the history (performance issue because of duplicated work)

Is there a better approach to centrally agree on which A from a fast-moving stream everybody will work on?

Best regards,

Johan

Offline
Last seen: 3 months 6 days ago
Joined: 02/11/2016
Posts: 144

Why isn't B sending the actual V to work on?

In such a case the consumers can simply listen for B instead of listening for A.

Only the "decider" needs to listen to A (which then no longer needs a key field).

Maybe I'm missing something?

Offline
Last seen: 5 years 2 months ago
Joined: 01/23/2019
Posts: 4

Hi KickR, thanks for your quick feedback and sorry for the delay in my answer...

Your suggestion is possible and has been considered:

  • Only the coordinator listens for A and sends a B that contains V
  • The drawback is that for a bigger V (a complex data structure) and multiple instances of V, the same information crosses the network more than once (first on topic A, then on topic B). Relying on the A broadcast alone would be more efficient.
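
To put a rough number on that drawback, here is a back-of-the-envelope sketch in Python. It assumes the rates from the original post (A at 10 Hz, B at 1 Hz), a 4-byte key, one send per sample reaching all readers (e.g. multicast), and it ignores protocol headers. The function name `payload_rate` and the 100 kB size for V are made up for illustration.

```python
def payload_rate(v_bytes, a_hz=10.0, b_hz=1.0, key_bytes=4):
    """Return (republish, reference) payload rates in bytes per second.

    republish: A carries V, and B carries the chosen V again.
    reference: A carries key + V, and B carries only the key.
    """
    republish = a_hz * v_bytes + b_hz * v_bytes
    reference = a_hz * (key_bytes + v_bytes) + b_hz * key_bytes
    return republish, reference


republish, reference = payload_rate(v_bytes=100_000)
overhead = republish / reference - 1.0  # relative extra traffic from republishing
```

Under these assumptions, republishing adds roughly one extra V per second on top of the ten already sent on A, so about 10% more payload traffic. With unicast delivery the extra cost would scale with the number of readers of B.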

I was wondering whether there is an efficient way of referencing other published values without the inherent overhead that I described in my original post and without the requirement of "republishing" already sent information.

Moreover, this is linked to my question https://community.rti.com/forum-topic/forwarding-message-unknown-type.

Maybe I should just describe the situation in a bit more detail:

  • Lots of topics publish information at a high rate (for example, A publishes V at 10 Hz).
  • Some subscribers use this information at that high rate
  • Some subscribers (e.g. X, Y, Z) need this information at a much lower rate (e.g. 1 Hz). They therefore need to consume only 1 out of every 10 messages.
  • In our case, it is important that all 3 use the same values from the fast publications. Without coordination, X, Y and Z could each make a different choice (e.g. X chooses the 10th V, Y the 11th and Z the 5th...)
  • That can be avoided by introducing a coordinator that:
    • Either repeats the consensus value. That coordinator just makes a pick at 1 Hz and publishes the corresponding V. Unfortunately, our coordinator is unable to know the type of V, so it cannot easily publish it (that's what my thread https://community.rti.com/forum-topic/forwarding-message-unknown-type is about).
    • Or points to the consensus packet, essentially only publishing a reference to the value that needs to be used.
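
The second option could look roughly like the following plain-Python sketch (not DDS code). The class `ReferenceCoordinator` and its methods are hypothetical, picking the newest key is just one possible policy, and it assumes the coordinator can read the key of each A sample without needing to understand V.

```python
from collections import deque


class ReferenceCoordinator:
    """Tracks only the keys of recent A samples and picks one as the consensus."""

    def __init__(self, window=20):
        self._keys = deque(maxlen=window)  # V itself is never stored here

    def on_a_key(self, key):
        """Record the key of a newly seen A sample."""
        self._keys.append(key)

    def pick(self):
        """Return the key to publish on B (here: the newest), or None if empty."""
        return self._keys[-1] if self._keys else None


coordinator = ReferenceCoordinator()
references = []
for key in range(1, 31):      # 30 A samples at 10 Hz, i.e. 3 seconds
    coordinator.on_a_key(key)
    if key % 10 == 0:         # once per second, emit a reference on B
        references.append(coordinator.pick())
```

The coordinator stays small and type-agnostic this way, but every consumer still has to resolve the reference against its own history of A.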

Depending on the size of the data, referencing could be more efficient.
On the other hand, referencing introduces the described inefficiency unless some better referencing mechanism exists.

I understand that this case might be difficult to follow, so I appreciate the time you're taking to understand it.

Thanks,
Johan