[SOLVED] rticonnextdds_connector + node.js + durability problem

Offline
Last seen: 2 years 3 months ago
Joined: 03/20/2017
Posts: 4

Hello dear Support,

We started evaluating rtiddsconnector for node.js and I have 2 questions on it:

1. What are your plans and roadmap for adding more support for node.js - official release and support, full API support etc.?

2. We have a problem working with it and I would appreciate your help.

The problem is that we have many lost messages when sending reliable and durable messages between two instances of node.js on the same machine, at a high rate for two seconds.

We have publisher and subscriber applications. The publisher writes "Shape" messages for 10 seconds and counts the messages. The subscriber receives and counts the received messages. QoS is defined as reliable and durable. I expect the number of received messages to be equal to the number of sent messages, but it is not.

If I start the subscriber first (receiver.js) and then start the publisher (sender.js), the number of messages received by the subscriber is about 70% of the messages sent by the publisher.

If I start the publisher first, and only after all the messages are sent I start the late-joining subscriber, it receives only about 256 messages, not starting from the first message sent.

receiver.js, sender.js and QoS files are attached.

We do not experience such problems with C# API working with similar QoS.

Thank you and the best regards,

Alex Dubinsky.

Elbit Systems.

Attachments: QoS file (5.8 KB), receiver (735 bytes), sender (917 bytes)
gianpiero's picture
Offline
Last seen: 5 months 2 weeks ago
Joined: 06/02/2010
Posts: 174

Hello Alex

Thanks for trying out the connector!
Let me try to answer number 2 first.

You are correctly using the builtin QoS profile BuiltinQosLibExp::Generic.StrictReliable. However, later in your QoS file you change the data reader's history setting to KEEP_LAST with depth 1:

        <datareader_qos>
            <durability>
                <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
            </durability>
            <history>
                <kind>KEEP_LAST_HISTORY_QOS</kind>
                <depth>1</depth>
            </history>
            <reliability>
                <kind>RELIABLE_RELIABILITY_QOS</kind>
            </reliability>
        </datareader_qos>
    </qos_profile>
</qos_library>

 

To be really strict reliable, the history kind has to be KEEP_ALL; otherwise you may lose samples due to other resource limits. See this chapter of the user manual:

https://community.rti.com/static/documentation/connext-dds/5.2.3/doc/manuals/connext_dds/html_files/RTI_ConnextDDS_CoreLibraries_UsersManual/index.htm#UsersManual/ControllingQueueDepthwithHistory.htm?Highlight=strict reliable

By changing the reader qos to this:

<datareader_qos>
    <durability>
        <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
    </durability>
    <history>
        <kind>KEEP_ALL_HISTORY_QOS</kind>
        <!-- <depth>1</depth> -->
    </history>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
</datareader_qos>

it should work fine and you should not lose any samples.

Also, you want to have two different participants in your XML: one containing only the writer and one containing only the reader.

In your example there was one participant containing both. That means the sender creates a reader as well and, since it is reliable, the writer has to make sure that this reader received the samples too.

 I attached the xml with my modifications. 

I also made some small modifications to the receiver and the sender:

  • At the beginning I added a sleep of 4 seconds after creating the input and output, which gives the entities time to discover each other.
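
A minimal sketch of such a startup delay, for illustration only. The helper names `delay` and `startAfterDiscovery` are hypothetical, and `startPublishing` stands in for whatever the sender does once the output exists; the 4000 ms value is simply the one mentioned above, not a tuned number.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
function delay(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}

// Wait a fixed time before starting work, to give the DDS entities a
// chance to discover each other. `startPublishing` is a placeholder for
// the application's own send loop.
function startAfterDiscovery(startPublishing, discoveryMs) {
    return delay(discoveryMs).then(startPublishing);
}

// Usage: startAfterDiscovery(function () { /* write samples here */ }, 4000);
```

A fixed sleep is a blunt tool: it is fine for a test like this one, but it only makes discovery likely, it does not guarantee it.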

A suggestion. In your code you do:

 console.log(JSON.stringify(input.samples.getJSON(i)));

to print the whole sample: that can be slow. The way it works is that the core creates a JSON string and then we convert it to a JS object. Creating the string can be pretty slow if the sample is big. Also, to print it on the screen you are transforming the JS object back into a string!

I would suggest you to use the direct API to access a specific field:

console.log(input.samples.getString(i,'color'));

I attached my modified sender and receiver as well. 

As for the first question:

1. What are your plans and roadmap for adding more support for node.js - official release and support, full API support etc.?

Connector is an experimental feature that has been developed as a research project. The plan is to gather customer feedback and understand how much interest there is before adding official support.

Regarding full API support: Connector is a simplified API. We do not plan to support the full DDS API at the moment. That being said, we are open to suggestions on how to improve the current API.

I hope I was helpful,

Best, 
Gianpiero


Hello dear Gianpiero,

Thank you very much for your support!

Question #2 (lost messages):

  Problem A - when receiver is started first, some messages are lost.

After the suggested changes, this issue is solved. I get exactly the same number of messages on the receiver side.

But as far as I understand, I am sending new instances, not samples of the same instance. I am not interested in getting multiple samples of the same instance on the subscriber side; I am interested in getting the last version of every instance. So why is the number of samples in the history important here?

When working with the C# API, I usually set KEEP_LAST_HISTORY_QOS with depth 1 and everything works; there are no lost instances. Or maybe rticonnextdds-connector does not distinguish between instances and samples? I publish each instance with a different key (the color field).

  Problem B - when receiver is started 10 seconds after the sender, only first 256 messages are received.

This issue is still there. I start the sender. Then I let it run for 14 seconds (4 seconds of sleep and 10 seconds of publishing) and only then (without stopping the sender, of course) I start the receiver (subscriber). It gets only the first 256 messages. But I defined durable QoS, and I expect a late-joining subscriber to get everything; this is how it works for me with the C# API. I would appreciate your help with that...

I test it on Windows 7, 64 bit, node.js version 7.7.3, in case it matters.

Question #1 (plans and roadmap):

I see. Node.js is in trend today, and we are requested to follow technology trends. So we are required to develop new products with node.js and we would like to continue using DDS from node.js as well (and DDS is a great technology we are using today from C# and Java applications. You have a great product and great  support).

Specifically today I see a lack of the following features in node.js API:

1. Define/change topic, domainID, partition and filter condition programmatically at runtime. I have an idea of changing the DDS QoS file programmatically from my code and restarting DDS to achieve that, but I would like to have a nice programmatic solution. :-)

2. Get notified about participants status change (e.g. on_liveliness_changed) and Id of new/dead participants.

3. More issues will come... To select the DDS technology for our next project in node.js and be on the safe side we would like to know the DDS support for node.js is in development and will be extended over time...

Thank you very much again!

Alex.

 


Hello Alex,

First of all I want to clarify that the JS and PY Connectors use the same DDS libraries that C# uses. So all the QoS settings have the same meaning and they should work the same way. That said, let me try to answer your questions as best I can.

Question #2 

Problem A 

Even if you are sending different instances, each one of those instances is a sample. In your specific example each sample belongs to a different instance; but they still have to be stored in the reader queue and then processed by the application. 
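
To illustrate the difference between samples and instances, here is a toy model in plain JavaScript (not Connector code, and not how the middleware is implemented) of a history that keeps the last `depth` samples per key. The `keepLast` helper and the sample data are made up for this illustration. It only shows that KEEP_LAST replaces samples within one instance; it does not model resource limits or how fast the application takes samples.

```javascript
// Toy model of a KEEP_LAST history: keep at most `depth` samples per key.
function keepLast(samples, keyField, depth) {
    var byKey = new Map();
    samples.forEach(function (sample) {
        var key = sample[keyField];
        if (!byKey.has(key)) {
            byKey.set(key, []);
        }
        var queue = byKey.get(key);
        queue.push(sample);
        if (queue.length > depth) {
            queue.shift(); // the oldest sample of this instance is replaced
        }
    });
    var kept = [];
    byKey.forEach(function (queue) {
        kept = kept.concat(queue);
    });
    return kept;
}

// Three samples, each belonging to a different instance (different key).
var distinctKeys = [
    { color: 'RED', x: 1 }, { color: 'GREEN', x: 2 }, { color: 'BLUE', x: 3 }
];
// Three samples of the same instance.
var sameKey = [
    { color: 'RED', x: 1 }, { color: 'RED', x: 2 }, { color: 'RED', x: 3 }
];

console.log(keepLast(distinctKeys, 'color', 1).length); // 3
console.log(keepLast(sameKey, 'color', 1).length);      // 1
```

With distinct keys every sample still occupies its own slot in the queue until the application takes it, which is why a slow consumer matters even when each sample is a new instance.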

I would need to look more deeply at both applications but, if I have to guess, I think the reason you notice a difference between the C# and the JS code is that the processing of each sample takes much longer in JS. As I explained in the post above, in JavaScript you are using dynamic data, and in your code the processing for each sample is the following:

console.log(JSON.stringify(input.samples.getJSON(i))); 

That is slow! If you remove that line and just increase the counter, you will not lose any samples. Of course that is not realistic, and that is why I think using KEEP_ALL is the right solution.

Problem B (late joiner)

I think you identified a bug in the JavaScript layer. I spent some time investigating, comparing connector.on with polling (just a for loop instead):

 

Using connector.on:

// Subscribe
connector.on('on_data_available',
    function () {
        input.take();
        var len = input.samples.getLength();
        // console.log(len)
        for (var i = 1; i <= len; i++) {
            if (input.infos.isValid(i)) {
                console.log(JSON.stringify(input.samples.getJSON(i)));
                // console.log(input.samples.getString(i,'color'));
                received_count++;
            } else {
                console.log(">>> INVALID SAMPLES");
            }
        }
    });

 

Using a for loop:

// Subscribe
// Note: this busy loop never yields to the event loop; it is for testing only.
for (;;) {
    input.take();
    var len = input.samples.getLength();
    // console.log(len)
    for (var i = 1; i <= len; i++) {
        if (input.infos.isValid(i)) {
            console.log(JSON.stringify(input.samples.getJSON(i)));
            // console.log(input.samples.getString(i,'color'));
            received_count++;
        } else {
            console.log(">>> INVALID SAMPLES");
        }
    }
}

With the for loop, all the samples are received! That means there is something wrong with the mechanism that implements connector.on. Internally we have a wait set that gets triggered when there is new data. At that point the function you passed to connector.on gets called, and the take resets the wait set.

Unfortunately, I think there is a bug in how I implemented this: I first notify the JS listeners and then go back to wait in C. I think what is happening is that, by the time I finish notifying all the JS listeners and the take is called, all the samples have already been received by the middleware, so the take resets the wait set and connector.on is never called again.

That is why you get the samples if you poll (the for loop): what is failing is the triggering of the function that does the read.
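
A busy for(;;) loop like the one above keeps the Node.js event loop occupied, so as a gentler variant of the same polling idea, here is a minimal sketch that polls on a timer. It uses only the calls that already appear in this thread (`take`, `samples.getLength`, `infos.isValid`, `samples.getString`); the helper names `pollOnce` and `startPolling` and the 100 ms default period are assumptions made for illustration.

```javascript
// Poll `input` once: take whatever is available and pass the 'color'
// field of each valid sample to `onSample`.
// Returns the number of valid samples processed.
function pollOnce(input, onSample) {
    input.take();
    var len = input.samples.getLength();
    var valid = 0;
    for (var i = 1; i <= len; i++) { // indexes start at 1, as in the code above
        if (input.infos.isValid(i)) {
            onSample(input.samples.getString(i, 'color'));
            valid++;
        }
    }
    return valid;
}

// Poll on a timer so the event loop is not blocked between polls.
// The default period of 100 ms is an arbitrary assumption.
function startPolling(input, onSample, periodMs) {
    return setInterval(function () {
        pollOnce(input, onSample);
    }, periodMs || 100);
}

// Usage (with a real Connector input):
// var timer = startPolling(input, function (color) { received_count++; });
// ... later: clearInterval(timer);
```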

I changed the JS layer of the connector to call the C wait set before calling the JavaScript listener, and that seems to fix the problem in my tests (see attached zip file): I started the sender, waited for it to send all the samples, started the receiver, and received all the samples.

Could you try it and confirm that for me? If it works for you, I will review the code with a coworker and then update the GitHub repo.

To try, use the attached file called rticonnextdds-connector.js and replace the one you have in node_modules/rticonnextdds-connector 

Question #1

I agree with you: Node.js is very trendy! We are planning to add more features to the JS connector: filters, the capability of adding new readers/writers dynamically, as well as discovery topics and some status changes. The goal for the connector is to give enough flexibility with a smaller API. I also see that there is room for a full DDS API in Node, separate from the connector.

My suggestion is to contact your distributor and maybe set up a meeting with product management here at RTI, so we can figure out together what the needs are and how to get there. What do you think?

 

Best,
  Gianpiero


Hello Gianpiero,

Question #2 

Thank you very much for your explanations and the fix.

Concerning the fixed rticonnextdds-connector.js - Yes, I confirm, it works fine, now I get all the samples.

I tested different scenarios: receiver started before sender, receiver started during sender publishing, receiver started after sender finishes publishing - everything works. See attached screenshot.

It works even without the artificial 4 second sleep. It works with both slow JSON.stringify and without it.

So, it will be great to fix the npm package as well.

Question #1

It is good to know that you have plans for node.js.

Yes, your suggestion is great, we already have a scheduled meeting with local distributor on different other issues, so now we will discuss node.js as well and then we will see how to proceed.

Thank you very much again,

Best regards,

Alex.

 


Alex,

Thanks for taking the time to test. I will do a code review internally and update the connector on GitHub and npm. I will let you know when it is up.

Best,
  Gianpiero


Alex,

The fix should be now on github and on npm. (v 0.2.1) 

Best,
  Gianpiero


Hello Gianpiero,

Great news! Thank you very much for your great support!

Best regards,

Alex.