Unable to change some QoS settings for DDS router

Offline
Last seen: 1 year 10 months ago
Joined: 12/20/2022
Posts: 6

I have some issues with RTI Routing Service: some of the QoS settings I give it don't seem to take effect.

In the QoS file I have set the destination order for datareader_qos, datawriter_qos and topic_qos to "BY_SOURCE_TIMESTAMP_DESTINATIONORDER_QOS". But when I look in Admin Console I see that the router requests _RECEPTION_. The main issue is that the unpacking router then offers _RECEPTION_ to my applications (readers), which expect _SOURCE_ (and that is incompatible).

I have the same issue with ordered_access. I set <ordered_access>true</ordered_access> in the PublisherQoS and SubscriberQoS, but the router still requests false when I look in Admin Console.

So, a brief background. I am running version 6.1.1. I am trying to link two DDS LANs together using two routers connected over WAN via a dedicated point-to-point cable, so that only certain topics are shared between the LANs. The QoS file I feed to the router is quite comprehensive, but I think I roughly know my way around it (meaning there could be other settings made by someone else that I have overlooked). I use this general QoS file for the participant on both LAN sides. The QoS file is already in use in the individual LANs; I just want my router to also be a participant :) Current tests are run with Shapes Demo, using my system QoS file. All this will be running in the same building, so delays should not be significant. The router config is the simple tcp_transport.xml example, with adjustments to the network addresses and discovery QoS.

So the question: is it not possible to change these settings for the router, or do I need to add some more options to my QoS file? Any suggestions on where to troubleshoot?

Thanks!
/KungLudvig 

Howard's picture
Offline
Last seen: 1 week 6 days ago
Joined: 11/29/2012
Posts: 618

Well, you should be able to modify those QoS settings for the entities (DW/DR/P/S) created by the Routing Service.  But I would need to see your Routing Service configuration file to figure out what you're doing wrong...

 


Hello! 
Thanks for that brief hint. I will attach the config and QoS file.

So the system is basically PC1[Shapes Demo publish w/ Ludvig::GlobalQoS --> Routing Service w/ TCP_1] --> PC2[Routing Service subscribe w/ TCP_2 --> Shapes Demo w/ Ludvig::GlobalQoS]

This works if I run default QoS in the demos (but still with the routing config file as attached), so I'm fairly confident the IP settings are fine.
I get errors as expected if I mess with the QoS file, so I'm fairly confident that I'm reading the correct file and that the QoS environment variables are defined correctly.

Thanks for the assistance! 

Howard's picture

So, if you are using QoS Profiles to define the QoS configurations of your DDS Entities (Participant, Publisher/Subscriber, DataWriter/DataReader), you need to refer to those profiles in your Routing Service configuration.

I think that you're relying on the "is_default_qos=true" attribute to try to replace the default values of the QoS Policies of the DDS Entities so that they are created with what you specify.

While that may work for your own applications, it generally won't work with RTI Services like Routing Service...which may load your QoS file, but may not actually use the QoS values from your file unless you configure it to do so explicitly...and not via the "backdoor" of changing what the default values are.

Actually, relying on "is_default_qos=true" to set the QoS of different DDS entities is not considered best practice.

Take a look at these articles for best practice guidelines:

https://community.rti.com/best-practices/configure-your-qos-through-profiles

https://community.rti.com/best-practices/qos-profile-inheritance-and-composition-guidance

Then once you've defined QoS Profiles...which you already have (although I must point out that configuring TopicQos is virtually useless...really, it has no effect unless you are calling code that creates a DataWriter or DataReader with a parameter indicating that it should use the Topic's QoS values as its own...), you must refer to the profiles in your Routing Service configuration file to tell Routing Service to use a specific profile to create the DDS Entities that it uses.

I see that you've done this for one of your participants,

<domain_route name="DR_UDPLAN_TCPWAN">
    <participant name="1">
        <domain_id>0</domain_id>
        <participant_qos base_name="Ludvig::GlobalQoS"/>
    </participant>

but you would also need to do the same for the publisher/subscriber/datawriter/datareader entities

for example:

      <session name="LuddeSech2">
        <publisher_qos base_name="Ludvig::GlobalQoS"/>
        <subscriber_qos base_name="Ludvig::GlobalQoS"/>
 
and then:
 
        <auto_topic_route name="LudvigForward">
          <publish_with_original_info>true</publish_with_original_info>
          <input participant="1">
            <datareader_qos base_name="Ludvig::GlobalQoS"/>
            <allow_topic_name_filter>Square</allow_topic_name_filter>
            <creation_mode>ON_DOMAIN_MATCH</creation_mode>
          </input>
          <output>
            <datawriter_qos base_name="Ludvig::GlobalQoS"/>
            <allow_topic_name_filter>Square</allow_topic_name_filter>
            <creation_mode>ON_DOMAIN_OR_ROUTE_MATCH</creation_mode>
          </output>
        </auto_topic_route>

 

this assumes that you have defined a QoS Profile named "GlobalQoS" in your QoS Library "Ludvig".
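
For reference, a minimal sketch of what such a profile could look like, limited to the two policies from the original question. The library and profile names come from this thread; the values are assumptions based on the first post, and the real file will certainly contain more than this.

```xml
<dds>
  <qos_library name="Ludvig">
    <!-- No is_default_qos attribute: the profile is selected explicitly
         through base_name in the Routing Service configuration -->
    <qos_profile name="GlobalQoS">
      <publisher_qos>
        <presentation>
          <ordered_access>true</ordered_access>
        </presentation>
      </publisher_qos>
      <subscriber_qos>
        <presentation>
          <ordered_access>true</ordered_access>
        </presentation>
      </subscriber_qos>
      <datawriter_qos>
        <destination_order>
          <kind>BY_SOURCE_TIMESTAMP_DESTINATIONORDER_QOS</kind>
        </destination_order>
      </datawriter_qos>
      <datareader_qos>
        <destination_order>
          <kind>BY_SOURCE_TIMESTAMP_DESTINATIONORDER_QOS</kind>
        </destination_order>
      </datareader_qos>
    </qos_profile>
  </qos_library>
</dds>
```

With the session and route referring to "Ludvig::GlobalQoS" as shown above, the publisher/subscriber settings would apply to the session and the datawriter/datareader settings to the route's output and input.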


Hello!

Thanks a lot Howard for the reply, it was very helpful! 

I have a follow-up question. After many hours of frustration I have noticed that some QoS settings for the TCP transport are not behaving quite as I expect.

When I use a QoS for the TCP transport that builds on the GlobalQoS attached in this conversation, I get different behaviour between a Java application we have written and Shapes Demo (it works just fine with the demo). The additional writer/reader QoS involves topic filters, some resource allocation and more. I know it might be tricky to give general recommendations, but here goes:

The information is published at 1 Hz with around 15 instances, so the load is low. When I open Admin Console I can see that Router 1 has data going into the TCP domain. But when I look at Router 2, there is no data coming out of the TCP domain (and thus obviously nothing published in the second DDS domain). When I look with Wireshark, the TCP traffic is absent: I see some "attempts" or handshakes, but no traffic like what I see with Shapes Demo. I can get my application data flowing if I use default QoS for the transport, and some other system QoS also allows the data to flow. From this I draw the conclusion that the writer application is not faulty. I can get my system to work, but it would be nice to have an explanation for the strange behaviour.

I draw the conclusion that the QoS itself is not necessarily incompatible with the TCP domain, since it allows Shapes Demo to work. Do you have any suggestions of QoS settings to look out for, or methods to troubleshoot further? If nothing comes to mind I could try to produce a clean example of the QoS that I'm using.

Thanks a lot again! 
/Ludvig 

Howard's picture

Points to note:

  • With DDS, QOS compatibility is only considered between 2 end points (a DW/DR pair). 
  • 2 Participants must be able to connect before their endpoints can connect.  This is affected by whatever changes you make in the transports used by the participant as well as the discovery QoS parameters.

I think your use case is

<app> --UDP LAN--> Routing Service ---TCP WAN--> Routing Service --UDP LAN--> <app>

I think you said that when the <app> is ShapesDemo, it works.

But then you change the <app> to a Java-based app, and at the same time you've modified some QoS values.  And then it doesn't work.

If you can't get data from your <java-app> to your <java-app>, the problem could be in any (or all) of these places

  • <Java-app> to RS
  • RS to RS
  • RS to <Java-app>

And each of those "connections" is configured by QoS (which also configures the transports, UDP or TCP, used by the participant).  And these "connections" are independent of each other...that is, one of these connections not being configured properly doesn't affect the ability of any of the other connections to connect (if they are properly configured).  Of course, if you're using the same QoS profile to configure several of these connections, an improperly configured QoS profile would affect all connections that use it.

As far as I understand, the TCP transport is only used in the RS to RS connection.

Given that you were able to get this to work in your <shapesdemo> to RS to RS to <shapesdemo> experiment, the same QoS configuration used for the WAN participants of both Routing Services should also work...i.e., allow the RSes to connect via TCP...independent of whatever changes you make to the QoS that affects the LAN side (either the RS or the <javaapp>).

So, if you use the exact same QoS configuration for the WAN participants that you used (and that worked) in the <shapesdemo> scenario, but now applied to your <javaapp> scenario, and there's no data end-to-end....then the problem is likely in whatever QoS changes you've made for either the <javaapp> or the LAN participant side of RS (including the QoS of the DW/DR used by the LAN participant side of the RSes).

Have you tried to run <javaapp> to <javaapp> directly (in the same LAN) with whatever QoS profile you're using in your WAN experiment?  You should get that to work first, and then, without changing the QoS Profile for the <javaapp>, apply the same QoS Profile to the LAN side participant/datawriter/datareader of the Routing Service.


Hello Howard, 
Thanks for the reply!

Yes, you understand the use case correctly. I have some clarifications in answer to your questions.

I have two functioning systems that I am trying to connect with this RS. Within both systems the traffic between the <javaapp>s works with my desired QoS (let's call it QOS1). If I set up these two RSes to connect the systems over WAN (also using QOS1) I would expect traffic to pass. But what I see in Admin Console is that RS1 transmits data to the TCP domain, while RS2 does not receive any data. This leads me to the conclusion that <javaapp> -> RS1 works fine, but RS1->RS2 has some problem. If I change RS1->RS2 to use a different (or just the default) QoS, then the data flows to RS2 (and to my destination app). This makes me think that QOS1 is an issue, in particular in association with the TCP transport.

If I run the same example with Shapes Demo, using QOS1 for all of demo->RS1, RS1->RS2 and RS2->demo, it works, making me believe that QOS1 is not a problem. That conflicts with my conclusion above, hence this thread :)

The ownership in QOS1 is shared, so that shouldn't be a concern. The only thing I see that is different is that there is a topic filter in QOS1 that affects the javaapp communication but not Shapes Demo. This one (assume that the javaapp uses the test1 topic):

<datawriter_qos topic_filter="test1*">
  <publication_name>
    <name>GoodNameProfile(test1*)</name>
  </publication_name>
  <durability>
    <kind>VOLATILE_DURABILITY_QOS</kind>
  </durability>
  <reliability>
    <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
    <max_blocking_time>
      <sec>DURATION_ZERO_SEC</sec>
      <nanosec>100000000</nanosec>
    </max_blocking_time>
  </reliability>
  <resource_limits>
    <max_samples_per_instance>5</max_samples_per_instance>
  </resource_limits>
  <batch>
    <enable>true</enable>
    <max_samples>5</max_samples>
    <max_data_bytes>1200</max_data_bytes>
    <source_timestamp_resolution>
      <sec>0</sec>
      <nanosec>0</nanosec>
    </source_timestamp_resolution>
    <max_flush_delay>
      <sec>DURATION_ZERO_SEC</sec>
      <nanosec>10000000</nanosec>
    </max_flush_delay>
  </batch>
</datawriter_qos>

Shapes Demo will use a different filter that only has the VOLATILE, BEST_EFFORT and max_blocking_time (DURATION_ZERO_SEC / 100000000) sections.

I tried to disable this by mistyping the filter in my QoS file on both RS1 and RS2. Was this a useless test, given that it may also have broken the RS2->javaapp communication? The RS2 writer will not use the filter, but the javaapp on a different machine will still expect it. Anyway, doing this did not change the behaviour.

Thanks a lot for the help! 

/Ludvig 

 

Howard's picture

Sorry, I don't quite understand.

>> If i set up these two RS to connect the systems over WAN (also using QOS1) I would expect traffic to pass.

When you say using QoS1, you mean using QoS1 for the DataWriter and DataReaders used by the Routing Service to talk over the WAN, i.e., applied to the DW/DR of the routes defined in the Routing Service XML that uses the TCP WAN?

>> If i change RS1->RS2 to have a different (or just default) QOS then i will get the data to flow to RS2 (and to my destinaton app).

Again, what do you mean change the RS1->RS2 to use a different QOS.  Is this ONLY for the DW/DR of the routes that connect RS1 and RS2, or do you mean *not* configuring RS1 and RS2 to use the TCP WAN transport?

If you could attach the QoS and Routing Service configuration files for the cases in which it worked and which it didn't, it would be easier to help diagnose.

 

With regards to topic filters.  Is there a specific reason to use topic filters?  Best practice is to define different QoS Profiles and then use specific QoS Profiles to create different DDS entities.  The topic filter mechanism can also work, but frankly it is confusing and can lead to usage errors that are difficult to detect.
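
As a rough sketch of that alternative: the settings under topic_filter="test1*" could move into a named profile that derives from the common one, and the application (or route) then asks for that profile by name. "Test1WriterProfile" is a hypothetical name, the content of GlobalQoS is only a placeholder, and the values are taken from the snippet posted earlier (leaving out the ones that only restate defaults).

```xml
<dds>
  <qos_library name="Ludvig">
    <!-- Common settings; placeholder content for this example -->
    <qos_profile name="GlobalQoS">
      <datawriter_qos>
        <destination_order>
          <kind>BY_SOURCE_TIMESTAMP_DESTINATIONORDER_QOS</kind>
        </destination_order>
      </datawriter_qos>
    </qos_profile>

    <!-- Hypothetical profile replacing <datawriter_qos topic_filter="test1*"> -->
    <qos_profile name="Test1WriterProfile" base_name="Ludvig::GlobalQoS">
      <datawriter_qos>
        <reliability>
          <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
        </reliability>
        <resource_limits>
          <max_samples_per_instance>5</max_samples_per_instance>
        </resource_limits>
        <batch>
          <enable>true</enable>
          <max_samples>5</max_samples>
          <max_data_bytes>1200</max_data_bytes>
          <max_flush_delay>
            <sec>0</sec>
            <nanosec>10000000</nanosec>
          </max_flush_delay>
        </batch>
      </datawriter_qos>
    </qos_profile>
  </qos_library>
</dds>
```

A route would then select it with <datawriter_qos base_name="Ludvig::Test1WriterProfile"/>, so it is visible in the configuration which settings each DataWriter actually gets.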

 


Hello Howard! 
I have prepared some configuration files to upload here. These are not the exact files I've been running; instead I have stripped them down to include only the sections that I use in my tests. This should make them more readable for you. The Ludvig::GlobalQoS that I have referred to can be found in "qos file which defines Ludvig QOS library" in an earlier message in this conversation.

And yes, when I write that I change to use QOS1, I mean I change the DW/DR.

As regards the filter, I do not know the specific reason for using the topic filter. In the attached files it will seem superfluous, as I have removed the other filter options. But perhaps the structure of the file will reveal more of the intentions to you than I understand.

As I mentioned before, the Squares get through with this configuration. ManualPosBasic gets through, but this data is not cyclical. But ManualPosKinematic (which is cyclical at 1 Hz), from the same application as Basic, does not get through. If I change the config of the DW/DR to use the default QoS (i.e. remove the line that defines the DW/DR QoS) I can get ManualPosKinematic through. I'm trying to understand this behaviour so I can configure my system optimally.

Writing this, I came to think about the Participant/Publisher/Subscriber QoS, for which I use GlobalQoS. But since the library file only defines DW/DR behaviour, this should not be a concern. Am I correct in assuming this?

Thanks a lot for the assistance!! 

Howard's picture

Hi Ludvig,

So, are you working on a project that has paid RTI developer licenses...and thus likely also to have a paid support contract?  If so, I'd advise you to use the services of RTI's global support team for issues regarding the use of RTI products in the future.  RTI provides a customer portal for which supported developers can create issues/bug reports to which RTI's support team will answer in a timely manner (usually in the same day, we do have an extensive support group based in Spain, as well as the US).

The forum depends on the voluntary participation of forum members.  Yes, RTI employees do monitor and will respond to questions in our spare time, but depending on the type of question, the RTI support team will usually be much more efficient and able to respond to issues more completely.  So, please consider using the RTI Customer Portal (assuming your project has access) for future needs.

With respect to this specific issue, points to note:

1) if a running Routing Service is able to pass data for at least 1 topic from one application through RS to another application, but it is not able to pass data for another topic, then it's highly likely that the problem is in the misconfiguration of QOS for the route for the problematic topic.

You stated that by using "default" QoS, the data was flowing for the ManualPosKinematic topic.  This is clear evidence that the problem is somewhere in QoS configuration (and not in the configuration of the TCP connection between the Routing Services).

The QOS can be incompatibly configured from the app to the Routing Service input, from the Routing Service output to the app, and when running a RS-to-RS relay, from one Routing Service's output to the other Routing Service's input.   If QoS is incompatibly configured in any of these places, data will not flow end-to-end through the route.

You can use Admin Console to find QoS incompatibility from an app to Routing Service.  It's a bit harder to find incompatibility between Routing Services that are connected via TCP since Admin Console is probably not able to "see" both Routing Services at the same time to detect incompatibility.  But if you run the Routing Service at higher verbosities, you may be able to grep for incompatible Qos messages (as you could if you ran any application at higher Connext-logging verbosity levels).

 

2) If you're not using Admin Console to detect QoS incompatibility (highly recommended by the way), you have to figure out yourself which QoS parameters have been set incompatibly.  You can do this by setting the Connext Logging verbosity to a higher level as mentioned.  While this works, it's not the best way since Connext logging can be quite verbose at higher verbosities and it's easy to miss the critical messages about the issue if you're not familiar with Connext logging messages.
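
For your own applications, the logging verbosity can also be raised from the QoS XML file rather than in code. A minimal sketch, assuming the LoggingQosPolicy elements under participant_factory_qos; the profile name and output file name are made up for the example, and whether Routing Service itself picks this up is not verified here (it has its own verbosity option on the command line).

```xml
<dds>
  <qos_library name="Ludvig">
    <!-- Hypothetical profile used only to configure logging.
         The factory QoS is taken from the profile marked as the
         default participant factory profile. -->
    <qos_profile name="LoggingProfile"
                 is_default_participant_factory_profile="true">
      <participant_factory_qos>
        <logging>
          <!-- Example values; lower the verbosity again once done -->
          <verbosity>WARNING</verbosity>
          <category>ALL</category>
          <print_format>TIMESTAMPED</print_format>
          <output_file>connext_debug.log</output_file>
        </logging>
      </participant_factory_qos>
    </qos_profile>
  </qos_library>
</dds>
```

Writing to a file makes it easier to search the output for messages that mention incompatible QoS afterwards.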

Alternatively, in your own applications, you can use the DDSDataReaderListener and DDSDataWriterListener to monitor the REQUESTED/OFFERED_INCOMPATIBLE_QOS_STATUS events that will be triggered when DDS discovers an Entity with incompatible QoS settings:

https://community.rti.com/static/documentation/connext-dds/7.0.0/doc/api/connext_dds/api_cpp/structDDS__RequestedIncompatibleQosStatus.html

https://community.rti.com/static/documentation/connext-dds/7.0.0/doc/api/connext_dds/api_cpp/structDDS__OfferedIncompatibleQosStatus.html

As a last resort, you can use the divide-and-conquer method.  Start from a known QoS configuration that works, e.g., the default, and then slowly add QoS changes until you find the one that causes failure.  Or start from the failed QoS configuration and remove QoS settings one-at-a-time until something starts to work.   It's usually better starting from something that works than from something that doesn't...

When adding or removing a specific QoS setting, you should do so for both the DataWriter in one application (which may be an RS) and the DataReader in the other application (which may be an RS) at the same time.  Otherwise, you could be introducing incompatibility if you add/remove a QoS setting for a DataWriter without also adding/removing the corresponding QoS setting for a DataReader.

3) When crafting QoS configurations, QoS profiles used as a base profile should only set QoS values that *most* derived QoS profiles are using.  There is no reason to set a QoS value in the base profile if most of the derived QoS profiles are overriding that value.

In addition, I strongly urge not to set any QoS values that just assert what the default value is.  QoS's that appear in a QoS profile should be changing/overriding the default value or the value set by the base profile.  If a QoS parameter in a profile should have the same value as the default or as the value set by the base profile, then that QoS parameter should not be set in a profile.

 

So, having said this, I note the following in the various "Ludvig" QoS files that you attached:

1) Base profiles set a lot of QoS settings to what is already their default values, e.g.

                <durability>
                    <kind>VOLATILE_DURABILITY_QOS</kind>
                </durability>

               <reliability>
                    <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
                    <max_blocking_time>
                        <sec>DURATION_ZERO_SEC</sec>
                        <nanosec>100000000</nanosec>
                    </max_blocking_time>
                </reliability>

        <deadline>
          <period>
            <sec>DURATION_INFINITE_SEC</sec>
            <nanosec>DURATION_INFINITE_NSEC</nanosec>
          </period>
        </deadline>

        <liveliness>
          <kind>AUTOMATIC_LIVELINESS_QOS</kind>
          <lease_duration>
            <sec>DURATION_INFINITE_SEC</sec>
            <nanosec>DURATION_INFINITE_NSEC</nanosec>
          </lease_duration>
        </liveliness>

        <ownership>
          <kind>SHARED_OWNERSHIP_QOS</kind>
        </ownership>

        <history>
          <depth>1</depth>
          <kind>KEEP_LAST_HISTORY_QOS</kind>
        </history>

and others...this is unnecessary...and is confusing to maintain.  The general rule is if you want to use the default value of a QoS, then don't configure it in a QoS profile.
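
A sketch of what a trimmed-down layering could look like under that rule. The profile names are the ones from your files; the RELIABLE reader setting is only assumed from the name "ReliableStatusProfile", and the history depth override is hypothetical.

```xml
<dds>
  <qos_library name="LudvigLibrary">
    <!-- Base profile: only non-default values shared by most derived profiles -->
    <qos_profile name="ReliableStatusProfile">
      <datawriter_qos>
        <!-- DataWriter reliability is RELIABLE by default, so it is not repeated -->
        <durability>
          <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
        </durability>
      </datawriter_qos>
      <datareader_qos>
        <!-- Assumed from the profile name; DataReaders default to BEST_EFFORT -->
        <reliability>
          <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
        <durability>
          <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
        </durability>
      </datareader_qos>
    </qos_profile>

    <!-- Derived profile: inherits everything above and lists only what differs -->
    <qos_profile name="BasicDataProfile"
                 base_name="LudvigLibrary::ReliableStatusProfile">
      <datareader_qos>
        <history>
          <depth>10</depth> <!-- hypothetical override -->
        </history>
      </datareader_qos>
    </qos_profile>
  </qos_library>
</dds>
```

Deadline, liveliness, ownership and the other policies left at their defaults simply do not appear, so anything that does appear in a profile is a deliberate change.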

 

2) The fundamental difference between a QoS profile that seems to work, "LudvigLibrary::BasicDataProfile"...used for the ManualPosBasic topic and the

"LudvigLibrary::KinematicDataProfile" for the ManualPosKinematic topic is that the BasicDataProfile uses

        <durability>
          <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
        </durability>

(set in the base profile, "ReliableStatusProfile")

whereas the KinematicDataProfile directly sets

                <durability>
                    <kind>VOLATILE_DURABILITY_QOS</kind>
                </durability>

and overrides the setting of the base profile.

This would be fine if all DataWriters and DataReaders for this topic throughout the system are using this same VOLATILE setting.  (as is the case if everything used TRANSIENT_LOCAL)

But if the DataReader in the receiving application is using TRANSIENT_LOCAL_DURABILITY when the DataWriter of the RS is configured to be VOLATILE_DURABILITY, then there would be a QoS incompatibility that will prevent a connection between the Routing Service output DataWriter and the application's subscribing DataReader.

The same is true for the DataWriter in the sending application...if it is using VOLATILE_DURABILITY for its DataWriter QoS, but the Routing Service is using TRANSIENT_LOCAL for the input DataReader, then the same incompatible Qos condition exists.

(TRANSIENT_LOCAL DataWriters can send data to VOLATILE or TRANSIENT_LOCAL DataReaders.   However, VOLATILE DataWriters can only send data to VOLATILE DataReaders)
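
Put differently, the durability offered by the DataWriter has to be at least as strong as the durability requested by the DataReader, with VOLATILE being weaker than TRANSIENT_LOCAL. A sketch of keeping both sides aligned in one profile; that KinematicDataProfile derives from ReliableStatusProfile is an assumption here, and the base profile is expected to be defined elsewhere in the same library.

```xml
<dds>
  <qos_library name="LudvigLibrary">
    <qos_profile name="KinematicDataProfile"
                 base_name="LudvigLibrary::ReliableStatusProfile">
      <!-- Override the base durability for the writer AND the reader, so a
           DataWriter/DataReader pair created from this profile matches -->
      <datawriter_qos>
        <durability>
          <kind>VOLATILE_DURABILITY_QOS</kind>
        </durability>
      </datawriter_qos>
      <datareader_qos>
        <durability>
          <kind>VOLATILE_DURABILITY_QOS</kind>
        </durability>
      </datareader_qos>
    </qos_profile>
  </qos_library>
</dds>
```

A DataReader created from a different profile that requests TRANSIENT_LOCAL would still not match a DataWriter created from this one, so every hop (app to RS, RS to RS, RS to app) needs to be checked as a pair.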

 

3) I see that you have configurations defined for 2 different Routing Services, "RED" and "BLACK".  However, for one of the routes, the QoS profile used is different.

"Red"

                <auto_topic_route name="ManualPosKinematic_from_tcp">
                     <publish_with_original_info>true</publish_with_original_info>
                     <input participant="red_tcp_transport">
                        <datareader_qos base_name="LudvigLibrary::KinematicDataProfile"/>
                        <allow_topic_name_filter>ManualPosKinematic</allow_topic_name_filter>
                         <creation_mode>IMMEDIATE</creation_mode>
                    </input>
                     <output participant="red_dds_domain_34">
                        <datawriter_qos base_name="LudvigLibrary::KinematicDataProfile"/>
                        <allow_topic_name_filter>ManualPosKinematic</allow_topic_name_filter>
                         <creation_mode>IMMEDIATE</creation_mode>
                    </output>
                </auto_topic_route>

"Black"

                <auto_topic_route name="ManualPosKinematic_from_tcp">
                     <publish_with_original_info>true</publish_with_original_info>
                     <input participant="black_tcp_transport">
                        <datareader_qos base_name="LudvigLibrary::KinematicDataProfile"/>
                        <allow_topic_name_filter>ManualPosKinematic</allow_topic_name_filter>
                         <creation_mode>IMMEDIATE</creation_mode>
                    </input>
                     <output participant="black_dds_domain_34">
                        <datawriter_qos base_name="LudvigLibrary::BasicDataProfile"/>
                        <allow_topic_name_filter>ManualPosKinematic</allow_topic_name_filter>
                         <creation_mode>IMMEDIATE</creation_mode>
                    </output>
                </auto_topic_route>

For this configuration, an application subscribing to the ManualPosKinematic topic from the "Red" Routing Service must use VOLATILE_DURABILITY.

Also, I note that both "Red" and "Black" Routing Services use the "LudvigLibrary::KinematicDataProfile" to subscribe to the ManualPosKinematic topic from the application domain (which uses VOLATILE_DURABILITY for their DataReaders).  This means that applications publishing the ManualPosKinematic topic can either use VOLATILE or TRANSIENT_LOCAL_DURABILITY for their DataWriter QoS.

 Hope you can digest this lengthy response and use this information to figure out the incompatibility that's preventing your system from passing the ManualPosKinematic topic end-to-end.


Hello Howard, 

Thanks a lot for the response, I appreciate it.

I think it is a good suggestion to apply the best practice of not re-defining QoS values. I will look into it.

I will thoroughly look at the places you suggested, but I have a feeling I might end up doing the divide and conquer :(

If I have any further inquiries on this topic I will try to reach the commercial support.

Thank you once again for your time.

/Ludvig