Interoperability Between RTI DDS 5.3 and Vortex OpenSplice 6.7

Offline
Last seen: 3 years 7 months ago
Joined: 05/19/2020
Posts: 5

I am building a C++11 application that uses both the OpenSplice and RTI libraries for communication. The problem arises when a separate application sends RTI DDS messages to an application that links both the OpenSplice and RTI DDS libraries: the receiving application throws <std::length_error what():  basic_string::_S_create>, which originates in the standard C++ library. We referenced https://community.rti.com/forum-topic/interoperability-between-rti-and-opensplice for solutions but disabling shared memory by putting the following element into the participant qos in the USER_QOS_PROFILES.xml didn't work.

<transport_builtin>
<mask>UDPV4</mask>
</transport_builtin>

We have isolated the issue with the <dcpsisocpp2.so> library. When we remove the library from the link line, the receiving application runs without any issues. However, when we link the library back into the application, the application throws the exception mentioned above. Is there a way to configure RTI or OpenSplice so that the two sets of libraries don't step on each other at runtime?
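
For context on the exception itself: a std::length_error from basic_string generally means a string was asked to hold more characters than max_size() allows, which in practice tends to point at a garbage length value rather than a genuinely huge string. The following is a minimal standard-library sketch of that failure mode, not DDS code; make_string is a hypothetical helper and the fill character is arbitrary.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Build a string from a length field that came from an untrusted source
// (for example, a length read out of a corrupted or misinterpreted buffer).
// Returns true on success and false if the requested length is impossible.
bool make_string(std::size_t claimed_length, std::string& out) {
    try {
        // If claimed_length exceeds std::string::max_size(), the library
        // throws std::length_error before attempting any allocation.
        out.assign(claimed_length, 'x');
        return true;
    } catch (const std::length_error&) {
        return false;
    }
}
```

If something similar is happening here, the interesting question is where the bad length comes from, since the string constructor is only the place where it gets noticed.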

 

Howard's picture
Offline
Last seen: 17 hours 53 min ago
Joined: 11/29/2012
Posts: 621

So, does the application using both Connext DDS and OpenSplice actually startup and initialize without any problems?

If you then start a Connext DDS-only and OpenSplice-only application but don't publish any data, do the applications discover each other successfully?  Does the Connext/OpenSplice application have any problems?

If you send data from the Connext/OpenSplice app, either using a Connext DDS DataWriter or an OpenSplice DataWriter, are there any problems?  Do the Connext-only and OpenSplice-only apps receive the data?

If you subscribe to data with the Connext/OpenSplice app, you indicated that the problem only happens when data is sent by the Connext-only app?  Does it also happen if you send data with the OpenSplice-only app?

When the problem does occur, what is the stack trace?  What thread through what stack trace is causing the exception?   Can you run in a debugger and find out?

As for this

"disabling shared memory by putting the following element into the participant qos in the USER_QOS_PROFILES.xml didn't work."

How do you mean that it didn't work?  RTI Connext DDS is still using shmem if you try to configure it to only use UDP in XML?  How do you know?

"isolated the issue with the <dcpsisocpp2.so> library."

that's an OpenSplice library I think...what do you mean that you can remove it from linkage?  How is the receiving application able to run without that library?  Doesn't the OpenSplice code require that library? 

 

Gerardo Pardo's picture
Offline
Last seen: 3 weeks 2 days ago
Joined: 06/02/2010
Posts: 602

What is the IDL of the data-type you are sending? Does it contain unbounded strings?

Did you try with a more recent version of Connext, e.g. 6.0? Or does your application need to use 5.3.1?

 

Offline
Last seen: 3 years 7 months ago
Joined: 05/19/2020
Posts: 5

@howard 

If you then start a Connext DDS-only and OpenSplice-only application but don't publish any data, do the applications discover each other successfully?  Does the Connext/OpenSplice application have any problems?

A Connext DDS-only application is able to discover the Connext/OpenSplice application without any problem. The Connext-only application is able to send a message to the Connext/OpenSplice application one time. On the next message, the Connext/OpenSplice application throws the exception.

If you send data from the Connext/OpenSplice app, either using a Connext DDS DataWriter or an OpenSplice DataWriter, are there any problems?  Do the Connext-only and OpenSplice-only apps receive the data?

I was testing the Connext/OpenSplice app in GTest, where it only sends data after receiving input from the Connext DDS DataWriter. The Connext-only app sends and receives data.

If you subscribe to data with the Connext/OpenSplice app, you indicated that the problem only happens when data is sent by the Connext-only app?  Does it also happen if you send data with the OpenSplice-only app?

The issue doesn't occur when an OpenSplice-only app sends messages to the Connext/OpenSplice application. The OpenSplice-only app can send messages to the Connext/OpenSplice application and the Connext/OpenSplice application can process the message and send DDS messages to a Connext-only app. 

 

When the problem does occur, what is the stack trace?  What thread through what stack trace is causing the exception?   Can you run in a debugger and find out?

After running through GDB, thread #2 threw the following error. 

#0  0x00007f486d7fb387 in raise () from /lib64/libc.so.6
#1  0x00007f486d7fca78 in abort () from /lib64/libc.so.6
#2  0x00007f486e10b7d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007f486e109746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007f486e109773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007f486e109786 in ?? () from /lib64/libstdc++.so.6
#6  0x00007f486e1093c2 in __cxa_call_unexpected () from /lib64/libstdc++.so.6
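
A note on reading this trace: the frames shown end in __cxa_call_unexpected and std::terminate(), which is typically what happens when an exception tries to leave a function whose exception specification does not allow it. By that point the frames that actually threw are no longer on the stack, so the trace does not show where the bad string was built. One way to get closer, assuming the exception passes through your own callback code, is to catch it at that boundary and log it (setting a gdb catchpoint with "catch throw" is another). A minimal sketch, where run_guarded is a hypothetical helper and not part of either DDS API:

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Runs `body` and reports any exception instead of letting it escape.
// On failure, `error` receives the what() text of the caught exception.
bool run_guarded(const std::function<void()>& body, std::string& error) {
    try {
        body();
        return true;
    } catch (const std::exception& e) {
        error = e.what();           // e.g. "basic_string::_S_create"
    } catch (...) {
        error = "unknown exception";
    }
    return false;
}
```

Wrapping the body of the data-available handler this way would at least show whether the exception originates inside the handler or somewhere in a library thread that the application never sees.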

As for this

"disabling shared memory by putting the following element into the participant qos in the USER_QOS_PROFILES.xml didn't work."

How do you mean that it didn't work?  RTI Connext DDS is still using shmem if you try to configure it to only use UDP in XML?  How do you know?

I added the element to the USER_QOS_PROFILES.xml for both the Connext-only application and the Connext/OpenSplice application, then re-ran the Connext-only application to send messages to the Connext/OpenSplice application. The result was the same exception mentioned previously. I verified that the changes took effect by removing a bracket and checking that the application complained about the .xml file.

"isolated the issue with the <dcpsisocpp2.so> library."

that's an OpenSplice library I think...what do you mean that you can remove it from linkage?  How is the receiving application able to run without that library?  Doesn't the OpenSplice code require that library? 

Yes, that is an OpenSplice library. The full application is built using both Connext and OpenSplice; however, I was able to isolate the Connext DDS component from the whole application and put it into a GTest to listen for incoming DDS messages. When building the GTest with the Connext DDS component, the test links both the OpenSplice and Connext libraries. I had the same issue when I ran the full application (both Connext and OpenSplice), and isolated the problem to the Connext DDS component.

 

@Gerardo 

What is the IDL of the data-type you are sending? Does it contain unbounded strings?

We are using std::vector<uint8> and std::strings. 

Did you try with a more recent version of Connext, e.g. 6.0? Or does your application need to use 5.3.1?

The software is built using 5.3.1 and I am unable to update it to 6.0. 

Howard's picture
Offline
Last seen: 17 hours 53 min ago
Joined: 11/29/2012
Posts: 621

Hi,

So, sorry, it's not obvious what the problem could be.  Perhaps @Gerardo has more insight since he's done some work dealing with OpenSplice during interoperability experiments.  However, what you're seeing is not an interoperability issue.  It's an issue with both libraries working in the same process...which I think we do have customers who have done successfully.

In any case,

"disabling shared memory by putting the following element into the participant qos in the USER_QOS_PROFILES.xml didn't work."

How do you mean that it didn't work?  RTI Connext DDS is still using shmem if you try to configure it to only use UDP in XML?  How do you know?

I added the element to the USER_QOS_PROFILES.xml for both the Connext-only application and the Connext/OpenSplice application, then re-ran the Connext-only application to send messages to the Connext/OpenSplice application. The result was the same exception mentioned previously. I verified that the changes took effect by removing a bracket and checking that the application complained about the .xml file.

Since shared memory is not interoperable between Connext DDS and OpenSplice, following the guideline to disable the shared memory transport in Connext DDS is a good idea, even if it is not relevant to your problem.  Unfortunately, the stack trace is virtually useless...


If somehow, Connext DDS was linked to OpenSplice libs...you can try to:


1) rearrange the order that the libraries appear for the linker.  If OpenSplice libs were first, move them behind Connext DDS, or vice versa and see if that makes a difference


2) Change from shared libraries to static libraries (or vice versa)


3) Use static libraries for Connext DDS and shared libraries for OpenSplice (or vice versa)


Offline
Last seen: 3 years 7 months ago
Joined: 05/19/2020
Posts: 5

@howard 

1) rearrange the order that the libraries appear for the linker.  If OpenSplice libs were first, move them behind Connext DDS, or vice versa and see if that makes a difference

So I have tried this suggestion. The OpenSplice libraries were linked first, then the RTI libraries. This link order caused the exception mentioned above. When I switched the order, RTI libraries first and OpenSplice libraries second, the exception didn't occur. However, I am concerned that whatever issue occurred with sending RTI messages will shift to the OpenSplice side, since we are now linking the RTI libraries first and the OpenSplice libraries second. I plan on testing this more to see if any issues arise.

2) Change from shared libraries to static libraries (or vice versa)

3) Use static libraries for Connext DDS and shared libraries for OpenSplice (or vice versa)

2 and 3 are good suggestions and I will give them a shot. I will post an update after I have tried them.

Howard's picture
Offline
Last seen: 17 hours 53 min ago
Joined: 11/29/2012
Posts: 621

So interesting...somewhere in our libraries, the same function must be defined.  I'm surprised that a multiply-defined symbol warning wasn't produced at linking time.

It would be great to know what that function is...if you find out please post.
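
One note on the missing warning: with shared libraries on ELF platforms, duplicate dynamic symbols are normally not reported at link time; the first definition in the lookup order is simply used, which would fit the link-order sensitivity reported above. To hunt for the overlap, one approach is to dump the defined dynamic symbols of each library to text (for example with "nm -D --defined-only" and without demangling) and intersect the names. A rough sketch of the intersection step, assuming each input line ends with the symbol name; read_symbols and common_symbols are hypothetical helpers:

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parse nm-style lines ("<address> <type> <name>") and collect the names.
// Assumes mangled names, so the symbol is the last whitespace-separated field.
std::set<std::string> read_symbols(std::istream& in) {
    std::set<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string token, last;
        while (fields >> token) last = token;
        if (!last.empty()) names.insert(last);
    }
    return names;
}

// Return the symbol names that are defined in both listings, sorted.
std::vector<std::string> common_symbols(std::istream& a, std::istream& b) {
    const std::set<std::string> sa = read_symbols(a);
    const std::set<std::string> sb = read_symbols(b);
    std::vector<std::string> both;
    std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(),
                          std::back_inserter(both));
    return both;
}
```

Expect some benign overlap, such as weak symbols from inlined standard-library templates; the candidates worth a closer look are strong definitions of DDS or utility functions that appear in both vendors' libraries.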

Offline
Last seen: 3 years 7 months ago
Joined: 05/19/2020
Posts: 5

So interesting...somewhere in our libraries, the same function must be defined.  I'm surprised that a multiply-defined symbol warning wasn't produced at linking time.

It would be great to know what that function is...if you find out please post.

I haven't found the multiply-defined symbol/function.

Revisiting the issue, it seems to be intermittent. Scenario:

Application A is up and running and sending DDS traffic to subscribers. Application B starts up afterwards and I get the error (<std::length_error what():  basic_string::_S_create>). However, when I restart Application B, it appears to run all right. Every so often when I restart App B while App A is running, App B throws the error.

I tried another thing: with App B running (listening for and sending its DDS messages), I start App A and it's able to communicate with App B. I restart App A and it continues to communicate. I repeated the process several times, keeping App B active and restarting App A, and the error was never thrown.

My thinking is that it's an issue with the USER_QOS_PROFILES.xml, and I have been tinkering with it.

  • I tried switching from Generic.StrictReliable.LargeData to Generic.KeepLastReliable.TransientLocal because I was thinking that App A is sending so much traffic that App B is unable to process all the data correctly. That didn't help, and in any case App B does fine when it is already up and waiting to receive data from App A.
  • I am setting the resource limits of the participant_qos as follows; however, this doesn't help either.

<resource_limits>
<type_object_max_serialized_length>0</type_object_max_serialized_length>
<type_code_max_serialized_length>0</type_code_max_serialized_length>
</resource_limits>

 

Continuing to investigate the issue. 

Howard's picture
Offline
Last seen: 17 hours 53 min ago
Joined: 11/29/2012
Posts: 621

So, from the behavior that you found through your experiments, I would guess that there's memory corruption going on in App B.

App A and App B are different right?  And you never see the problem in App A?  And see it in App B, but not 100% of the time?  That sounds like a memory issue that sometimes overwrites the wrong piece of memory with the wrong value and doesn't at other times.  And it happens on startup.  If it doesn't happen, it won't happen.

I would check your own coding logic:

strings, arrays, any allocated objects/memory...are you still using after free/delete...

If you are setting string parameters in Connext DDS QOS structures, you have to allocate the memory and give it to the structure.  You can't set a string to a const or some value that's gonna be released later.

I would suggest that you run a memory analysis tool like valgrind or similar and see if that doesn't give you any clues.
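
To illustrate the ownership point about strings: if a C-style structure releases its string fields when it is finalized, every value stored in it has to be a heap copy that the structure is allowed to free. The sketch below uses a made-up FakeQos type and plain malloc/free as a stand-in; it is not the Connext API (where, as far as I know, this role is played by a helper such as DDS_String_dup, so check the API reference for your version).

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a C-style QoS structure that owns its string
// field and releases it with free() when finalized. Not an RTI type.
struct FakeQos {
    char* name;
};

void fake_qos_init(FakeQos& q) { q.name = nullptr; }

// Store a heap copy so the structure never points at caller-owned memory
// (a string literal, a stack buffer, or std::string::c_str()).
bool fake_qos_set_name(FakeQos& q, const char* value) {
    char* copy = static_cast<char*>(std::malloc(std::strlen(value) + 1));
    if (copy == nullptr) return false;
    std::strcpy(copy, value);
    std::free(q.name);   // release any previous value
    q.name = copy;
    return true;
}

void fake_qos_finalize(FakeQos& q) {
    std::free(q.name);   // would be undefined behavior on a string literal
    q.name = nullptr;
}
```

Assigning q.name = "literal" directly would compile in older code but ends with the structure freeing memory it never allocated, which is the kind of bug that shows up intermittently and far from its cause.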

 

Offline
Last seen: 3 years 7 months ago
Joined: 05/19/2020
Posts: 5

App A and App B are different right?  And you never see the problem in App A?  And see it in App B, but not 100% of the time?  That sounds like a memory issue that sometimes overwrites the wrong piece of memory with the wrong value and doesn't at other times.  And it happens on startup.  If it doesn't happen, it won't happen.

App A and App B are different, and correct, I don't see the issue in App A, but in App B it is intermittent. Correct, it happens on startup; if it doesn't occur then, the app runs without any issues.

strings, arrays, any allocated objects/memory...are you still using after free/delete...

I am not using any free/delete calls. 

I would suggest that you run a memory analysis tool like valgrind or similar and see if that doesn't give you any clues.

I will give this a shot and see if this yields anything. 

 

Another weird behavior: if I leave App A continuously running and quickly start App B, close it, and quickly restart it right after (I repeated this many times), I don't encounter the issue. However, if I start App B while App A is running, close App B, wait about 10 seconds, and start App B again, I do encounter the error.

Howard's picture
Offline
Last seen: 17 hours 53 min ago
Joined: 11/29/2012
Posts: 621

"I am not using any free/delete calls. "

Connext DDS internally may release memory that you pass to it...specifically it will manage the strings that are set in QOS structures.

For memory corruption issues, I would check any access to arrays, pointers, etc...i