5.22. Hangs

5.22.1. [Critical] Calling ignore_participant during a ParticipantBuiltinTopicDataDataReader on_data_available callback triggered by the deletion of a remote DomainParticipant led to a hang or unexpected error message

Suppose you were following the example code in Ignoring Specific Remote DomainParticipants for ignoring a DomainParticipant during a ParticipantBuiltinTopicDataDataReader on_data_available callback. When the callback was invoked as a result of deleting a DomainParticipant, the call to ignore_participant could misbehave under either of the following conditions:

  • You had called set_listener on the ParticipantBuiltinTopicDataDataReader after enabling the DomainParticipant.

  • In a previous invocation of the on_data_available callback, you had not taken all of the samples, so in this invocation there were multiple samples to take.

The misbehavior was as follows:

  • In release libraries, ignore_participant would have succeeded, but a later call to DDS_DomainParticipant_delete_contained_entities would have hung.

  • In debug libraries, ignore_participant would have succeeded, but with an error message logged from the internal function REDACursor_startFnc.

This problem has been fixed by making ignore_participant succeed without any side effects.

[RTI Issue ID CORE-15558]

5.22.2. [Critical] Potential deadlock when calling DDS_DomainParticipantFactory_set_qos and DDS_DomainParticipantFactory_finalize_instance

The application may have ended up in a deadlock in the following scenario:

  • A custom XML file contained a structure.

  • One thread called DDS_DomainParticipantFactory_set_qos while another thread called DDS_DomainParticipantFactory_finalize_instance at the same time.

[RTI Issue ID CORE-15541]

5.22.3. [Critical] Potential deadlock when using TCP Transport on Windows systems if WINDOWS_WAITFORMULTIPLEOBJECTS was selected as the socket monitoring API

A potential deadlock occurred when using TCP Transport on Windows systems if WINDOWS_WAITFORMULTIPLEOBJECTS was selected as the socket monitoring API.

[RTI Issue ID CORE-14983]

5.22.4. [Critical] Potential deadlock when using SHMEM transport on Linux, macOS, and LynxOS systems and killing one application

Consider the following scenario:

  • Three applications using the SHMEM transport are communicating with each other.

  • One application is ungracefully killed while creating a DomainParticipant.

If the application was killed at just the wrong moment, the other applications would stop communicating with each other because they were in a deadlock. The deadlock would persist until you cleaned up the shared memory resources. In addition, any new DomainParticipants that attempted to use the same shared memory resources as the killed application would hang during DomainParticipant creation.

The problem could occur on Linux®, macOS®, and LynxOS® systems.

Now, instead of entering a deadlock, Connext will print a log message like this one at the WARNING level in the PLATFORM category:

RTIOsapiSharedMemorySemMutex_attach_os:FAILED TO ATTACH | Semaphore set identifier 0x12345678 has not been initialized. Consider cleaning this semaphore if this warning occurs multiple times for this semaphore across all applications on this machine.

The semaphore set identifier corresponds to the semid column of the output of ipcs -s on Linux:

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0010000c 305419896  username   666        1

[RTI Issue ID CORE-14848]

5.22.5. [Critical] Potential deadlock when creating entities in parallel that use Zero Copy Transfer over Shared Memory transport

Consider the following scenario:

  • A DomainParticipant is using the Zero Copy transfer over shared memory transport.

  • One thread is trying to create a DataWriter or DataReader belonging to that DomainParticipant.

  • Another thread is either:

    • trying to create a DataWriter or DataReader from a callback function of a builtin Topic DataReader belonging to that DomainParticipant, or

    • using the Request-Reply API and trying to create a Requester or Replier belonging to that DomainParticipant.

These two threads may have entered a deadlock while trying to create their entities.

[RTI Issue ID CORE-14820]

5.22.6. [Critical] String built-in type unregister_type API blocked associated DomainParticipant

The unregister_type operation for the String built-in type did not work properly: it failed and blocked the DomainParticipant instead of unregistering the type.

[RTI Issue ID CORE-14543]

5.22.7. [Critical] Hang of QNX application using shared memory if one of the shared semaphores was in an unusable state

On QNX platforms, during initialization of the shared memory transport, if a Connext application crashed or was killed between the creation and the initialization of a shared semaphore, that shared semaphore was left in an unusable state. Other Connext applications running on the same domain would try to access the shared semaphore and would hang waiting for it to be initialized. Now, if the other Connext applications wait for the shared semaphore initialization for more than four seconds, they will log an error message and continue running.

[RTI Issue ID CORE-16108]

5.22.8. [Critical] Potential deadlock when using Network Capture utility APIs concurrently with DDS calls

Using the Network Capture utility APIs in one thread while making DDS calls in another thread could cause a deadlock during the first DDS API call in that thread.

The hang occurred because the creation of some DDS thread-specific state was taking a pair of locks in the opposite order from the one in which they were taken by the Network Capture APIs.

[RTI Issue ID CORE-15984]

5.22.9. [Major] Potential deadlock when using Durable Writer History if there was a failure while creating the connection *

A potential deadlock occurred when using Durable Writer History if there was a failure while creating the connection.

[RTI Issue ID CORE-15130]

5.22.10. [Major] Calling ignore_participant when using the on_data_available callback to process a ParticipantBuiltinTopicData with partial_configuration led to a hang during DomainParticipant deletion *

Suppose you were following the example code in Ignoring Specific Remote DomainParticipants for ignoring a DomainParticipant during a ParticipantBuiltinTopicDataDataReader on_data_available callback. When detecting the creation of another DomainParticipant, the call to ignore_participant could misbehave when the ParticipantBuiltinTopicData::partial_configuration field was set to DDS_BOOLEAN_TRUE. The misbehavior was as follows:

  • In release libraries, ignore_participant would have succeeded, but a later call to DDS_DomainParticipant_delete_contained_entities would have hung.

  • In debug libraries, ignore_participant would have succeeded, but with an error message logged from the internal function REDACursor_startFnc.

This problem has been fixed by making ignore_participant succeed without any side effects.

[RTI Issue ID CORE-13783]



* This bug does not affect you if you are upgrading from 6.1.x or earlier.