Hello,
I've recently encountered a problem with one of my applications (which uses DDS over UDP for communications).
Once every few restarts (about 4-6) one of my RTI reader tasks raises an exception upon deserialization of a sample.
The topic itself contains an enum as a key value, followed by a few fields (about 100 bytes). Its QoS is set to RELIABLE_RELIABILITY_QOS and DDS_TRANSIENT_DURABILITY_QOS,
with resource limits set to max_samples: 10, max_instances: 10, max_samples_per_instance: 1, initial_samples: 10, initial_instances: 10, and history depth: 1. These settings apply to both the reader and the writer.
In reality there is a maximum of 4 instances, each being sent periodically.
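As a quick sanity check on those numbers, here is a minimal sketch in plain C++ (it does not use the RTI API). The `Limits` struct, `kLimits`, `kLiveInstances` and the two helper functions are hypothetical names introduced only for this illustration, and the consistency rules encoded here are the ones I understand the resource-limits and history policies to require:

```cpp
// Plain C++ mirror of the limits quoted above. This is NOT the RTI API;
// every name here is a placeholder for illustration only.
struct Limits {
    int max_samples;
    int max_instances;
    int max_samples_per_instance;
    int initial_samples;
    int initial_instances;
    int history_depth;
};

constexpr Limits kLimits{10, 10, 1, 10, 10, 1};
constexpr int kLiveInstances = 4;  // "a maximum of 4 instances" in practice

// Consistency rules as I understand them: initial values must not exceed
// their maximums, and the history depth must fit within the per-instance limit.
constexpr bool self_consistent(const Limits& l) {
    return l.initial_samples <= l.max_samples
        && l.initial_instances <= l.max_instances
        && l.max_samples_per_instance <= l.max_samples
        && l.history_depth <= l.max_samples_per_instance;
}

// Upper bound on queued samples if every live instance holds its maximum.
constexpr int worst_case_samples(const Limits& l, int live_instances) {
    return live_instances * l.max_samples_per_instance;
}

static_assert(self_consistent(kLimits), "limits contradict each other");
static_assert(kLiveInstances <= kLimits.max_instances, "too many instances");
static_assert(worst_case_samples(kLimits, kLiveInstances) <= kLimits.max_samples,
              "queue could overflow max_samples");
```

All three checks pass at compile time, so on paper the configured limits leave headroom over the actual load of 4 instances with 1 sample each.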
Unfortunately, the exception itself doesn't contain any meaningful information (the logger doesn't print anything); it looks like a memory violation of some sort.
I've tried a few (somewhat random) approaches to try to get a better understanding of the problem:
1) Changing the reliability to BEST_EFFORT_RELIABILITY_QOS - fixed the problem
2) Sending the entire topic with the fields set to 0 (memset 0) on the writer side - fixed the problem. Memset-ing the entire struct except for the key value also worked for some reason, but leaving at least one data field non-zero still crashed.
3) Disabling the sending of any other topic (the discovery process still applies to them) - crashed
4) Removing the key attribute from the enum - fixed the problem
5) Reducing the max_instances and max_samples to 1 - fixed the problem
6) Setting the logging level to VERBOSE and checking for any weird/unexpected differences between a crashing run and a "good" run - found nothing
7) I've also tried reproducing the issue on a PC, but everything works perfectly there (the apps themselves are identical between the PC and the target, which runs Integrity 5.0.11)
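To make approach 2 concrete, the writer-side zeroing amounts to something like the sketch below. This is plain C++ under stated assumptions: `StatusSample`, `UnitId` and the 100-byte `payload` are hypothetical placeholders standing in for the real generated type, which is assumed here to be a plain struct with no pointers or strings (otherwise memset would not be safe).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the real topic type: an enum key followed by
// roughly 100 bytes of plain data fields.
enum class UnitId : std::int32_t { A, B, C, D };

struct StatusSample {
    UnitId id;                  // key field
    std::uint8_t payload[100];  // placeholder for the real data fields
};

// Variant that did not crash: zero every data field, leave the key as-is.
inline void zero_all_but_key(StatusSample& s) {
    std::memset(s.payload, 0, sizeof s.payload);
}

// Variant that also did not crash: zero the whole sample, key included.
inline void zero_everything(StatusSample& s) {
    std::memset(&s, 0, sizeof s);
}
```

Either function would be called on the sample just before handing it to the writer. The crashing case corresponds to calling neither, or to setting any byte of the data fields back to a non-zero value afterwards.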
So far it seems to me that there is some sort of memory/performance/timing issue on my target that causes this exception: reducing the reliability or max_instances reduces the network overhead, which gives the threads enough breathing room.
However, I've failed to understand exactly what the issue is. I've also noticed that the exception only happens during app initialization: if the thread doesn't crash in the first 5-10 seconds, it won't crash until a restart.
Furthermore, on a "crash run" other topics aren't received either, indicating that the entire read process is blocked (?) by something.
The target is running Integrity 5.0.11 on a PowerPC architecture.
DDS version: NDDS 5.0.0 (CPP)
I'd love to get a better understanding of why the thread is crashing and what can be done to fix it if it pops up somewhere else.
Thanks in advance!