We are experiencing crashes on our cRIO-9045 controllers. The crash log indicates they are happening within DDS.
####
#Date: Mon, Sep 11, 2023 12:01:10 PM
#Desc: LabVIEW caught fatal signal
20.0 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0x40000000c
#RCS: unspecified
#OSName: Linux
#OSVers: 4.14.146-rt67-cg-8.0.0f1-x64-139
#OSBuild: 265874
#AppName: lvrt
#Version: 20.0
#AppKind: AppLib
#AppModDate:
0x00007F792263CF81 - DDS_DomainParticipantGlobals_get_worker_per_threadI + B
0x00007F7922645110 - DDS_DomainParticipant_get_workerI + 10
0x00007F792258DE04 - DDS_Entity_lock + 9D
0x00007F792342BC98 - LVDDS_WriterNode_write_w_sample_kind + 1B6
0x00000000059C87FD - <unknown> + 0
####
#Date: Mon, Sep 11, 2023 12:34:58 PM
#Desc: LabVIEW caught fatal signal
20.0 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0x100000037
#RCS: unspecified
#OSName: Linux
#OSVers: 4.14.146-rt67-cg-8.0.0f1-x64-139
#OSBuild: 265874
#AppName: lvrt
#Version: 20.0
#AppKind: AppLib
#AppModDate:
0x00007F2609753DDE - DDS_Entity_lock + 77
0x00007F260A5F1C98 - LVDDS_WriterNode_write_w_sample_kind + 1B6
0x00000000047970FD - <unknown> + 0
DDS writers are called from several places in our code. Is there any way to narrow this down other than disabling parts of the code and running it?
Latest RTI DDS Toolkit version.
LabVIEW 2020 on the controller.
Hi FFly,
I see you also posted on the NI forum. Let's follow up here. Can you answer the following?
If you can provide a reproducer and more information about the scenario where the issue is happening, that would be a great help.
Does it happen on Windows as well? I don't have a good way to run this on Windows.
Are you using security? No.
Are all arrays initialized when the VIs are generated with the Complex Type Generator? Yes.
If you can provide a reproducer and more information about the scenario where the issue is happening, that would be a great help. There are several DDS calls. For me to provide a reproducer, I would need some guidance on which call is throwing this error.
Hi FFly,
The function LVDDS_WriterNode_write_w_sample_kind is called in the generated "Write" VI of your DataType. Can you try to isolate the Write where the crash is happening? Does it happen with a specific value, or always, no matter the value? Are you using arrays or sequences? What QoS are you using? Did you start with version 3.1.1, or did you have a previous version and then update? If so, you must regenerate the VIs generated with previous versions.
If you could provide the DataType that causes the crash and the QoS you use, I could try to replicate it. Also, try it on Windows. There is no need to run your whole code on Windows; just run the example for your DataType generated with the Complex Type Generator in LabVIEW on Windows.
We have been using 3.1.1 since its release.
To be clear: there are several writers and several data types, and a lot of the writers are running in parallel.
Are you saying that this function LVDDS_WriterNode_write_w_sample_kind is called for all writers?
If so, do these lines in the crash log indicate this could be a threading issue on your side?
0x00007F792263CF81 - DDS_DomainParticipantGlobals_get_worker_per_threadI + B
0x00007F7922645110 - DDS_DomainParticipant_get_workerI + 10
0x00007F792258DE04 - DDS_Entity_lock + 9D
Just run the example of your DataType generated with Complex Type Generator in a Windows LabVIEW
As mentioned above, there are several writers, so I would need to set up a project that runs on Windows and simulates the various data points to write. While not impossible, I was hoping to avoid that.
Let's try to determine whether we are accessing a deleted DataWriter. The toolkit internally shares DataWriters when possible. That means the same DataWriter can be used by several calls in different threads.
We need to isolate the code flow where the issue happens. To do that, enable the write call for a single DataType/topic/QoS, disable all the others, and repeat until one fails. Once you have it, can you check whether you are deleting a DataWriter in one thread while another thread is writing the same DataType/topic/QoS? Can you also increase the "Timeout to delete inactive DDS Entities"? You can do it in the administration panel or with the "Set Configuration parameters" VI in the Data Communication -> RTI DDS Toolkit -> DDS Debugging palette.
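The suspected failure is a use-after-delete race: one thread deletes a shared DataWriter while another thread is still writing through it. A minimal Python sketch of one way such a race is ruled out, assuming writers are shared per (DataType, topic, QoS) key and reference-counted so deletion only happens once no thread holds the writer. The class and its names are hypothetical; this is not how the RTI DDS Toolkit is actually implemented.

```python
import threading


class SharedWriterRegistry:
    """Hypothetical registry of writers shared by (type, topic, QoS) key.

    A writer is deleted only when its reference count drops to zero,
    so a delete can never overlap an in-flight write on the same key.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._writers = {}    # key -> writer (a stand-in object here)
        self._refcounts = {}  # key -> number of threads currently using it
        self.deleted = []     # keys whose writers were deleted, for inspection

    def acquire(self, key):
        """Get (or lazily create) the shared writer and mark it in use."""
        with self._lock:
            if key not in self._writers:
                self._writers[key] = object()  # hypothetical stand-in for a DataWriter
                self._refcounts[key] = 0
            self._refcounts[key] += 1
            return self._writers[key]

    def release(self, key):
        """Mark one use as finished; delete the writer after the last user."""
        with self._lock:
            self._refcounts[key] -= 1
            if self._refcounts[key] == 0:
                del self._writers[key]
                del self._refcounts[key]
                self.deleted.append(key)

    def active(self, key):
        """Number of threads currently holding the writer for this key."""
        with self._lock:
            return self._refcounts.get(key, 0)
```

Each writer thread would bracket its write with `acquire`/`release`; an inactivity timeout that deletes a writer regardless of the count is exactly the pattern this design avoids.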
For what it is worth, the code flow is to open the writers/readers at the beginning. They are not closed until the application exits.
As mentioned earlier, recreating this code on Windows is doable but will take a fair amount of time.
Questions:
1) I have 2 arrays that are being written. If there is a mismatch between the size of the data being written and the preallocated size of the array, could this cause the error we are seeing? To clarify: the array to be written has 35 elements and the preallocated array has 30.
2) What happens in the same scenario if the data to be written is smaller than the preallocated array?
I was going to attach the DDS code, but the file is 4 MB and the limit here is 2 MB.
Hi FFly,
Here are some answers:
1) The preallocated array size you set when generating your code with the Complex Type Generator is the maximum array size in the case of DDS sequences, and the fixed array size in the case of DDS arrays. When using DDS arrays, the number of elements must always be the same.
2) Arrays with fewer elements than the maximum are fine if you are using DDS sequences (the default). But if you are using DDS arrays, that is incorrect usage: DDS sequences can have a variable size, but DDS arrays have a fixed size.
So don't use more elements than the maximum of the type. This applies to both DDS arrays and DDS sequences. When using DDS arrays, the input arrays must always be initialized to the size you used for generating the VIs. DDS sequences can have a variable size, but never more than the maximum.
However, I don't think using arrays incorrectly could cause a concurrency issue, at least not directly. In any case, I recommend fixing it and trying again.
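The sizing rule above can be checked before each write. A minimal Python sketch, assuming a hypothetical helper `check_array_for_write` (not part of the RTI DDS Toolkit) that rejects a 35-element input for a field generated with a bound of 30, and requires the exact length for fixed-size DDS arrays:

```python
def check_array_for_write(values, bound, kind="sequence"):
    """Hypothetical pre-write size check for one array field.

    kind="sequence": bounded DDS sequence, 0..bound elements allowed.
    kind="array":    fixed-size DDS array, exactly `bound` elements required.
    Raises ValueError instead of letting a mis-sized sample reach the writer.
    """
    n = len(values)
    if kind == "sequence":
        if n > bound:
            raise ValueError(f"sequence has {n} elements, maximum is {bound}")
    elif kind == "array":
        if n != bound:
            raise ValueError(f"array has {n} elements, fixed size is {bound}")
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return values
```

Raising an error, rather than silently truncating, keeps an oversized sample visible during testing; a similar size check could sit in front of the generated Write VI.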
I have received your files by email. There are a lot of VIs in there, and I have only seen four controls (.ctl) that use arrays. I need some guidance about where to look.
Ismael,
My apologies. I had forgotten there were a number of VIs that we are no longer using.
The writers are:
First, here is a screen capture that National Instruments sent me from their core dump file.
Second, I have removed the FF_DDS_OD_Polygon_Write from the main code, and after several days of running on 2 machines there have been no crashes. Additionally, I have a test project running on my local cRIO that is doing the DiagnosticArray write. It has not crashed either.
Regarding the FF_DDS_OD_Polygon_Write, I have a question. There is a possibility that my code could pass NaN as one or more of the array entries. On the RTI side, what happens if NaN is passed in?
Update: We have been running without "FF_DDS_OD_Polygon_Write" for about 60 hours spread across 3 machines. There have been no crashes running this way.
Hi Alan,
We confirmed that setting array values to NaN works fine; NaN is a valid value of the numeric type.
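At the encoding level, a double-precision NaN is an ordinary IEEE 754 bit pattern, so packing and unpacking it preserves it like any other value. A minimal Python sketch, using a little-endian `struct` round trip as a stand-in for the wire encoding (an assumption, not the toolkit's actual serializer):

```python
import math
import struct


def roundtrip_float64_le(x):
    """Pack x as a little-endian IEEE 754 double and unpack it again."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


values = [1.5, float("nan"), -2.0]
decoded = [roundtrip_float64_le(v) for v in values]
print(decoded)  # [1.5, nan, -2.0]

# NaN compares unequal to everything, including itself,
# so receivers should test with math.isnan rather than ==.
print(math.isnan(decoded[1]))  # True
```

The caveat is on the receiving side: code that compares polygon coordinates with `==` will never match a NaN entry.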
Glad to hear there have not been any crashes without the FF_DDS_OD_Polygon_Write; it seems you have isolated the problem there. Have you fixed the out-of-bounds array access, and does it still crash?
Thanks,
Maxx
Maxx,
Sorry for being a bit slow with the reply.
We fixed the out-of-bounds array write, and we still had crashes.
We have restructured the way we are doing Obstacle Detection and have removed FF_DDS_OD_Polygon_Write entirely. That is what fixed the crashing.
Alan