Hi,
we have a problem while recording large data packets (~8MB) with the DDS recording service.
Our goal is to capture some sensor outputs (gps, radar, camera,...) to later make an offline stimulation of our DDS Micro 3.0.0 nodes with the DDS replayer service.
Therefore we have a gateway (DDS Micro) that converts the raw sensor messages from ROS format to DDS format, and it is these raw DDS data packets that we want to record.
Our problem is that we cannot (or almost cannot) record the raw image data. While the other data types are rather complex (we already had to increase type_object_max_serialized_length to 16k), the data type for the image data is very simple: it only contains a huge array with all the pixel information.
What I already tried out is:
- Making the connection reliable
- Setting the history kind to KEEP_ALL
- Increasing max_samples, max_instances and max_samples_per_instance
- Using shared memory for the connection
Recording all other data types works correctly, but with the ImageData we get a data rate of around 0.3 Hz at most (reliable mode only), while the data comes in at 25 Hz. The rest is much slower, around 1 of 1000 data packets or less.
The current system runs on a single host computer.
What I also tested is:
The communication between two DDS Micro nodes seems to work perfectly, also with the already mentioned ImageData type. (At least the correct callbacks were called at the correct rate.)
A self-written DDS Micro writer node with only the ImageData type has the same recording problems as the much bigger gateway.
Only in reliable mode did I get two different errors:
- From the send method in the gateway: DDS_RETCODE_TIMEOUT
- From the self-written pure node in the console: ModuleID=7 Errcode=500, which according to the documentation means "Could not allocate a resource of the specified kind"
In non-reliable mode I got no error messages.
Does anybody have an idea which other QoS parameters need to be changed to get this to work?
Regards,
Niclas
Hi Niclas,
I've let Maxx Becker, the RTI FAE currently responsible for helping Zukunft Mobility, know about your question. I'm sure he'll get in touch soon.
In the meantime, we'll probably need to clarify what you're actually doing.
For your 0.3 Hz rate, are you sending from an application using Connext Micro to another application using Connext Micro? Or are you using Connext Micro to send data to RTI Recording Service?
How have you configured your RTI Recording Service to record the data (if you are using the Recording Service)? If you are using a Connext Micro application to subscribe, how have you configured the QoS for the DataReader?
Will your real system be running the sending and receiving applications on the same host and thus can take advantage of using Shared Memory? Or is that only available for this testing and you will be using an Ethernet or other type of network for the actual scenario?
If you can use shared memory, what is the configuration of your shared memory and how did you set it up (you can copy/paste the code), including how you set up the transports to use shared memory?
Finally, please use whatever means you can to localize the place in your code, i.e. the specific Connext DDS Micro API that your code called, that produces the warning message that you reported:
This issue may be related to the poor performance that you are experiencing.
Hi Howard,
thanks for your reply. I'll try to answer your questions as best I can, as I am still new to Connext DDS.
I am sending data from Connext Micro to the RTI Recording Service with KEEP_ALL history and RELIABLE mode turned on for both writer and reader.
Yes I am using the Recording Service.
DataReader is configured:
Participant is configured:
This is only for recording and testing. In the end we want to replay the data on a host PC and stimulate a target on a target computer.
As we are using internal libs for configuring DDS Micro, this is not so easy to answer, but I think the following lines are the configuration of shmem:
Resource limits, reliability and history are configured the same in the sender as in the recording service.
I hope this information helps to understand the setup.
Regards,
Niclas
Hi Niclas,
Thank you for the information. A few more questions...
For the data type for the image information, how large is the array that holds the pixel information? Is it unbounded? How much data do you have in each sample when you are issuing the write() call? Are you attempting 8MB data samples at 25Hz? If you can provide the IDL, or any code snippets, that would help.
From what I see, it seems that the DataWriter may be exceeding its configured resource limits, blocks, and eventually times out (throwing DDS_RETCODE_TIMEOUT). This could be because the DataReader (the Recording Service) is not acknowledging samples, and therefore the DataWriter's queues are filling up. It looks like the shared memory resources are configured with defaults; the receive_buffer_size ends up being 1 MB. We may have to look at how data is being handled in the receive queues.
Feel free to follow up with me directly via email through the ongoing channel I have with the rest of the team.
Thanks,
Maxx
Hi Niclas,
So, I'm sure that Maxx will be able to help you to get this working...but let me give you guys some pointers since I have a bit more experience than Maxx.
1) When using RELIABLE reliability, you can't just set RELIABLE and KEEP_ALL history. Unfortunately, the default parameters of the reliability protocol used by DDS, which is turned on when you set RELIABLE, are not well tuned for any specific use case.
So, as a best practice, we suggest that any DataWriter or DataReader that is supposed to send or receive data reliably should use a QoS profile derived from the builtin QoS profiles. Specifically for strict, lossless reliability with large data (your data is 8 MB), you should use the "Generic.StrictReliable.LargeData" profile from the "BuiltinQosLib" QoS library. This applies to applications using the full Connext DDS, including the Recording Service.
So in your Recording Service XML configuration file, you should use this (which will set the QoS to RELIABLE and KEEP_ALL, so you don't need to set those anymore):
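A sketch of what such a profile reference can look like (element names assume a recent Recording Service XML schema, so check against your version; the group name and topic filter are just placeholders):

```xml
<topic_group name="RecordAllTopics">
    <allow_topic_name_filter>*</allow_topic_name_filter>
    <datareader_qos base_name="BuiltinQosLib::Generic.StrictReliable.LargeData"/>
</topic_group>
```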
Also, you should configure the DomainParticipant created by Recording Service to better handle large data by using the following QoS setting in the Recording Service XML file:
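A sketch of such a participant configuration (the property names are the standard Connext shared memory transport properties; the values match the 10 MB / 20 MB sizing used for Micro later in this post):

```xml
<domain_participant_qos>
    <property>
        <value>
            <element>
                <name>dds.transport.shmem.builtin.parent.message_size_max</name>
                <value>10485760</value>
            </element>
            <element>
                <name>dds.transport.shmem.builtin.receive_buffer_size</name>
                <value>20971520</value>
            </element>
        </value>
    </property>
</domain_participant_qos>
```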
Unfortunately, Connext Micro doesn't support the concept of QOS profiles and thus you will have to make modifications to code.
For DataWriters, I would recommend:
int NUM_SAMPLES = 2;

dw_qos.resource_limits.max_samples = NUM_SAMPLES;
dw_qos.resource_limits.max_samples_per_instance = NUM_SAMPLES;
dw_qos.resource_limits.max_instances = 1;

// heartbeat period of 10 ms
dw_qos.protocol.rtps_reliable_writer.heartbeat_period.sec = 0;
dw_qos.protocol.rtps_reliable_writer.heartbeat_period.nanosec = 10000000;
dw_qos.protocol.rtps_reliable_writer.heartbeats_per_max_samples = NUM_SAMPLES;
where NUM_SAMPLES should be set to the number of data samples that you want DDS to be able to hold while it is using the reliable protocol to send the data... if you're sending very fast, then NUM_SAMPLES should be larger (for small data sent continuously, we typically recommend setting it to 40).
However, for large data, if you set NUM_SAMPLES to 1, then after you send a data sample, the next call to DataWriter::write() will block (assuming RELIABLE and KEEP_ALL) until DDS has successfully sent and confirmed receipt of the data. Setting NUM_SAMPLES to 2 allows you to write another sample while DDS is still sending the first, i.e., if you are sending the data twice in a row without any delay in between.
NOTE: the larger the value of NUM_SAMPLES, the more memory DDS will allocate to hold those samples.
Additionally, if you are only sending through shared memory (and not over UDP on a network), there are several optimizations you can make, such as changing the MTU (message_size_max) of the shared memory transport so that it can hold the largest data sample without having to fragment it. Right now, your configuration sets message_size_max to 64 KB and only allocates 1 MB of total shared memory space. This basically forces DDS to break up your 8 MB data into 64 KB fragments and send them through a shared memory segment that can only hold 1 MB (actually, each 64 KB packet contains at most 65507 bytes of user data; the other bytes are taken up by a header). Fundamentally, if the 1 MB shared memory buffer isn't serviced fast enough by the receiving application, the buffer becomes full and shared memory packets are dropped... they will be resent thanks to the reliability protocol, but any dropped packet means delays and inefficient use of the CPU.
So, to fix this, you should give your shared memory transport a much larger message_size_max (over 8 MB if your largest data is 8 MB) and a larger buffer (some N x 8 MB, so that multiple samples can be buffered in shared memory without loss).
This needs to be configured in Recording Service if Recording Service is receiving the data (think of the Shared Memory as a mailbox owned by the receiving application). And if Micro is receiving the data, then it should be configured similarly.
For the Recording Service it would look like this:
For Micro it would be:
// Set sizing for SHMEM transport
shmem_property.message_size_max = 10*1024*1024;
shmem_property.receive_buffer_size = 2*10*1024*1024;

// Register SHMEM transport
if (!registry->register_component(NETIO_DEFAULT_SHMEM_NAME,
                                  NETIO_SHMEMInterfaceFactory_get_interface(),
                                  &shmem_property._parent._parent, nullptr))
{
    // handle the error
}
The above won't help at all if you are also sending the data over a network, which has an MTU of 64 KB. When sending the data to multiple participants, DDS will use the smallest MTU to fragment the data, so if UDP is 64 KB and shared memory is 10 MB, DDS will fragment at 64 KB.
Finally, this is advanced stuff, and we usually recommend getting some training before diving into it. If you are only sending data through shared memory, you can also consider using the Zero Copy shared memory option; see this documentation: https://community.rti.com/static/documentation/connext-micro/3.0.3/doc/html/usersmanual/zerocopy.html
Hi,
thanks @Howard for the amazing help. I think it is now working as expected. Only one question is left. When I first ran with all the settings as you explained, I got this error:
NDDS_Transport_Shmem_attach_writer:incompatible shared memory segment found. Found segment with max message size 65536. Needed 10485760.
I then changed
to this size (65536) and it worked like a charm. But you said:
And I do not really understand why this error occurs, as to me it looks like it is the setting in the writer
shmem_property.message_size_max = 65536;
that is causing it.
To answer the open question from @maxx (thanks to you as well): the size of the data buffer array is fixed. From the IDL I got:
sequence<octet,7372800> a_DataBuffer;
But I think @Howard already solved the problem.
Hi Niclas,
Did you set both
in the Recording Service XML configuration, as well as
shmem_property.message_size_max = 10*1024*1024;
in the Micro source configuration? The error you are seeing points to a shared memory configuration mismatch, so they should both have the same value: 10485760.
Maxx
Hi Maxx,
so this setting has to be the same in the writer AND the receiver? Good to know.
No, I did not set it in the receiver (by code), because that part is hidden inside a company library that I am not able to change. (I can see the code but cannot change it that easily.)
But nevertheless it is working, although the shared memory is fragmented. I think increasing the <receive_buffer_size> made the fix.
Niclas
Hi Niclas,
Yes, the shared memory configuration should match in the XML (for the Recording Service) and in the code (of the Publisher, using Micro). I am not sure what you mean by the receiver in your last reply, if it is not the recording service... is this the target that subscribes to the replay of the recorded data?
In any case, glad to know it is working!
Maxx
Hi Maxx,
Thanks again for your help and for clearing up the last question.
With receiver I meant the Recording Service. And as you correctly pointed out, the Publisher is the DDS Micro application.
Niclas