Problem with recording of large data packets (~8MB) from DDS Micro

Offline
Last seen: 4 years 2 months ago
Joined: 08/24/2020
Posts: 5

Hi,

we have a problem recording large data packets (~8 MB) with the DDS Recording Service.

Our goal is to capture some sensor outputs (GPS, radar, camera, ...) so we can later do an offline stimulation of our DDS Micro 3.0.0 nodes with the DDS Replay Service.

For this we have a gateway (DDS Micro) that converts the raw sensor messages from ROS format to DDS format; these raw DDS data packets are what we want to record.

Our problem is that we cannot (or can barely) record the raw image data. While the other data types are rather complex (we already had to increase type_object_max_serialized_length to 16k for them), the data type for the image data is very simple, containing only a huge array with all the pixel information.

What I already tried out is:

  • Making the connection reliable
  • Setting the history kind to KEEP_ALL
  • Increasing max_samples, max_instances, and max_samples_per_instance
  • Using shared memory for the connection

Recording of all other data types works correctly, but with the ImageData we get a data rate of around 0.3 Hz at most (reliable mode only), against a 25 Hz data input. With the other settings it is much slower, around 1 in 1000 data packets or less.

The current system runs on a single host computer.

What I also tested is:

The communication between two DDS Micro nodes seems to work perfectly, also with the already mentioned ImageData type (at least the correct callbacks were called at the correct rate).

A self-written DDS Micro writer node with only the ImageData type has the same recording problems as the much bigger gateway.

Only in reliable mode did I get two different errors:

  • From the send method in the gateway: DDS_RETCODE_TIMEOUT
  • From the self-written pure node in the console: ModuleID=7 Errcode=500, which according to the documentation means "Could not allocate a resource of the specified kind"

In non-reliable mode I got no error messages.

Does anybody have an idea which other QoS parameters need to be changed to get this working?

Regards,

Niclas

Howard's picture
Offline
Last seen: 4 hours 35 min ago
Joined: 11/29/2012
Posts: 605

Hi Niclas,

I've let Maxx Becker, the RTI FAE currently responsible for helping Zukunft Mobility, know about your question.  I'm sure he'll get in touch soon.

In the meantime, we'll probably need to clarify what you're actually doing.

For your 0.3 Hz rate, are you sending from an application using Connext Micro to another application using Connext Micro?  Or are you using Connext Micro to send data to RTI Recording Service?

How have you configured your RTI Recording Service to record the data (if you are using the Recording Service)?  If using a Connext Micro application to subscribe, how have you configured the QOS for the DataReader?

Will your real system be running the sending and receiving applications on the same host, and thus be able to take advantage of Shared Memory?  Or is that only the case for this testing, and you will be using Ethernet or another type of network in the actual scenario?

If you can use shared memory, what is the configuration of your shared memory and how did you set it up (you can copy/paste the code), including how you set up the transports to use shared memory?

Finally, please use whatever means you can to localize the place in your code (the specific Connext DDS Micro API that your code called) that produces the warning message you reported:

From the self written pure node in the console: ModuleID=7 Errcode=500 which is according to the docu "Could not allocate a resource of the specified kind"

This issue may be related to the poor performance that you are experiencing.

Offline
Last seen: 4 years 2 months ago
Joined: 08/24/2020
Posts: 5

Hi Howard,

thanks for your reply. I will try to answer your questions as best I can, as I am still new to Connext DDS.

 

For your 0.3 Hz rate, are you sending from an application using Connext Micro to another application using Connext Micro?  Or are you using Connext Micro to send data to RTI Recording Service?

I am sending data from Connext Micro to the RTI Recording Service with KEEP_ALL history and RELIABLE mode turned on for both writer and reader.

 

How have you configured your RTI Recording Service to record the data (if you are using the Recording Service)?

Yes I am using the Recording Service.

DataReader is configured:

<datareader_qos>
  <reliability>
    <kind>RELIABLE_RELIABILITY_QOS</kind>
  </reliability>
  <history>
    <kind>KEEP_ALL_HISTORY_QOS</kind>
  </history>
  <resource_limits>
    <initial_samples>1000</initial_samples>
    <initial_instances>10</initial_instances>
    <max_samples>1000</max_samples>
    <max_instances>10</max_instances>
    <max_samples_per_instance>100</max_samples_per_instance>
  </resource_limits>
</datareader_qos>

 

 Participant is configured:

<participant_qos>
  <discovery>
    <initial_peers>
      <element>shmem://</element>
    </initial_peers>
  </discovery>
  <transport_builtin>
    <mask>SHMEM</mask>
  </transport_builtin>
  <resource_limits>
    <type_object_max_serialized_length>16384</type_object_max_serialized_length>
    <remote_reader_allocation>
      <max_count>120</max_count>
    </remote_reader_allocation>
    <remote_writer_allocation>
      <max_count>120</max_count>
    </remote_writer_allocation>
    <remote_participant_allocation>
      <max_count>120</max_count>
    </remote_participant_allocation>
  </resource_limits>
</participant_qos>

 

This is only for recording and testing. In the end we want to replay the data from a host PC and stimulate a target on a target computer.

 

If you can use shared memory, what is the configuration of your shared memory and how did you set it up (you can copy/paste the code), including how you set up the transports to use shared memory?

As we are using internal libraries for configuring DDS Micro this is not so easy to answer, but I think the following lines are the shared memory configuration.

shmem_property.received_message_count_max = 64;
shmem_property.message_size_max = 65536;
shmem_property.receive_buffer_size =
        (shmem_property.received_message_count_max *
         shmem_property.message_size_max) / 4;

registry->register_component(NETIO_DEFAULT_SHMEM_NAME,
                             NETIO::SHMEM::InterfaceFactory::get_interface(),
                             (struct RT_ComponentFactoryProperty *)&shmem_property,
                             NULL);

Resource limits, reliability, and history are configured the same in the sender as in the Recording Service.

 

I hope this information helps you understand the setup.

 

Regards,

Niclas

 

 

maxx's picture
Offline
Last seen: 1 year 2 weeks ago
Joined: 08/26/2020
Posts: 9

Hi Niclas,

Thank you for the information. A few more questions...

For the data type for the image information, how large is the array that holds the pixel information? Is it unbounded? How much data do you have in each sample when you are issuing the write() call? Are you attempting 8MB data samples at 25Hz? If you can provide the IDL, or any code snippets, that would help.

From what I see, it seems that the DataWriter may be exceeding its resource limits as defined, blocking, and eventually timing out (returning the DDS_RETCODE_TIMEOUT). This could be because the DataReader (the Recording Service) is not acknowledging samples, and therefore the DataWriter queues are filling up. It looks like the shared memory resources are configured with defaults; the receive_buffer_size ends up being 1 MB. We may have to look at how data is being handled in the receive queues.

Feel free to follow up with me directly via email through the ongoing channel I have with the rest of the team. 

Thanks,

Maxx

Howard's picture
Offline
Last seen: 4 hours 35 min ago
Joined: 11/29/2012
Posts: 605

Hi Niclas,

So, I'm sure that Maxx will be able to help you to get this working...but let me give you guys some pointers since I have a bit more experience than Maxx.

1) When using RELIABLE reliability, you can't just set RELIABLE and KEEP_ALL history.  Unfortunately, the default parameters of the reliable protocol used by DDS, which is enabled when you set RELIABLE, are not well tuned for any specific use case.

So, for best practice, we suggest that any DataWriter or DataReader that is supposed to send or receive data reliably should use a QOS profile derived from the builtin QOS profiles.  Specifically, for strict, lossless (TCP-like) reliability with 8 MB data, you should use the "Generic.StrictReliable.LargeData" profile from the "BuiltinQosLib" QOS library.  This applies to applications using the full Connext DDS, including the Recording Service.

So in your Recording Service XML configuration file, you should use this (which will set the QOS to RELIABLE and KEEP_ALL, so you don't need to set those anymore):

<datareader_qos base_name="BuiltinQosLib::Generic.StrictReliable.LargeData"></datareader_qos>

Also, you should configure the DomainParticipant created by Recording Service to better handle large data by using the following QOS setting in the Recording Service XML file,

<domain_participant name="Participant0">
    <domain_id>0</domain_id>
    <participant_qos base_name="BuiltinQosLib::Generic.Participant.LargeData"></participant_qos>
</domain_participant>

Unfortunately, Connext Micro doesn't support the concept of QOS profiles and thus you will have to make modifications to code.

For DataWriters, I would recommend:

int NUM_SAMPLES = 2;

dw_qos.resource_limits.max_samples = NUM_SAMPLES;
dw_qos.resource_limits.max_samples_per_instance = NUM_SAMPLES;
dw_qos.resource_limits.max_instances = 1;

dw_qos.protocol.rtps_reliable_writer.heartbeat_period.sec = 0;
dw_qos.protocol.rtps_reliable_writer.heartbeat_period.nanosec = 10000000; // 10 ms
dw_qos.protocol.rtps_reliable_writer.heartbeats_per_max_samples = NUM_SAMPLES;

 

where NUM_SAMPLES should be set to the number of data samples that you want DDS to be able to hold while DDS is using the reliable protocol to send the data...if you're sending very fast, then NUM_SAMPLES should be larger (typically for small data sent continuously, we recommend setting to 40).

However, for large data, if you set NUM_SAMPLES to 1, then after you sent a data sample, the next call to DataWriter::write() will block (assuming RELIABLE and KEEP_ALL) until DDS has successfully sent and confirmed the receipt of the data.  Setting NUM_SAMPLES to 2 will allow you to write another sample while DDS is still sending the first, i.e., if you are sending the data twice in a row without any delay in between.

NOTE, the larger the value of NUM_SAMPLES, the more memory will be allocated by DDS to hold those samples.

Additionally, if only sending through shared memory (and not over UDP on a network), there are several optimizations that you can make such as changing the MTU (message_size_max) of the shared memory transport to hold the largest data that you can send without having to fragment the data.  Right now, your configuration sets the message_size_max to 64K Bytes, and only allocates 1 MB of total shared memory space.  This basically forces DDS to break up your 8 MB data into 64KB fragments to send over shared memory which can only hold 1 MB (actually each 64 KB packet only contains up to 65507 bytes of user data, the other bytes will be taken up by a header).  Fundamentally, if the 1 MB shared memory buffer isn't serviced fast enough by the receiving application, the buffer will become full, and shared memory packets will be dropped...but will be resent because of the reliability protocol.  But any dropped packets means delays and inefficient use of the CPU.

So, to fix this, you should give your shared memory segment a much larger message_size_max (over 8 MB if your largest data is 8 MB) and a larger buffer (some N x 8 MB, so that multiple samples can be buffered without loss in shared memory).

This needs to be configured in Recording Service if Recording Service is receiving the data (think of the Shared Memory as a mailbox owned by the receiving application).  And if Micro is receiving the data, then it should be configured similarly.

For Recording service it would be like

<domain_participant name="Participant0">
    <domain_id>0</domain_id>
    <participant_qos base_name="BuiltinQosLib::Generic.Participant.LargeData">
        <transport_builtin>
            <mask>SHMEM</mask>
            <shmem>
                <message_size_max>10485760</message_size_max>
                <receive_buffer_size>20971520</receive_buffer_size>
            </shmem>
        </transport_builtin>
    </participant_qos>
</domain_participant>

For Micro it would be:

// Set sizing for the SHMEM transport
shmem_property.message_size_max = 10 * 1024 * 1024;
shmem_property.receive_buffer_size = 2 * 10 * 1024 * 1024;

// Register the SHMEM transport
if (!registry->register_component(NETIO_DEFAULT_SHMEM_NAME,
                                  NETIO_SHMEMInterfaceFactory_get_interface(),
                                  &shmem_property._parent._parent,
                                  nullptr))
{
    // handle registration failure
}

The above won't help at all if you are also sending the data over a network, which has an MTU of 64 KB.  When sending the data to multiple participants, DDS will use the smallest MTU to fragment the data, so if UDP is 64 KB and shared memory is 10 MB, DDS will fragment at 64 KB.

Finally, this is advanced stuff and we usually recommend getting some training before diving into it.  If you are only sending data through shared memory, you can also consider using the Zero Copy shared memory option; see this documentation: https://community.rti.com/static/documentation/connext-micro/3.0.3/doc/html/usersmanual/zerocopy.html

Offline
Last seen: 4 years 2 months ago
Joined: 08/24/2020
Posts: 5

Hi,

thanks @Howard for the amazing help. I think it is now working as expected. Only one question is left. When I first ran with all the settings as you explained them, I got the error:

NDDS_Transport_Shmem_attach_writer:incompatible shared memory segment found. Found segment with max message size 65536. Needed 10485760.

I then changed 

<message_size_max>  10485760 </message_size_max>

to this smaller size (65536) and it worked like a charm. But you said:

So to fix, you should set your shared memory segment to have a much larger message_size_max (over 8 MB if your largest data is 8 MB) and larger buffer (some N x 8 MB, so that multiple data can be buffered without loss in shared memory).

This needs to be configured in Recording Service if Recording Service is receiving the data (think of the Shared Memory as a mailbox owned by the receiving application).

And I do not really understand why this error occurs, as it looks to me like it is the setting in the writer

shmem_property.message_size_max = 65536;

that is causing this.

 

To close out the open question from @maxx (thanks to you as well): the size of the data buffer array is fixed. From the IDL I got

sequence<octet,7372800> a_DataBuffer;

But I think @Howard already solved the problem.

 

 

maxx's picture
Offline
Last seen: 1 year 2 weeks ago
Joined: 08/26/2020
Posts: 9

Hi Niclas,

Did you set both

<message_size_max>  10485760 </message_size_max>

in the Recording Service XML configuration, as well as 

shmem_property.message_size_max = 10*1024*1024;

in the Micro source configuration? The error you are seeing points to a shared memory configuration mismatch, so they should both have the same value: 10485760.

Maxx

Offline
Last seen: 4 years 2 months ago
Joined: 08/24/2020
Posts: 5

Hi Maxx,

so this setting has to be the same in the writer AND the receiver? Good to know.

No, I did not set it in the receiver (by code) because it is hidden inside a company library that I cannot change. (I can see the code but cannot easily change it.)

But nevertheless it is working, although the shared memory traffic is fragmented. I think increasing the <receive_buffer_size> was the fix.

Niclas

 

 

maxx's picture
Offline
Last seen: 1 year 2 weeks ago
Joined: 08/26/2020
Posts: 9

Hi Niclas,

Yes, the shared memory configuration should match in the XML (for the Recording Service) and in the code (of the Publisher, using Micro). I am not sure what you mean by the receiver in your last reply, if it is not the recording service... is this the target that subscribes to the replay of the recorded data?

In any case, glad to know it is working! 

Maxx

 

Offline
Last seen: 4 years 2 months ago
Joined: 08/24/2020
Posts: 5

Hi Maxx,

Thanks again for your help and for clarifying the last question.

By receiver I meant the Recording Service. And as you correctly pointed out, the publisher is the DDS Micro application.

Niclas