Error when replay large data


Hi,

I have two processes, A and B:

 - Process A publishes a full HD RGB image with @transfer_mode(SHMEM_REF) on a topic

 - Process B subscribes to the topic and visualizes the image

Now I want to record the data from process A and then replay it for process B.

I encountered this issue when replaying the saved data:

COMMENDSrWriterService_canSampleBeSent:!write. Reliable fragmented data requires asynchronous writer.
COMMENDSrWriterService_write:sample cannot be sent
PRESPsWriter_writeCommend:!srw->write
PRESPsWriter_writeInternal:!failed to write sample in Commend

I attached my record and replay configuration file

Thanks,

An

Attachments: recorder_config.xml (1.69 KB), replay_config.xml (1.86 KB)
Howard's reply:

You have to configure the DataWriter used by the replay service to use the ASYNCHRONOUS publish_mode.

In your existing replay_config.xml, you define a QoS profile that sets asynchronous publish mode, but you didn't configure the DataWriter for the topic to use it:

            <topic_group name="DefaultTopicGroup">
                <datawriter_qos base_name="MyQos::CameraSharedMemory"/>
                <allow_topic_name_filter>*</allow_topic_name_filter>
                <deny_topic_name_filter>rti/*</deny_topic_name_filter>
            </topic_group>
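
For reference, the profile you point it at would need to contain something like this (MyQos and CameraSharedMemory are the names from your attached file; the rest is just a minimal sketch of the asynchronous publish mode setting, not your exact configuration):

    <qos_library name="MyQos">
        <qos_profile name="CameraSharedMemory">
            <datawriter_qos>
                <publish_mode>
                    <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
                </publish_mode>
            </datawriter_qos>
        </qos_profile>
    </qos_library>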

I also note that the Replay participant hasn't been configured to have a large shared memory buffer like the Record participant.

NOTE: the following comments apply only to the case where you are NOT using zero copy.  If you're limiting your communications to direct shared memory (not zero copy over shared memory), then setting the message_size_max property for the shared memory transport to be bigger than the largest data sample that you're sending through shared memory will allow DDS to make a single copy of the entire data sample without fragmentation.  By default, message_size_max is 64 KB for shared memory.  You should set message_size_max to at least (x + 512), where x is the max size of a data sample and the extra 512 bytes are for DDS headers, and size the transport's receive_buffer_size to roughly (n * (x + 512)), where n is the number of samples that can be buffered in the shared memory queue.
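
For example, for an uncompressed full HD RGB frame, one sample is about 1920 x 1080 x 3 = 6,220,800 bytes (assuming 8 bits per channel and no per-sample metadata, which is an assumption on my part). With a queue of, say, 10 buffered samples, the shared memory transport settings would be roughly:

    <shmem>
        <!-- largest sample (6,220,800 bytes) + 512 bytes of DDS header overhead -->
        <message_size_max> 6221312 </message_size_max>
        <!-- roughly n * (x + 512) for n = 10 buffered samples -->
        <receive_buffer_size> 62213120 </receive_buffer_size>
    </shmem>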

However, if you're using zero-copy transfer, then you don't need to change the size of the shared memory buffer for the shared memory transport.  In fact, the shared memory transport itself is NOT being used to transfer the actual data; it's only being used to transfer a pointer (reference) to a different shared memory segment in which the data is stored.

I also note that the error message about fragmenting data is an indication that the Replay Service is NOT using zero copy for replay.  As previously stated, when using zero copy, there is no transfer of large data through shared memory, and thus no fragmentation, and thus no ASYNCHRONOUS publish_mode is needed.

An's reply:

Hi Howard,

Thanks for your help! Now the replay service can publish messages without any errors.

 

An's reply:

Hi Howard,

I just have another question. The replay published messages at an unexpected FPS, and I saw the following logs:


sample buffer min size is smaller than the minimum sample deserialized size. You may want to increase this value through the <sample_buffer_min_size> to reduce dynamic memory allocations

NDDS_Transport_Shmem_send:failed to add data. shmem queue for port 0x1cf3 is full (received_message_count_max=1000, receive_buffer_size=60000000). Try to increase queue resource limits.

Howard's reply:

Well, first, the Replay Service cannot use zero copy to replay the data.  Thus the data is being fragmented at the message_size_max (default 64 KB) of the shared memory transport and sent in multiple fragments over shared memory.

In addition, you enabled "RELIABLE" Reliability but didn't actually configure the reliability protocol at all.  Out of the box, the reliable protocol's configuration parameters aren't optimized for sending large fragmented data (or really any practical use case).

Thus, I'm guessing, the Replay Service is sending data faster than the reliable protocol is configured to keep the buffers clear... and the size of the writer's buffer is probably greater than the size of the shared memory buffer, so the shared memory buffer is overflowing before the writer's buffer is full.

Typically, we suggest that for connections that need to be reliable, the QoS used for the DataWriter AND DataReader should be derived from the builtin QoS profiles that are configured for Reliability.  You can read about builtin QoS profiles here:

https://community.rti.com/examples/built-qos-profiles

https://community.rti.com/kb/configuring-qos-built-profiles

You probably want to:

1) Increase the shared memory message_size_max of your applications (both the Replay Service and the application subscribing to the replayed topic) to the max size of the samples that you want to send plus a few bytes (say 512) for DDS overhead.

NOTE: this requires you to configure the Participant only to use the shmem transport.  If both UDP and Shmem transports are configured, the smallest value of message_size_max across all installed transports will be used for fragmentation for all transports.

This would be in the Participant QoS. For example, if an image was 1 MB, then I would set the message_size_max to 1,050,000 (1,048,576 + 512, rounded up):

                <transport_builtin>
                    <mask>SHMEM</mask>
                    <shmem>
                        <!-- max sample size (1 MB) + 512 bytes of DDS overhead, rounded up -->
                        <message_size_max> 1050000 </message_size_max>
                        <receive_buffer_size> 60000000 </receive_buffer_size>
                        <received_message_count_max> 1000 </received_message_count_max>
                    </shmem>
                </transport_builtin>

By the way, I don't know why you set received_message_count_max to 1000.  If you are only sending images through the shared memory transport, a 60,000,000-byte buffer can only hold 57 x 1 MB messages (not including DDS overhead per message).

2) Since you are sending image data... and are concerned more about FPS than about receiving every frame... and since, via shared memory, the only way to lose data/frames is if the sending app writes frames faster than the receiving app can read and process them... there's frankly no reason to use a RELIABLE connection to send/receive the data.

Both the DataWriter and DataReader should use the default value for Reliability... which is non-reliable, aka "BEST_EFFORT".
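
If you want to set it explicitly rather than rely on the default, a minimal sketch of a best-effort profile for both sides could look like this (the profile name here is just an illustration, not something from your files):

    <qos_profile name="CameraBestEffort">
        <datawriter_qos>
            <reliability>
                <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
            </reliability>
        </datawriter_qos>
        <datareader_qos>
            <reliability>
                <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
            </reliability>
        </datareader_qos>
    </qos_profile>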

3) If for some reason, you *must* send/receive all image frames reliably, then you need to configure the parameters that control the reliability protocol.

I would suggest starting with the "BuiltinQosLib::Generic.StrictReliable" QoS profile as the base profile for the DataWriter.  This assumes that data samples are NOT fragmented, i.e., message_size_max is greater than the largest image being sent.

If fragmentation is occurring, then you should use "BuiltinQosLib::Generic.StrictReliable.LargeData" instead.

<qos_profile name="CameraSharedMemory" base_name="BuiltinQosLib::Generic.StrictReliable">
...
</qos_profile>
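
If fragmentation is occurring, the same profile would simply inherit from the LargeData variant instead:

<qos_profile name="CameraSharedMemory" base_name="BuiltinQosLib::Generic.StrictReliable.LargeData">
...
</qos_profile>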