What causes packet loss in a local wired network?

Hello. I use Connext DDS version 6.0.0. I have created a DDS domain with multiple nodes (more than one PC) with the same spec and the same OS. The PCs are connected to each other on a local network through NICs and a switch that support gigabit Ethernet. When I run "ethtool <interface>" it shows a speed of 1000Mb/s. I set these settings for the Linux machines in /etc/sysctl.conf:
net.core.wmem_max = 16777216
net.core.wmem_default = 131072
net.core.rmem_max = 16777216
net.core.rmem_default = 131072

net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
net.ipv4.tcp_mem = 4096 131072 16777216

net.core.netdev_max_backlog = 30000
net.ipv4.ipfrag_high_thresh = 8388608
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1
# semaphores: semmsl, semmns, semopm, semmni
kernel.sem = 250 32000 100 1024
I set this DomainParticipantFactoryQos:
dds::domain::qos::DomainParticipantFactoryQos factoryQos = dds::domain::DomainParticipant::participant_factory_qos();
factoryQos << dds::core::policy::EntityFactory::ManuallyEnable();
factoryQos->resource_limits.max_objects_per_thread(4096);
dds::domain::DomainParticipant::participant_factory_qos(factoryQos);
I set this DomainParticipantQos:
dds::domain::qos::DomainParticipantQos dpQos = dds::core::QosProvider::Default()
        .participant_qos(rti::core::builtin_profiles::qos_lib::baseline());
dpQos->participant_name = rti::core::policy::EntityName("MyApp");
rti::core::policy::DomainParticipantResourceLimits resource_limits_qos;
resource_limits_qos.type_code_max_serialized_length(0);
resource_limits_qos.reader_user_data_max_length(65536);
resource_limits_qos.writer_user_data_max_length(65536);
resource_limits_qos.type_object_max_serialized_length(20000);
dpQos << resource_limits_qos;

std::map<std::string, std::string> dpPropertyMap = {
    {"dds.transport.UDPv4.builtin.recv_socket_buffer_size", "16777216"},
    {"dds.transport.UDPv4.builtin.send_socket_buffer_size", "16777216"},
    {"dds.transport.UDPv4.builtin.parent.message_size_max", "100000"},
    {"dds.transport.shmem.builtin.receive_buffer_size", "2048576"}};
I use about a hundred IDL types in the domain, and I use different QoS for the DDS entities that are compatible with each other. I configured some DataReaders for some of the IDL types with BEST_EFFORT QoS. I set this QoS for the DataWriters:
dds::pub::qos::DataWriterQos qosDw;
std::map<std::string, std::string> qosDwPropertyMap = {
    {"dds.builtin_type.*.max_size", "16777216"},
    {"dds.builtin_type.*.alloc_size", "16777216"},
    {"dds.data_writer.history.memory_manager.fast_pool.pool_buffer_max_size", "3000000"}};

qosDw << rti::core::policy::Property(qosDwPropertyMap.begin(), qosDwPropertyMap.end());
qosDw << rti::core::policy::DataWriterProtocol().rtps_reliable_writer(
             rti::core::policy::RtpsReliableWriterProtocol()
             .min_send_window_size(dds::core::LENGTH_UNLIMITED)
             .max_send_window_size(dds::core::LENGTH_UNLIMITED)
             )
      << rti::core::policy::PublishMode::Asynchronous()
      << dds::core::policy::ResourceLimits()
      << dds::core::policy::Lifespan(dds::core::Duration(lifespan,0))
      << dds::core::policy::History::KeepAll();
And I use this QoS for the DataReaders:
dds::sub::qos::DataReaderQos qosDr;
std::map<std::string, std::string> qosDrPropertyMap = {
    {"dds.data_reader.history.memory_manager.fast_pool.pool_buffer_max_size", "3000000"},
    {"reader_resource_limits.dynamically_allocate_fragmented_samples", "true"}};

qosDr << rti::core::policy::Property(qosDrPropertyMap.begin(), qosDrPropertyMap.end());
qosDr   << rti::core::policy::DataReaderProtocol().rtps_reliable_reader(
               rti::core::RtpsReliableReaderProtocol())
        << dds::core::policy::ResourceLimits()
        << dds::core::policy::History::KeepAll();
I use a local wired network, so I expect that there should not be any packet drops when sending and receiving samples with stable hardware, even though I use BEST_EFFORT QoS for the DataReaders. But I see packet loss in some situations, and the situation is repeatable. I don't understand why this happens. After a packet loss I check the packet drop statistics of the interface with this command:
# ethtool -S <interface> | grep drops
Also, I know the backlog queue is big enough, because the second column in the output of this command (which shows the number of frames dropped because of a full backlog queue) is zero:
# awk '{for (i=1; i<=NF; i++) printf strtonum("0x" $i) (i==NF?"\n":" ")}' /proc/net/softnet_stat | column -t
Also, I know I don't need a longer transmit queue, because the output of this command shows zero dropped:
# tc -s qdisc show dev <interface>
The things I should mention are:
  1. Packet loss happens with both multicast and unicast discovery.
  2. Packet loss happens with both synchronous and asynchronous publishers.
  3. I always see 3 packets lost together (when I check publication_sequence_number in the sample info). This means the packet loss is not random and there is a real reason for it.
  4. I use multithreading when sending these samples, but I see packet loss even when I guard the writes with a mutex.
  5. More DataReaders in the network cause more packet loss.
So what can be the reason for the packet loss, and how can I check the possible causes?
Howard:

Does DDS report packets as lost for the DataReader, i.e. via the on_sample_lost() callback?  If so, what is the reason reported?  Are packet losses only reported for BEST_EFFORT DataReaders?
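For reference, a minimal sketch (not code from these posts) of how the sample-lost status could be monitored with the Modern C++ API used above, assuming a hypothetical IDL-generated type MyType:

#include <iostream>
#include <dds/dds.hpp>

// Hypothetical listener that logs every sample-lost notification for a
// DataReader of the assumed type MyType.
class LossLogger : public dds::sub::NoOpDataReaderListener<MyType> {
public:
    void on_sample_lost(dds::sub::DataReader<MyType>& /* reader */,
                        const dds::core::status::SampleLostStatus& status) override
    {
        std::cout << "Samples lost so far: " << status.total_count()
                  << " (+" << status.total_count_change() << " in this notification)"
                  << std::endl;
        // Connext also reports the specific loss reason for this status
        // (e.g. DDS_LOST_BY_WRITER); see the SampleLostStatus extensions in the API docs.
    }
};

// Attaching it to an existing reader, reacting only to the SAMPLE_LOST status:
//   LossLogger loss_logger;
//   reader.listener(&loss_logger, dds::core::status::StatusMask::sample_lost());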

For the DataReaders that are losing data, is the data type keyed or not?

If you set RELIABLE Reliability for the readers that are reporting lost packets, are packets still being lost?

I see you using

<< dds::core::policy::ResourceLimits()

I'm not sure that this does anything, since you are not modifying any values in the resource limits policy.  What do you expect this to do?

Fundamentally, with BEST_EFFORT connections, there is no guarantee that all data will be received.  DDS is allowed to overwrite old data with new data even if the user app hasn't read the old data from the DataReader....which can happen if the user code is somehow being delayed from reading the data.

If you absolutely need to receive ALL data sent for a topic with DDS, you need to configure the QoS for DataWriters and DataReaders to have strict reliability (RELIABLE Reliability, and KEEP_ALL History)
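As an illustration only (not code from these posts), strict reliability with the Modern C++ API used in the first post would look roughly like this, reusing the qosDw and qosDr variables from above; the 10-second max blocking time is just an example value:

// Strict reliability: the writer keeps samples until they are acknowledged and
// blocks (up to max_blocking_time) instead of dropping when its queue fills up;
// the reader keeps every received sample until the application reads it.
qosDw << dds::core::policy::Reliability::Reliable(dds::core::Duration(10))  // example blocking time
      << dds::core::policy::History::KeepAll();

qosDr << dds::core::policy::Reliability::Reliable()
      << dds::core::policy::History::KeepAll();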

ALTHOUGH, I also see you using the Lifespan QoS.  In that case, if DDS isn't able to deliver the data within the lifespan...or the lifespan of the data expires before the user app reads the data, DDS will automatically delete that data....even with strict reliability.  The same goes for any sort of filtering (time-based or content-based filters).

Original poster:

Thank you, Mr. Howard.
on_sample_lost() always reports "DDS_LOST_BY_WRITER". The lifespan is long enough for sending all packets. And thank you for your note about dds::core::policy::ResourceLimits().
The DataReaders are BEST_EFFORT; I didn't test RELIABLE. Reliability has overhead from sending ACK/NACKs and the like. I just want to be sure that the DataWriter's DomainParticipant (which has a socket with an IP and port and sends and receives data) sends each sample to all DataReaders that subscribe to that DomainParticipant's DataWriters. I want it to send a sample to all DataReaders and only then continue sending new samples, one by one, to all DataReaders.
How can I do that without the RELIABLE QoS? I don't understand why the DataWriter should remove a sample before sending it to all subscribers. There should be a way to use BEST_EFFORT QoS such that the user can rely on DDS to send a sample to all DataReaders before sending new samples, and of course I would need a way to set the maximum delay for sending samples. I think not having this possibility could count as a shortcoming! These packet losses occur when CPU usage is about 30% and I am only sending about 16 MB/sec of data!

Howard:

"DDS_LOST_BY_WRITER" usually means that the writer no longer has the data in the cache to resend/repair even though the DataReader is missing the data.  You should only get "DDS_LOST_BY_WRITER" for connections that are Reliable.  Are you sure that you get that reason for loss when the DataReader is BEST_EFFORT?

With any data, a common source of data loss is the receive socket buffer size, or the shared memory buffer size (in the case of sending/receiving between processes on the same host).

Try increasing the socket buffer per this documentation: https://community.rti.com/kb/achieving-low-jitter-performance-connext-pro#Socket-buffer-sizes
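For concreteness, the knob that KB article refers to is the same transport property already used in the first post; a minimal sketch with an illustrative 64 MB value (the kernel limits net.core.rmem_max / net.core.wmem_max must be at least this large for the request to take effect):

#include <map>
#include <string>

// Illustrative values only; they must not exceed the kernel's rmem_max / wmem_max limits.
std::map<std::string, std::string> socketBufferProps = {
    {"dds.transport.UDPv4.builtin.recv_socket_buffer_size", "67108864"},   // 64 MB
    {"dds.transport.UDPv4.builtin.send_socket_buffer_size", "67108864"}};  // 64 MB
dpQos << rti::core::policy::Property(socketBufferProps.begin(), socketBufferProps.end());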

Original poster:

As I said above, I set this QoS for the DataReader:

dds::sub::qos::DataReaderQos qosDr;
std::map<std::string, std::string> qosDrPropertyMap = {
    {"dds.data_reader.history.memory_manager.fast_pool.pool_buffer_max_size", "3000000"},
    {"reader_resource_limits.dynamically_allocate_fragmented_samples", "true"}};
 
qosDr << rti::core::policy::Property(qosDrPropertyMap.begin(), qosDrPropertyMap.end());
qosDr   << rti::core::policy::DataReaderProtocol().rtps_reliable_reader(
               rti::core::RtpsReliableReaderProtocol())
        << dds::core::policy::ResourceLimits()
        << dds::core::policy::History::KeepAll();
 
And it does not make the DataReader reliable, does it?
Increasing the socket buffers didn't fix the problem.
I think that when samples are written at a fast rate, the DataWriter sometimes (for example, when the blocking time reaches a threshold) doesn't keep sending copies to the DataReaders and jumps to the next sample. Is that true? And if it is, is there any property or QoS I can configure to fix the issue?