What causes packet loss in a local wired network?

Joined: 09/10/2022
Posts: 39
Hello. I use Connext DDS version 6.0.0. I created a DDS domain with multiple nodes (more than one PC), all with the same specs and the same OS. The PCs are connected to each other in a local network through NICs and a switch that support gigabit speeds. When I run "ethtool " it shows a speed of 1000Mb/s. I set these configs for the Linux machines in /etc/sysctl.conf:
net.core.wmem_max = 16777216
net.core.wmem_default = 131072
net.core.rmem_max = 16777216
net.core.rmem_default = 131072

net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
net.ipv4.tcp_mem = 4096 131072 16777216

net.core.netdev_max_backlog = 30000
net.ipv4.ipfrag_high_thresh = 8388608
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1
# semaphores: semmsl, semmns, semopm, semmni
kernel.sem = 250 32000 100 1024
I set this DomainParticipantFactoryQos:
dds::domain::qos::DomainParticipantFactoryQos factoryQos = dds::domain::DomainParticipant::participant_factory_qos();
factoryQos << dds::core::policy::EntityFactory::ManuallyEnable();
factoryQos->resource_limits.max_objects_per_thread(4096);
dds::domain::DomainParticipant::participant_factory_qos(factoryQos);
I set this DomainParticipantQos:
dds::domain::qos::DomainParticipantQos dpQos = dds::core::QosProvider::Default()
        .participant_qos(rti::core::builtin_profiles::qos_lib::baseline());
dpQos->participant_name = rti::core::policy::EntityName("MyApp");
rti::core::policy::DomainParticipantResourceLimits resource_limits_qos;
resource_limits_qos.type_code_max_serialized_length(0);
resource_limits_qos.reader_user_data_max_length(65536);
resource_limits_qos.writer_user_data_max_length(65536);
resource_limits_qos.type_object_max_serialized_length(20000);
dpQos << resource_limits_qos;

std::map<std::string, std::string> dpPropertyMap = {{"dds.transport.UDPv4.builtin.recv_socket_buffer_size", "16777216"},
                                                    {"dds.transport.UDPv4.builtin.send_socket_buffer_size", "16777216"},
                                                    {"dds.transport.UDPv4.builtin.parent.message_size_max", "100000"},
                                                    {"dds.transport.shmem.builtin.receive_buffer_size", "2048576"}};
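A side note on the message_size_max setting above: any UDP datagram larger than the link MTU is split into IP fragments, and if a single fragment is dropped the whole datagram is lost. A rough sketch of the arithmetic, assuming a 1500-byte Ethernet MTU and a 20-byte IPv4 header without options (`ipv4_fragment_count` is a hypothetical helper for illustration, not part of Connext):

```cpp
#include <cassert>
#include <cstddef>

// Number of IPv4 fragments needed to carry one UDP datagram.
// udp_payload: bytes handed to the socket in one send call.
// mtu: link MTU in bytes (1500 is an assumed typical Ethernet value).
std::size_t ipv4_fragment_count(std::size_t udp_payload, std::size_t mtu = 1500) {
    const std::size_t kIpHeader = 20;   // IPv4 header without options (assumption)
    const std::size_t kUdpHeader = 8;   // UDP header, carried in the first fragment
    assert(mtu >= kIpHeader + 8);       // need room for at least one 8-byte unit

    // Every fragment except the last must carry a multiple of 8 bytes.
    const std::size_t per_fragment = ((mtu - kIpHeader) / 8) * 8;
    const std::size_t ip_payload = udp_payload + kUdpHeader;

    // Ceiling division: how many fragments cover the IP payload.
    return (ip_payload + per_fragment - 1) / per_fragment;
}
```

For the largest payload IPv4 allows in one UDP datagram (65507 bytes), `ipv4_fragment_count(65507)` gives 45 fragments at MTU 1500, so one dropped frame costs the whole message. The 100000 configured above is larger than that IPv4 limit; how Connext treats such a value is outside what this sketch covers.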
I use about a hundred IDL types in the domain, with different but mutually compatible QoS settings for the DDS entities. I configured some DataReaders for some of the IDL types with BestEffort QoS. I set these QoS policies for the DataWriter:
dds::pub::qos::DataWriterQos qosDw;
std::map<std::string, std::string> qosDwPropertyMap = {{"dds.builtin_type.*.max_size", "16777216"},
                                                       {"dds.builtin_type.*.alloc_size", "16777216"},
                                                       {"dds.data_writer.history.memory_manager.fast_pool.pool_buffer_max_size", "3000000"}};

qosDw << rti::core::policy::Property(qosDwPropertyMap.begin(), qosDwPropertyMap.end());
qosDw << rti::core::policy::DataWriterProtocol().rtps_reliable_writer(
             rti::core::policy::RtpsReliableWriterProtocol()
             .min_send_window_size(dds::core::LENGTH_UNLIMITED)
             .max_send_window_size(dds::core::LENGTH_UNLIMITED)
             )
      << rti::core::policy::PublishMode::Asynchronous()
      << dds::core::policy::ResourceLimits()
      << dds::core::policy::Lifespan(dds::core::Duration(lifespan,0))
      << dds::core::policy::History::KeepAll();
And I use these QoS policies for the DataReader:
dds::sub::qos::DataReaderQos qosDr;
std::map<std::string, std::string> qosDrPropertyMap = {{"dds.data_reader.history.memory_manager.fast_pool.pool_buffer_max_size", "3000000"},
                                                       {"reader_resource_limits.dynamically_allocate_fragmented_samples", "true"}};

qosDr << rti::core::policy::Property(qosDrPropertyMap.begin(), qosDrPropertyMap.end());
qosDr   << rti::core::policy::DataReaderProtocol().rtps_reliable_reader(
               rti::core::RtpsReliableReaderProtocol())
        << dds::core::policy::ResourceLimits()
        << dds::core::policy::History::KeepAll();
I use a local wired network, so I expect no packet drops when sending and receiving IDL packets on stable hardware, even though I use BestEffort QoS for the DataReaders. But I see packet loss in some situations, and it is repeatable. I don't understand why this happens. After a packet loss I check the interface's packet drop statistics with this command:
# ethtool -S  | grep drops
I also know the backlog queue is large enough, because the second column in the output of this command (the number of frames dropped because of a full backlog queue) is zero:
# awk '{for (i=1; i<=NF; i++) printf strtonum("0x" $i) (i==NF?"\n":" ")}' /proc/net/softnet_stat | column -t
I also know I don't need a longer transmit queue, because the output of this command shows zero dropped packets:
# tc -s qdisc show dev 
Some things I should mention:
  1. Packet loss occurs with both multicast and unicast discovery.
  2. Packet loss occurs with both synchronous and asynchronous publishers.
  3. I always see 3 lost packets together (when I check publication_sequence_number in the sample info). This means the packet loss is not random and there is a real reason for it.
  4. I use multithreading when sending these packets, but I see packet loss even when I guard the sends with a mutex.
  5. More DataReaders in the network cause more packet loss.
So what can be the reason for the packet loss, and how can I check the possible reasons?
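The burst pattern in point 3 can be checked offline from a log of received sequence numbers. A minimal sketch, assuming the publication_sequence_number values from one DataWriter were recorded in arrival order as plain 64-bit integers (`LossBurst` and `find_loss_bursts` are hypothetical names for illustration, not Connext API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of consecutive sequence numbers that never arrived.
struct LossBurst {
    std::uint64_t first_missing;  // first sequence number in the gap
    std::uint64_t length;         // how many consecutive numbers are missing
};

// Scan sequence numbers in arrival order (assumed strictly increasing,
// all from a single writer) and report every gap as a burst.
std::vector<LossBurst> find_loss_bursts(const std::vector<std::uint64_t>& received) {
    std::vector<LossBurst> bursts;
    for (std::size_t i = 1; i < received.size(); ++i) {
        const std::uint64_t prev = received[i - 1];
        const std::uint64_t cur = received[i];
        if (cur > prev + 1) {
            // Everything between prev and cur was lost.
            bursts.push_back(LossBurst{prev + 1, cur - prev - 1});
        }
    }
    return bursts;
}
```

For a log of 1, 2, 3, 7, 8, 12 this reports two bursts of length 3, starting at 4 and at 9. Logging the burst lengths over a longer run shows whether the "always 3" observation really holds for every loss.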
Howard
Joined: 11/29/2012
Posts: 622

Does DDS report that packets are lost for the DataReader, via the on_sample_lost() callback?  If so, what is the reason reported?   Are packet losses only reported for BEST_EFFORT DataReaders?

For the DataReaders that are losing data, is the data type keyed or not?

If you set RELIABLE Reliability for the readers that are reporting lost packets, are packets still being lost?

I see you using

<< dds::core::policy::ResourceLimits()

I'm not sure that this does anything, since you are not modifying any values in the resource limits policy.  What do you expect this to do?

Fundamentally, with BEST_EFFORT connections, there is no guarantee that all data will be received.  DDS is allowed to overwrite old data with new data even if the user app hasn't read the old data from the DataReader....which can happen if the user code is somehow being delayed from reading the data.

If you absolutely need to receive ALL data sent for a topic with DDS, you need to configure the QoS for DataWriters and DataReaders to have strict reliability (RELIABLE Reliability, and KEEP_ALL History)

ALTHOUGH, I also see you using the Lifespan QoS.  In that case, if DDS isn't able to deliver the data within the lifespan...or the lifespan of the data expires before the user app reads the data, DDS will automatically delete that data....even with strict reliability.  The same applies when using any sort of filtering (time-based or content-based filter).

Joined: 09/10/2022
Posts: 39

Thank you Mr. Howard.
on_sample_lost() always reports "DDS_LOST_BY_WRITER". The lifespan is long enough for sending all packets. And thank you for your note about dds::core::policy::ResourceLimits().
The DataReaders are BestEffort. I didn't test Reliable. Reliability has overhead from sending ACK/NACK messages and the like. I just want to be sure that the DataWriter's DomainParticipant (which has a socket with an IP address and port, and sends and receives data) sends each sample to all DataReaders subscribed to that DomainParticipant's DataWriters. After it has sent a sample to all DataReaders, I want it to continue sending new samples one by one to all DataReaders.
How can I do that without Reliable QoS? I don't understand why the DataWriter would remove a sample before sending it to all subscribers. There should be a way to use BestEffort QoS such that the user can rely on DDS to send a packet to all DataReaders before sending new packets. And of course I will need a function that determines the maximum delay in sending packets. I think lacking this capability could count as a deficiency! These packet losses occur when CPU usage is about 30% and I only send about 16mb/sec of data!

Howard
Joined: 11/29/2012
Posts: 622

"DDS_LOST_BY_WRITER" usually means that the writer no longer has the data in the cache to resend/repair even though the DataReader is missing the data.  You should only get "DDS_LOST_BY_WRITER" for connections that are Reliable.  Are you sure that you get that reason for loss when the DataReader is BEST_EFFORT?

With any data, a source of data loss is the size of the receive socket buffer or of the shared memory buffer (in the case of sending/receiving between processes on the same host).

Try increasing the socket buffer per this documentation: https://community.rti.com/kb/achieving-low-jitter-performance-connext-pro#Socket-buffer-sizes
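One way to test the socket-buffer theory on Linux is to watch the kernel's UDP counters: RcvbufErrors in /proc/net/snmp increases when a datagram is dropped because a receive socket buffer was full. A minimal sketch that pulls one counter out of text in that file's format (`udp_counter` is a hypothetical helper for illustration; it looks the column up by name in the file's own header line instead of assuming a fixed position):

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Return the value of one UDP counter from text in /proc/net/snmp format,
// or -1 if the counter (or the Udp section) is not present.
long long udp_counter(const std::string& snmp_text, const std::string& name) {
    std::istringstream in(snmp_text);
    std::string line;
    std::vector<std::string> header;  // column names from the first "Udp:" line

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string tag;
        fields >> tag;
        if (tag != "Udp:") continue;  // skips Ip:, Tcp:, UdpLite:, ...

        std::vector<std::string> tokens;
        for (std::string tok; fields >> tok;) tokens.push_back(tok);

        if (header.empty()) {         // first Udp: line holds the names
            header = tokens;
            continue;
        }
        // Second Udp: line holds the values, in the same column order.
        for (std::size_t i = 0; i < header.size() && i < tokens.size(); ++i) {
            if (header[i] == name) return std::stoll(tokens[i]);
        }
        return -1;
    }
    return -1;
}
```

Reading /proc/net/snmp into a string before and after a test run and comparing `udp_counter(text, "RcvbufErrors")` shows whether the receiving host dropped datagrams at the socket. The counter is host-wide, so other UDP traffic on the machine also contributes to it.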

Joined: 09/10/2022
Posts: 39

As I said above, I set this QoS for the DataReader:

dds::sub::qos::DataReaderQos qosDr;
std::map<std::string, std::string> qosDrPropertyMap = {{"dds.data_reader.history.memory_manager.fast_pool.pool_buffer_max_size", "3000000"},
                                                       {"reader_resource_limits.dynamically_allocate_fragmented_samples", "true"}};
 
qosDr << rti::core::policy::Property(qosDrPropertyMap.begin(), qosDrPropertyMap.end());
qosDr   << rti::core::policy::DataReaderProtocol().rtps_reliable_reader(
               rti::core::RtpsReliableReaderProtocol())
        << dds::core::policy::ResourceLimits()
        << dds::core::policy::History::KeepAll();
 
And that does not make the DataReader reliable, does it?
Increasing the socket buffer didn't fix the problem.
I think that when samples are written at a fast rate, the DataWriter sometimes (for example when the blocking time reaches a threshold) stops sending copies to the DataReaders and jumps to the next sample. Is that true? And if it is, is there any property or QoS I can configure to fix the issue?