Unicast Discovery and offline hosts

Hello,


We are migrating from an old SUSE release with kernel 3.0 to a new one with kernel 3.12. One of the changes is the ARP queue limits:

unres_qlen_bytes - INTEGER
    The maximum number of bytes which may be used by packets queued for each unresolved address by other network layers. (added in linux 3.3) Setting negative value is meaningless and will return error.
    Default: 65536 Bytes(64KB)

unres_qlen - INTEGER
    The maximum number of packets which may be queued for each unresolved address by other network layers.
    (deprecated in linux 3.3) : use unres_qlen_bytes instead. Prior to linux 3.3, the default value is 3 which may cause unexpected packet loss. The current default value is calculated according to default value of unres_qlen_bytes and true size of packet.
    Default: 31 
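
For reference, here is a small standalone check of which values are actually in effect (a minimal sketch; it reads the default neighbour table under /proc, per-interface tables under /proc/sys/net/ipv4/neigh/<iface>/ may differ, and unres_qlen_bytes simply doesn't exist on the 3.0 kernel since it was added in 3.3):

    #include <fstream>
    #include <iostream>
    #include <string>

    // Print the effective ARP queue limits of the default neighbour table.
    static std::string read_sysctl(const std::string &path) {
        std::ifstream in(path);
        std::string value;
        std::getline(in, value);
        return in ? value : std::string("<unavailable>");
    }

    int main() {
        const std::string base = "/proc/sys/net/ipv4/neigh/default/";
        std::cout << "unres_qlen       = " << read_sysctl(base + "unres_qlen") << "\n";
        std::cout << "unres_qlen_bytes = " << read_sysctl(base + "unres_qlen_bytes") << "\n";
        return 0;
    }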

Our setup consists of around 40 workstations; roughly half of them communicate via multicast, and the other half must be reached via unicast. Our discovery configuration is as follows:

  • Multicast hosts use the multicast address plus the hostnames of the non-multicast peers
  • Non-multicast hosts don't use multicast and list the hostnames of all peers (otherwise they would have to wait until one of the multicast hosts contacts them); a sketch of how such a peers list can be set programmatically follows below
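
To illustrate, a minimal sketch of how such an initial peers list can be set programmatically with the classic C++ API (the peer names "unicast-host-1"/"unicast-host-2" and the multicast group 239.255.0.1 are placeholders, and error handling is omitted):

    #include "ndds/ndds_cpp.h"

    int main() {
        // Sketch: fill discovery.initial_peers before creating the participant.
        DDS_DomainParticipantQos qos;
        DDSTheParticipantFactory->get_default_participant_qos(qos);

        qos.discovery.initial_peers.maximum(0);          // drop the default peers
        qos.discovery.initial_peers.ensure_length(3, 3);
        qos.discovery.initial_peers[0] = DDS_String_dup("239.255.0.1");    // multicast group (multicast hosts only)
        qos.discovery.initial_peers[1] = DDS_String_dup("unicast-host-1"); // placeholder hostname of a unicast-only peer
        qos.discovery.initial_peers[2] = DDS_String_dup("unicast-host-2"); // placeholder hostname of a unicast-only peer

        DDSDomainParticipant *participant =
            DDSTheParticipantFactory->create_participant(
                0 /* domain id */, qos, NULL, DDS_STATUS_MASK_NONE);
        return participant != NULL ? 0 : 1;
    }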

This setup works well (fast discovery) with kernel 3.0 (unres_qlen = 3) but has problems with kernel 3.12 (unres_qlen = 31 and unres_qlen_bytes = 65536): sendmsg calls that send announcements to offline hosts tie up the send buffer until the ARP lookup for the offline host times out, which takes around 3 s with the default kernel settings. Since sendmsg is called inside a protected section, most of the DDS functions are blocked for that time. As a result, the application needs 10 s to 60 s to start (initialize the domain participant and create some data readers and data writers). I haven't checked yet, but the same problem may occur every participant_liveliness_assert_period.
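
This is roughly how I would try to reproduce the blocking outside of DDS (a sketch under the following assumptions: the target address is a placeholder for an on-link IP with no live host behind it, and SO_SNDBUF is forced below unres_qlen_bytes so the socket send buffer fills up before the per-neighbour ARP queue does):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #include <chrono>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        // Placeholder: an on-link address with no live host behind it,
        // so ARP resolution fails after the kernel's retries (~3 s).
        const char *offline_ip = "192.168.1.250";

        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        // Keep the send buffer well below unres_qlen_bytes (64 KB) so the
        // socket buffer, not the neighbour queue, is the first limit hit.
        int sndbuf = 16 * 1024;
        setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

        sockaddr_in dst;
        std::memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(7400);
        inet_pton(AF_INET, offline_ip, &dst.sin_addr);

        std::vector<char> payload(1024, 'x');
        for (int i = 0; i < 100; ++i) {
            std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
            ssize_t n = sendto(sock, &payload[0], payload.size(), 0,
                               (sockaddr *)&dst, sizeof(dst));
            long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                               std::chrono::steady_clock::now() - t0).count();
            if (n < 0) perror("sendto");
            // Once the send buffer is full of packets parked in the ARP queue,
            // sendto() on a blocking socket should stall here for roughly the
            // ARP resolution timeout.
            if (ms > 100) std::printf("send %d blocked for %lld ms\n", i, ms);
        }
        close(sock);
        return 0;
    }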


I've tried/thought about the following:

  • Use mainly multicast - may not be possible in our network setup
  • Tune the ARP kernel settings - I don't want to mess with kernel settings and risk breaking something else
  • Increase dds.transport.UDPv4.builtin.send_socket_buffer_size - my favorite; it greatly improves performance. The current limit is 200 KB and I want to test with a bigger value (see the sketch after this list).
  • Reduce the maximum number of participants per host - this greatly reduces startup time, but 10-20 DDS processes on some hosts are possible, so there is not much to optimize
  • Set initial_participant_announcements to 1 - I suppose there is no harm in reducing it from 5 to 1 (also shown in the sketch after this list)
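
As a sketch of the send-buffer and announcements points, this is roughly how the two settings could be applied in the participant QoS with the classic C++ API (the 1 MB buffer size is only an example value to test with, not a recommendation; error handling is omitted):

    #include "ndds/ndds_cpp.h"

    int main() {
        DDS_DomainParticipantQos qos;
        DDSTheParticipantFactory->get_default_participant_qos(qos);

        // Enlarge the UDPv4 transport's send socket buffer.
        DDSPropertyQosPolicyHelper::add_property(
            qos.property,
            "dds.transport.UDPv4.builtin.send_socket_buffer_size",
            "1048576",          /* example value: 1 MB */
            DDS_BOOLEAN_FALSE); /* no need to propagate via discovery */

        // Send only one initial participant announcement instead of five.
        qos.discovery_config.initial_participant_announcements = 1;

        DDSDomainParticipant *participant =
            DDSTheParticipantFactory->create_participant(
                0 /* domain id */, qos, NULL, DDS_STATUS_MASK_NONE);
        return participant != NULL ? 0 : 1;
    }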

What is the recommended way to tune discovery in such a setup? What other options could help to improve performance during discovery? Using the send_blocking setting is not recommended, right?

Any helpful comments would be greatly appreciated.
