Improve RTI Connext DDS Network Performance on Linux Systems

There are many aspects of the Linux kernel configuration that can affect network performance. This HOWTO gathers some of them.

Use the tuned-adm tool on RedHat systems

The tuned-adm tool allows you to switch between different profiles that internally tune system parameters to aim for better latency or throughput. Instead of manually tuning several kernel parameters, this tool does that work for you based on the selected profile.

This tool provides some predefined profiles. To list all available profiles and identify the current active profile, run:
tuned-adm list
To only display the currently active profile, run:
tuned-adm active
To switch to one of the available profiles, run:
tuned-adm profile profile_name
For example:
tuned-adm profile server-powersave
To disable all tuning:
tuned-adm off

A good profile when aiming for the lowest latency is network-latency. When aiming for higher throughput, the balanced profile works well. It is important to change the active profile if the performance goal changes (latency to throughput or vice versa).
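
For example, to select the low-latency profile before running a latency test:
tuned-adm profile network-latency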

These profiles do not modify the kernel receive and send socket buffer sizes, so those must be tuned separately (see the next section).

Size of kernel receive and send socket buffers

When working with a Linux system, there are kernel parameters that limit the maximum send and receive socket buffer sizes.

To see your system's settings, run this command:

[root@slab15 ~]# /sbin/sysctl -a | grep core

The default values are:

net.core.rmem_default = 109568
net.core.wmem_default = 109568
net.core.rmem_max = 131071
net.core.wmem_max = 131071

(Where "rmem" is for the receive socket, "wmem" is for the send socket.)

For improved DDS performance, we suggest increasing the maximum send and receive socket buffer sizes. For example:

# RTI: Increase max. send/recv socket buffer limits
# for higher network performance
net.core.rmem_default = 65536
net.core.rmem_max = 2097152
net.core.wmem_default = 65536
net.core.wmem_max = 1048576

To temporarily change any of these parameters, you can use sysctl or the /proc file system. Either way, the changes only apply to the running operating system state; they will not survive a reboot.

For example, to set the new value of 65536 for net.core.rmem_default, become the root user and use the sysctl command shown below:

sysctl -w net.core.rmem_default="65536"

The same change can also be made via the /proc file system with the following command:

echo "65536" > /proc/sys/net/core/rmem_default

Alternatively, to make the changes to these parameters permanent across reboots, edit the /etc/sysctl.conf file and add or edit the corresponding variables. For example, to set net.core.rmem_default, add the line:

net.core.rmem_default = 65536
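
Changes made in /etc/sysctl.conf are normally applied at the next boot. To load them immediately without rebooting, run the following as root:

sysctl -p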

For information about setting socket buffer sizes on QNX, see here.

Maximum number of input packets queued on the network interface

This queue is used to hold packets when the interface receives them faster than the kernel can process them. The default setting is around 300. For fast networks (1 GigE and beyond), it is recommended to increase it so that packet bursts do not immediately result in packets being dropped.

# RTI: Increase max. recv packets queued at the interface
# for higher network performance
net.core.netdev_max_backlog = 30000
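
As with the socket buffer sizes, you can change this value temporarily with sysctl (as root) before making it permanent in /etc/sysctl.conf:

sysctl -w net.core.netdev_max_backlog=30000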

Amount of buffer space the Linux kernel uses to reassemble IP fragments

One potential cause of poor performance on a Linux system is the amount of buffer space the kernel uses to reassemble IP fragments. The description here applies to the configuration of RedHat Enterprise Linux 4.4, but similar considerations apply to other kernel configurations.

The parameters in /proc/sys/net/ipv4 control various aspects of the network stack, including the size of the reassembly buffer.

ipfrag_high_thresh specifies the maximum amount of memory used to reassemble IP fragments. When the memory used by fragments reaches ipfrag_high_thresh, old entries are removed until the memory used declines to ipfrag_low_thresh.

If the output of netstat shows an increasing number of IP fragment reassemblies failing, we recommend increasing ipfrag_high_thresh. The impact can be significant: in some use cases, we have seen that increasing this buffer space improved throughput from 32 MB/sec to 80 MB/sec on 1 Gbit Ethernet. To temporarily change the value of ipfrag_high_thresh, use this command as root:

echo "8388608" > /proc/sys/net/ipv4/ipfrag_high_thresh

To make this change permanent across reboots, edit the /etc/sysctl.conf file and add the line:

net.ipv4.ipfrag_high_thresh = 8388608
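
To check whether your system is experiencing fragment reassembly failures, look at the IP statistics reported by netstat; for example (the exact counter names may vary between distributions):

netstat -s | grep -i reassembl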

NIC Card Settings: Interrupt coalescing

The configuration of the NIC can have a big impact on throughput and latency, and this depends heavily on the kind of NIC you use. In our experience, Intel 1 Gbit NICs are very sensitive to this setting, so this section applies mostly to those. Other NICs may be less sensitive or may not even offer the option to configure the setting.

Interrupt coalescing refers to the ability of the NIC to not interrupt the CPU immediately whenever a packet is received, but rather to wait a little bit in the hope that more packets arrive. That way a single interrupt can be used to process multiple packets. This represents a tradeoff between latency and throughput: coalescing the interrupts introduces a wait that enhances throughput but degrades latency, while disabling interrupt coalescing and thus forcing the NIC to interrupt the CPU for each packet provides the minimal latency but lower throughput.

Out of the box, most NICs are configured with an "adaptive" setting (sometimes called "dynamic"), which in our experience tends to favor throughput over latency. This depends on the actual NIC used and sometimes also on the Linux distribution.

Depending on the Linux distribution and NIC driver, you can use tools such as "ethtool" and "modprobe" to inspect and modify the NIC settings. Note that these commands must be executed with "root" privileges.

The first task is to identify the NIC adapter and driver you are using. The "Adapter and Driver ID Guide" at http://support.intel.com/support/network/adapter/pro100/21397.htm provides a good reference on how to do this for different systems.

Step 1 is to identify the list of Ethernet ports and their names. In our system:

[root@slab15 ~]# ifconfig | grep eth
eth1      Link encap:Ethernet  HWaddr 00:1B:21:00:38:94  

Step 2 is to identify the name of the network adapter:

[root@slab15 ~]# lspci -v | grep Ethernet
01:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
Subsystem: Giga-byte Technology Marvell 88E8053 Gigabit Ethernet Controller (Gigabyte) 

And finally, Step 3 is to identify the driver and its version. Note that we use the Ethernet port name "eth1" obtained in the first step:

[root@slab15 ~]# ethtool -i eth1
driver: e1000
version: 7.2.7-k2-NAPI
firmware-version: 5.11-10
bus-info: 0000:01:00.0

We can check the current settings by looking at the /etc/modprobe.conf file. Note the use of the driver name "e1000" obtained in the previous step:

[root@slab15 ~]# grep e1000 /etc/modprobe.conf
alias eth1 e1000
options e1000 InterruptThrottleRate=1

This setting of "1" corresponds to an adaptive setting which tries to balance latency and throughput, so it will not provide minimal latency. To achieve minimal latency we must disable Interrupt Throttling completely by setting InterruptThrottleRate=0. We can do that either using ethtool (if the system supports that), or alternatively manually editing the /etc/modprobe.conf file and then rebooting the system. In our experience this second mechanism is the more robust way to do it.

Additional information and suggestions for improving the latency on Intel Ethernet controllers can be found at: http://download.intel.com/design/network/applnots/322819.pdf and http://www.kernel.org/doc/Documentation/networking/e1000.txt

MTU (Maximum Transmission Unit) of the Network Interface (NIC)

This parameter controls the largest packet size (in bytes) that the interface can transmit to the network. Larger packets will be fragmented by the interface, sent in separate network packets, and re-assembled on the receiving network interface. This fragmentation and re-assembly will diminish the throughput and increase the latency of the communication.

Out of the box, most Linux systems are configured with an MTU of 1500 bytes. This value derives from old Ethernet NICs and switches that could not handle larger packets. Modern Ethernet hardware (e.g., 1 Gbit/sec NICs) can handle an MTU of at least 9000 bytes. Therefore, it is recommended that you reconfigure your operating system's network settings to match what the hardware can do.

To see your system's settings, run this command as user root:

[root@slab15 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:9E:D7:DC  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:47633 errors:47346 dropped:0 overruns:0 frame:0
          TX packets:27546 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:61940065 (59.0 MiB)  TX bytes:1739139 (1.6 MiB)
          Interrupt:19 Base address:0x2024 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:40 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:4003 (3.9 KiB)  TX bytes:4003 (3.9 KiB)

In the above example, you can see that "eth0", which represents the external interface, has the MTU set to 1500 bytes.

For improved DDS performance, increase the MTU to 9000 bytes. You can do this in two ways: from the command line, which takes effect immediately but only lasts until the next reboot, or by changing the boot settings so that the MTU is set to the new 9000-byte value every time the system boots.

To do it from the command line, type the following as root (note that "eth0" should be replaced by the name of your interface, as shown in the ifconfig output):

[root@slab15 ~]# ifconfig eth0 mtu 9000
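
You can confirm that the interface picked up the new value by inspecting it again; the MTU field should now show 9000:

[root@slab15 ~]# ifconfig eth0 | grep MTU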

To change it permanently on RedHat Linux, edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and add the following line to it:

MTU=9000

This is our file after adding the line:

[root@slab15 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82801G (ICH7 Family) LAN Controller
DEVICE=eth0
BOOTPROTO=static
BROADCAST=10.10.255.255
HWADDR=00:19:DB:AD:FE:8D
IPADDR=10.10.30.101
IPV6ADDR=fe80::219:dbff:fead:fe8d/64
IPV6PREFIX=64
NETMASK=255.255.0.0
NETWORK=10.10.0.0
ONBOOT=yes
MTU=9000

After the file is changed, restart the network service with the following command:

[root@slab15 ~]# service network restart eth0
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0... done.               [  OK  ]

Now the change will be preserved even if the system is rebooted.
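
To verify that the larger MTU actually works end-to-end (the switches and the remote host must also support it), you can send a large, non-fragmenting ping. For example, assuming a reachable host named "peerhost" (a placeholder for a host on your network):

ping -M do -s 8972 peerhost

The 8972-byte payload plus the 28 bytes of IP and ICMP headers adds up to the 9000-byte MTU. If the ping reports that the message is too long, some element in the path does not support the larger MTU.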

Note that the actual mechanism to set the MTU depends on the Linux distribution. The instructions here are for RedHat. Ubuntu and Debian offer similar functionality but use different commands and files to configure the MTU.
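
As an illustrative sketch only (the exact mechanism depends on the distribution and release): on Debian-based systems that use the classic /etc/network/interfaces file, the equivalent permanent setting for a statically configured interface could look like the fragment below, reusing the addresses from the RedHat example above:

iface eth0 inet static
    address 10.10.30.101
    netmask 255.255.0.0
    mtu 9000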