Improve RTI Connext DDS Network Performance on Linux Systems
There are many aspects of the Linux kernel configuration that can affect performance. This HOWTO attempts to gather some of these.
Use the tuned-adm tool on Red Hat systems
The tuned-adm tool lets you switch between profiles that tune system parameters for lower latency or higher throughput. Instead of manually tuning several kernel parameters, you let the tool do that work based on the selected profile.
tuned-adm list
tuned-adm active
tuned-adm profile profile_name
tuned-adm profile server-powersave
tuned-adm off
A good profile when aiming for the lowest latency is network-latency. When aiming for higher throughput, the balanced profile works well. Remember to change the active profile if the performance goal changes (latency to throughput or vice versa).
These profiles do not modify the kernel receive and send socket buffers. Thus, this should be done separately.
Size of kernel receive and send socket buffers
When working with a Linux system, there are kernel parameters that limit the maximum send and receive socket buffer sizes.
To see your system's settings, run this command:
[root@slab15 ~]# /sbin/sysctl -a | grep core
The default values are:
net.core.rmem_default = 109568
net.core.wmem_default = 109568
net.core.rmem_max = 131071
net.core.wmem_max = 131071
(Where "rmem" is for the receive socket, "wmem" is for the send socket.)
For improved DDS performance, we suggest increasing the maximum send and receive socket buffer sizes. For example:
# RTI: Increase max. send/recv socket buffer limits
# for higher network performance
net.core.rmem_default = 65536
net.core.rmem_max = 2097152
net.core.wmem_default = 65536
net.core.wmem_max = 1048576
To temporarily change any of these parameters, you can use sysctl or the /proc filesystem. Either way, the changes apply only to the running operating system state. They will not survive a reboot.
For example, to set the new value of 65536 for net.core.rmem_default, become the root user and use the sysctl command shown below:
sysctl -w net.core.rmem_default="65536"
The same result can also be achieved via the /proc filesystem with the following command:
echo "65536" > /proc/sys/net/core/rmem_default
Alternatively, to make changes to these parameters permanent across reboots, edit the /etc/sysctl.conf file and add or edit the corresponding variables. For example, to set net.core.rmem_default, edit the /etc/sysctl.conf file and add the line:
net.core.rmem_default = 65536
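After changing the limits, it can help to confirm that the running kernel actually picked them up. The following is a minimal sketch, not an RTI-provided tool: check_limit is a hypothetical helper name, and the PROC_ROOT variable is an assumption introduced only so the helper can be pointed at a test directory. On a real system it reads from /proc.

```shell
# Sketch: report whether a net.core limit meets a target value.
# check_limit and PROC_ROOT are hypothetical names used for illustration.
check_limit() {
    name=$1      # parameter name under net.core, e.g. rmem_max
    target=$2    # minimum value wanted, e.g. 2097152
    file="${PROC_ROOT:-/proc}/sys/net/core/$name"
    current=$(cat "$file") || return 2
    if [ "$current" -ge "$target" ]; then
        echo "net.core.$name = $current (ok)"
    else
        echo "net.core.$name = $current (below $target)"
        return 1
    fi
}

# Example use with the values suggested above:
#   check_limit rmem_max 2097152
#   check_limit wmem_max 1048576
```

The helper returns a non-zero status when a limit is too small, so it can be used in a startup script before launching the DDS application.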
For information about setting socket buffer sizes on QNX, see here.
Maximum number of input packets queued on the network interface
This queue holds packets when the interface receives them faster than the kernel can process them. The default setting is around 300. For fast networks (1 GigE and beyond), we recommend increasing it so that packet bursts do not immediately result in some packets being dropped.
# RTI: Increase max. recv packets queued at the interface
# for higher network performance
net.core.netdev_max_backlog = 30000
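One way to judge whether this queue is overflowing is to look at /proc/net/softnet_stat, where each line describes one CPU and the second column (in hexadecimal) counts packets dropped because the backlog queue was full. The sketch below sums that column. The function name softnet_dropped is hypothetical, and the optional file argument exists only so the function can be run against a saved copy.

```shell
# Sketch: total packets dropped at the input backlog queue, across all CPUs.
# Reads /proc/net/softnet_stat unless another file is given.
softnet_dropped() {
    file=${1:-/proc/net/softnet_stat}
    total=0
    while read -r _processed dropped _rest; do
        [ -n "$dropped" ] || continue          # skip blank lines
        total=$((total + 0x$dropped))          # column is hexadecimal
    done < "$file"
    echo "$total"
}
```

If the reported total keeps growing while the application is receiving data, that suggests net.core.netdev_max_backlog is still too small for the traffic bursts.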
Amount of buffer space the Linux kernel uses to reassemble IP fragments
One potential cause of poor performance on a Linux system is the amount of buffer space the kernel uses to reassemble IP fragments. The description here applies to the configuration of Red Hat Enterprise Linux 4.4, but similar considerations apply to other kernel configurations.
The parameters in /proc/sys/net/ipv4 control various aspects of the network, including a parameter that controls the reassembly buffer size.
ipfrag_high_thresh specifies the maximum amount of memory used to reassemble IP fragments. When the memory used by fragments reaches ipfrag_high_thresh, old entries are removed until the memory used declines to ipfrag_low_thresh. If the output of netstat shows an increasing number of failed IP fragment reassemblies, we recommend increasing ipfrag_high_thresh. The impact can be significant: in some use cases, we have seen that increasing this buffer space improved throughput from 32 MB/sec to 80 MB/sec on 1 Gbit Ethernet.
To temporarily change the value of ipfrag_high_thresh, use this command as root:
echo "8388608" > /proc/sys/net/ipv4/ipfrag_high_thresh
To make this change permanent across reboots, edit the /etc/sysctl.conf file and add the line:
net.ipv4.ipfrag_high_thresh = 8388608
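The reassembly failure count mentioned above is also available in /proc/net/snmp as the ReasmFails field of the Ip: lines. The following sketch extracts it by looking up the column by name, so it does not depend on the field order. The function name ip_reasm_fails is hypothetical, and the optional file argument is an assumption added for testing against a saved copy.

```shell
# Sketch: print the IPv4 ReasmFails counter.
# The first "Ip:" line in the file is a header, the second holds the values.
ip_reasm_fails() {
    file=${1:-/proc/net/snmp}
    awk '
        $1 == "Ip:" && !col {              # header line: find the column
            for (i = 2; i <= NF; i++)
                if ($i == "ReasmFails") col = i
            next
        }
        $1 == "Ip:" && col { print $col; exit }
    ' "$file"
}
```

Sampling the value twice, a few seconds apart, while the application is running shows whether failures are still increasing after ipfrag_high_thresh has been raised.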
NIC Card Settings: Interrupt coalescing
The configuration of the NIC card can have a big impact on throughput and latency. This depends heavily on the kind of NIC you use. In our experience the Intel 1 Gbit NICs are very sensitive to this setting so this section applies mostly to those. Other NICs may be less sensitive or not even offer the option to configure the setting.
Interrupt coalescing refers to the ability of the NIC to not interrupt the CPU immediately whenever a packet is received, but rather wait a little in the hope that more packets arrive. That way a single interrupt can be used to process multiple packets. This decision represents a tradeoff between latency and throughput. Coalescing the interrupts amounts to a wait, which will enhance throughput but degrade latency. Disabling interrupt coalescing, and thus forcing the NIC to interrupt the CPU for each packet, will provide minimal latency but lower throughput.
Out of the box, most NICs are configured with an "adaptive" setting (sometimes called "dynamic"), which in our experience tends to favor throughput over latency. This depends on the actual NIC used and sometimes also on the Linux distribution.
Depending on the Linux distribution and NIC driver, you can use tools such as "ethtool" and "modprobe" to find out and modify the NIC settings. Note that these commands must be executed with "root" privileges.
The first step is to identify the NIC adapter and driver you are using. The "Adapter and Driver ID Guide" at http://support.intel.com/support/network/adapter/pro100/21397.htm provides a good reference on how to do this for different systems.
Step 1 is to identify the list of Ethernet ports and their names. In our system:
[root@slab15 ~]# lspci -v | grep Ethernet
[root@slab15 ~]# ifconfig | grep eth
eth1 Link encap:Ethernet HWaddr 00:1B:21:00:38:94
Step 2 is to identify the name of the network adapter:
[root@slab15 ~]# lspci -v | grep Ethernet
01:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
Subsystem: Giga-byte Technology Marvell 88E8053 Gigabit Ethernet Controller (Gigabyte)
Finally, Step 3 is to find the driver version. Note that we use the Ethernet port name "eth1" obtained in the first step:
[root@slab15 ~]# ethtool -i eth1
driver: e1000
version: 7.2.7-k2-NAPI
firmware-version: 5.11-10
bus-info: 0000:01:00.0
We can check the current settings by looking at the /etc/modprobe.conf file. Note the use of the driver name "e1000" obtained in the previous step:
[root@slab15 ~]# grep e1000 /etc/modprobe.conf
alias eth1 e1000
options e1000 InterruptThrottleRate=1
This setting of "1" corresponds to an adaptive setting which tries to balance latency and throughput, so it will not provide minimal latency. To achieve minimal latency we must disable Interrupt Throttling completely by setting InterruptThrottleRate=0. We can do that either using ethtool (if the system supports that), or alternatively manually editing the /etc/modprobe.conf file and then rebooting the system. In our experience this second mechanism is the more robust way to do it.
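For the manual route, the edit to /etc/modprobe.conf amounts to changing the value on the "options e1000" line. The sketch below shows one way to preview that change. set_itr is a hypothetical helper name, and it assumes the file already contains an "options e1000 ... InterruptThrottleRate=" entry like the one shown above. It writes the result to standard output and leaves the input file untouched, so the output can be reviewed before replacing the real file.

```shell
# Sketch: print a modprobe configuration with the e1000
# InterruptThrottleRate option set to the given value.
# The input file is not modified.
set_itr() {
    file=$1   # path to a modprobe.conf-style file
    rate=$2   # new value, e.g. 0 to disable interrupt throttling
    sed "s/^\(options e1000 .*InterruptThrottleRate=\)[0-9,]*/\1$rate/" "$file"
}

# Example use (as root, after reviewing the output):
#   set_itr /etc/modprobe.conf 0 > /tmp/modprobe.conf.new
```

A reboot, as described above, is still needed for the new value to take effect.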
Additional information and suggestions for improving the latency on Intel Ethernet controllers can be found at: http://download.intel.com/design/network/applnots/322819.pdf and http://www.kernel.org/doc/Documentation/networking/e1000.txt
MTU (Maximum Transmission Unit) of the Network Interface (NIC)
This parameter controls the largest packet size (in bytes) that the interface can transmit to the network. Larger packets will be fragmented, sent in separate network packets, and reassembled on the receiving side. This fragmentation and reassembly will diminish the throughput and increase the latency of the communication.
Out-of-the-box most Linux systems are configured with an MTU of 1500 Bytes. This value derives from old Ethernet NICs and switches that could not handle larger packets. Modern Ethernet hardware (e.g. 1 Gbit/sec NICs) can handle MTU of at least 9000B. Therefore it is recommended that you reconfigure your operating system network settings to match what the hardware can do.
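To make the cost of fragmentation concrete, the following sketch estimates how many IPv4 fragments one UDP datagram needs for a given MTU. It is an illustration, not part of any RTI tool: ipv4_fragments is a hypothetical name, and the calculation assumes a 20-byte IPv4 header without options and an 8-byte UDP header.

```shell
# Sketch: number of IPv4 packets needed to carry one UDP datagram.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte UDP header.
ipv4_fragments() {
    payload=$1                       # UDP payload size in bytes
    mtu=$2                           # interface MTU in bytes
    total=$((payload + 8))           # UDP header travels with the data
    max_data=$((mtu - 20))           # room left after the IP header
    if [ "$total" -le "$max_data" ]; then
        echo 1                       # fits in a single packet
        return
    fi
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because fragment offsets are expressed in 8-byte units.
    per_fragment=$((max_data / 8 * 8))
    echo $((1 + (total - max_data + per_fragment - 1) / per_fragment))
}

ipv4_fragments 65507 1500   # prints 45
ipv4_fragments 65507 9000   # prints 8
```

Under these assumptions, the largest possible UDP payload needs 45 packets at an MTU of 1500 bytes but only 8 at 9000 bytes, and losing any one of them forces the whole datagram to be discarded.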
To see your system's settings, run this command as user root:
[root@slab15 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:9E:D7:DC
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:47633 errors:47346 dropped:0 overruns:0 frame:0
          TX packets:27546 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:61940065 (59.0 MiB)  TX bytes:1739139 (1.6 MiB)
          Interrupt:19 Base address:0x2024
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:40 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4003 (3.9 KiB)  TX bytes:4003 (3.9 KiB)
In the above example you can see that "eth0" which represents the external interface has the MTU set to 1500 Bytes.
For improved DDS performance, increase the MTU to 9000 Bytes. You can do this in two ways. The first is from the command line, which takes effect immediately but lasts only until the next reboot. The second changes the boot settings so that the MTU is set to the new 9000 Byte value every time the system starts.
To do it from the command line, type the following as root (note that "eth0" should be replaced by the name of your interface as shown in the ifconfig output):
[root@slab15 ~]# ifconfig eth0 mtu 9000
To change it permanently on Red Hat Linux, edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and add the following line to it:
MTU=9000
This is my file after I added the line:
[root@slab15 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82801G (ICH7 Family) LAN Controller
DEVICE=eth0
BOOTPROTO=static
BROADCAST=10.10.255.255
HWADDR=00:19:DB:AD:FE:8D
IPADDR=10.10.30.101
IPV6ADDR=fe80::219:dbff:fead:fe8d/64
IPV6PREFIX=64
NETMASK=255.255.0.0
NETWORK=10.10.0.0
ONBOOT=yes
MTU=9000
After the file is changed, re-start the network service with the following command:
[root@slab15 ~]# service network restart eth0
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0:
Determining IP information for eth0... done. [ OK ]
Now the change will be preserved even if the system is rebooted.
Note that the actual mechanism to set the MTU depends on the Linux distribution. The instructions here are for Red Hat. Ubuntu and Debian offer similar functionality but use different commands and files to configure the MTU.
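Whichever distribution-specific mechanism is used, the resulting MTU can be read back from sysfs, where the kernel exposes it as /sys/class/net/<interface>/mtu. The sketch below wraps that in two small helpers. The names iface_mtu and mtu_at_least are hypothetical, and the SYSFS_ROOT variable is an assumption introduced only so the helpers can be run against a test directory.

```shell
# Sketch: read an interface's MTU from sysfs and compare it to a minimum.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
iface_mtu() {
    iface=$1
    cat "${SYSFS_ROOT:-/sys}/class/net/$iface/mtu"
}

mtu_at_least() {
    iface=$1
    wanted=$2
    [ "$(iface_mtu "$iface")" -ge "$wanted" ]
}

# Example use:
#   mtu_at_least eth0 9000 || echo "eth0 is not configured for jumbo frames"
```

Keep in mind that this checks only the local interface. Every switch and every peer on the path must also accept 9000-byte frames for the larger MTU to help.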