How do I tune the network buffering in VxWorks 5.4?
Note: Applies to NDDS 3.0 and VxWorks 5.4.
Introduction
The VxWorks TCP/IP stack uses a fixed pool of buffers. The number and sizes of these buffers are configured into the kernel. This is different from non-real-time operating systems, where buffers can be allocated from main memory as needed.
In many NDDS applications on VxWorks, you have to modify the default values of this configuration to achieve optimal performance and to avoid running out of buffers. In certain cases, running out of buffers is a fatal event, which can stop the stack from sending or receiving.
Symptoms
The following symptoms indicate that the number or size of the stack buffers is not configured correctly:
- Packet loss: When a TCP/IP stack runs out of buffers, it silently drops packets. You will notice that a best-effort subscription starts missing issues.
- Poor performance of reliable protocols: When packet loss occurs, the RTPS reliable protocols try to regain reliability by resending packets; this increases latency and decreases throughput.
- Error messages indicating NDDS is unable to send: A typical error message is
NDDS Alarm: Unable to serialize fired message
which indicates that NDDS is unable to use the stack to send a local UDP message. This often indicates that the stack is out of buffers.
- The stack stops working completely.
If your problem is related to buffer availability, these symptoms will go away if you send less data: either fewer packets (by decreasing the send rate of the publications or increasing the minimum separation of the best-effort subscriptions) or packets with less data (by decreasing the user-data size).
Some Background
Before going into detail, here is some background on the internals of buffer management in the VxWorks TCP/IP stack.
There are three important aspects:
- The mbufs and clusters used in the stack
- The amount of data that can be held by a socket
- The buffering inside the driver
The mbufs and clusters
VxWorks uses "mbufs" with associated "clusters" whenever data needs to be stored or passed around in the stack. The clusters come in pools of fixed-size clusters; the number of pools, the size of the clusters in each pool, and the number of clusters in each pool are all configurable.
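To make this concrete, here is a minimal sketch of how netBufLib describes such a pool, following the pattern used by the kernel's own initialization code. The names examplePool, exampleClDescTbl, and examplePoolCreate, as well as the cluster counts, are hypothetical; the stack's own pools are configured through the kernel tables shown in the Examples further below.

#include "vxWorks.h"
#include "netBufLib.h"
#include "memLib.h"
#include "stdlib.h"

/* Hypothetical cluster layout: many small clusters for small UDP messages,
 * plus some large ones so a big UDP packet fits into a single cluster.
 */
LOCAL CL_DESC exampleClDescTbl [] =
    {
    /* clusterSize      num     memArea     memSize */
    {  64,              200,    NULL,       0},
    {  2048,            50,     NULL,       0},
    {  4096,            25,     NULL,       0}
    };

LOCAL M_CL_CONFIG exampleMClBlkConfig =
    {
    /* mBlkNum  clBlkNum    memArea     memSize */
    400,        275,        NULL,       0
    };

LOCAL NET_POOL examplePool;

/* Allocate backing memory for the tables above and build the pool. */
STATUS examplePoolCreate (void)
    {
    int ix;

    exampleMClBlkConfig.memSize =
        (exampleMClBlkConfig.mBlkNum * (M_BLK_SZ + sizeof (long))) +
        (exampleMClBlkConfig.clBlkNum * CL_BLK_SZ);
    if ((exampleMClBlkConfig.memArea =
             memalign (sizeof (long), exampleMClBlkConfig.memSize)) == NULL)
        return (ERROR);

    for (ix = 0; ix < NELEMENTS (exampleClDescTbl); ix++)
        {
        exampleClDescTbl [ix].memSize =
            exampleClDescTbl [ix].clNum * (exampleClDescTbl [ix].clSize + 8);
        if ((exampleClDescTbl [ix].memArea =
                 malloc (exampleClDescTbl [ix].memSize)) == NULL)
            return (ERROR);
        }

    return (netPoolInit (&examplePool, &exampleMClBlkConfig,
                         &exampleClDescTbl [0],
                         NELEMENTS (exampleClDescTbl), NULL));
    }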
The data held by the socket
Each socket has a receive buffer size ("receiveBufferSize") that limits how much pending data the socket can hold before incoming packets are dropped; this is discussed in detail under Option 1 below.
Buffering inside the device drivers
The device driver itself also needs internal buffers to hold, for example, an incoming raw Ethernet packet. For improved efficiency, the so-called END drivers use the same mbuf/cluster structures to store this data.
General Approach
Before going into details, here is the general suggested policy for tuning these various parameters.
- To avoid running out of buffers in the stack, keep the "receiveBufferSizes" of the sockets LOW:
The receiveBufferSize should at least be the size of the largest UDP packet that is being received. If the receiveBufferSize of a socket is small (e.g., exactly equal to the size of one UDP packet), the socket can only hold one pending packet: when a second packet arrives, it will be dropped.
Ideally, the stack should be configured so that the space provided by the buffers/clusters in the stack is larger than the total receiveBufferSizes of all sockets. In this case, the stack itself cannot run out of buffers. This is a sane and good policy; however it is hard to enforce, since the stack is shared by many applications, not just NDDS.
- To avoid running out of buffers and to increase performance, increase and tune the mbufs/clusters that the stack uses:
The total number of mbufs/clusters should be large enough for the load the machine is supposed to handle. As noted above, for NDDS reception the total receiveBufferSize of all sockets is a bare minimum (see the worked sizing example after this list).
In addition, to improve performance, the sizes of the actual UDP messages should be taken into account. When the stack needs to handle many small messages, it is useful to allocate many small clusters; if the stack runs out of clusters, it will start using larger clusters for small packets, which will waste a lot of memory.
When large UDP packets are sent or received, it is useful to make sure there are clusters that are large enough to store the entire UDP packet. Otherwise, the stack will need to find and link several smaller clusters, which will take time.
- The driver needs sufficient buffers to handle incoming traffic. Unfortunately, the details of the behavior depend on the device driver.
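As a worked sizing example (with hypothetical numbers): if an NDDS application opens 10 UDP sockets and each keeps the VxWorks default receive queue of about 41 KB, the sockets can together hold roughly 410 KB of pending data, so the cluster pools should provide at least that much space to guarantee the stack never runs out. If the receiveBufferSize of each socket is instead lowered to, say, 8 KB, about 80 KB of cluster space already covers the worst case.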
Details on Option 1: Tuning the Receive Buffer Sizes
To monitor the queues of the sockets, you can use the inetstatShow command. It lists all the sockets in the system and the state of each socket.
-> inetstatShow
Active Internet connections (including servers)
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
-------- ----- ------ ------ ------------------ ------------------ -------
...
807ca248 TCP        0      0  0.0.0.0.513        0.0.0.0.0          LISTEN
807ca5e4 UDP      888      0  0.0.0.0.49155      0.0.0.0.0
807ca560 UDP        0      0  127.0.0.1.1024     127.0.0.1.17185
807ca3d4 UDP        0      0  0.0.0.0.17185      0.0.0.0.0
...
value = 1 = 0x1
-> inetstatShow
Active Internet connections (including servers)
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
-------- ----- ------ ------ ------------------ ------------------ -------
...
807ca248 TCP        0      0  0.0.0.0.513        0.0.0.0.0          LISTEN
807ca5e4 UDP      936      0  0.0.0.0.49155      0.0.0.0.0
807ca560 UDP        0      0  127.0.0.1.1024     127.0.0.1.17185
807ca3d4 UDP        0      0  0.0.0.0.17185      0.0.0.0.0
...
You can see the amount of data in the receive queue (Recv-Q) increasing. This would happen, for example, if the receive thread of NDDS is blocked because the user takes too much time in the onIssueReceives callback of an immediate subscription.
The size of the receive queue can be configured in NDDS. Please see section 10.4.4, dgram: Datagram Properties, in the NDDS 3.0 User's Manual for instructions on how to do this. For reference: the default receive queue size of a UDP socket in VxWorks is 41 KB.
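NDDS applies this setting for you through the dgram properties; purely to illustrate what the receiveBufferSize corresponds to at the VxWorks socket level, here is a minimal stand-alone sketch using the standard SO_RCVBUF socket option. The routine name rcvBufExample and the 8 KB value are arbitrary examples, not part of NDDS.

#include "vxWorks.h"
#include "sys/socket.h"
#include "sockLib.h"
#include "ioLib.h"
#include "stdio.h"

/* Illustrative only: create a UDP socket and adjust its receive buffer.
 * NDDS configures its own sockets internally via the dgram properties.
 */
STATUS rcvBufExample (void)
    {
    int fd;
    int size   = 8 * 1024;      /* requested receive buffer, in bytes */
    int actual = 0;
    int optLen = sizeof (actual);

    if ((fd = socket (AF_INET, SOCK_DGRAM, 0)) == ERROR)
        return (ERROR);

    /* shrink (or grow) the socket receive queue */
    if (setsockopt (fd, SOL_SOCKET, SO_RCVBUF,
                    (char *) &size, sizeof (size)) == ERROR)
        {
        close (fd);
        return (ERROR);
        }

    /* read back what the stack actually granted */
    if (getsockopt (fd, SOL_SOCKET, SO_RCVBUF,
                    (char *) &actual, &optLen) == ERROR)
        {
        close (fd);
        return (ERROR);
        }

    printf ("SO_RCVBUF is now %d bytes\n", actual);

    close (fd);
    return (OK);
    }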
Details on Option 2: Tuning the Buffers in the Stack
To monitor the state of the buffer usage inside the stack, you can use netStackDataPoolShow or mbufShow; they give the same information.
If the stack itself has been running out of buffers, this will show up as an increasing "number of times failed to find space". If this call indicates that the stack sometimes runs out of buffers, you either have to reduce the socket receive queues (see above) or, better, increase the amount of buffering in the stack. As mentioned above, the kernel should ideally be configured with clusters of appropriate sizes, to improve performance.
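To watch these counters over time, one simple approach is to spawn a task that dumps the pool statistics periodically. Here is a minimal sketch; the routine name netPoolMonitor and the default 10-second period are arbitrary, and the kernel must include the network show routines.

#include "vxWorks.h"
#include "taskLib.h"
#include "sysLib.h"
#include "netShow.h"

/* Periodically dump the stack's data pool statistics to the console.
 * Watch the "number of times failed to find space" lines between dumps.
 */
void netPoolMonitor (int periodSec)
    {
    if (periodSec <= 0)
        periodSec = 10;                     /* arbitrary default period */

    for (;;)
        {
        netStackDataPoolShow ();            /* same info as mbufShow */
        taskDelay (periodSec * sysClkRateGet ());
        }
    }

From a WindShell you could then run, for example: sp (netPoolMonitor, 10)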
Example 1: How to create 4096-byte mbufs in a kernel built with the Tornado II project facility.
1. In $WIND_BASE/target/h/netBufLib.h, add these lines after the similar lines for NUM_2048:

#ifndef NUM_4096
#define NUM_4096        25      /* no. 4096 byte clusters */
#endif  /* NUM_4096 */
Modify the next lines so that NUM_4096 is included in the total:
#ifndef NUM_CL_BLKS
#define NUM_CL_BLKS     (NUM_64 + NUM_128 + NUM_256 + \
                         NUM_512 + NUM_1024 + NUM_2048 + NUM_4096)
#endif  /* NUM_CL_BLKS */
2. In target/config/comps/vxWorks/00network.cdf, add the lines:
Parameter NUM_4096 {
        NAME            Number of 4096 byte clusters for user data
        TYPE            uint
        DEFAULT         25
}
and modify the lines:
Parameter NUM_CL_BLKS {
        NAME            Size of network memory pool for user data
        SYNOPSIS        Total of all cluster sizes for shared user data
        TYPE            uint
        DEFAULT         NUM_64 + NUM_128 + NUM_256 + \
                        NUM_512 + NUM_1024 + NUM_2048 + NUM_4096
}
and
Component INCLUDE_NET_SETUP {
        NAME            network buffer initialization
        SYNOPSIS        network buffer creation and device support
        CFG_PARAMS      NUM_NET_MBLKS NUM_CL_BLKS \
                        NUM_64 NUM_128 NUM_256 NUM_512 NUM_1024 NUM_2048 \
                        NUM_SYS_MBLKS NUM_SYS_CL_BLKS NUM_4096 \
                        NUM_SYS_64 NUM_SYS_128 NUM_SYS_256 NUM_SYS_512 \
                        IP_MAX_UNITS
        CONFIGLETTES    net/usrNetLib.c
        HDR_FILES       netBufLib.h ipProto.h
}
3. In target/config/comps/src/net/usrNetLib.c modify the clDescTbl to:
CL_DESC clDescTbl [] =
    {
    /*
    clusterSize     num         memArea     memSize
    -----------     ----        -------     -------
    */
    {64,            NUM_64,     NULL,       0},
    {128,           NUM_128,    NULL,       0},
    {256,           NUM_256,    NULL,       0},
    {512,           NUM_512,    NULL,       0},
    {1024,          NUM_1024,   NULL,       0},
    {2048,          NUM_2048,   NULL,       0},
    {4096,          NUM_4096,   NULL,       0}
    };
4. In the Workspace browser, go to the parameters menu for network components/basic network initialization/network buffer initialization. Select NUM_CL_BLKS and modify its value by appending "+ NUM_4096".
5. Rebuild your kernel from the Workspace browser
6. Confirm that you have added the new memory by running mbufShow from a WindShell. There should be a line showing the availability of 4096-byte mbufs. (mbufShow is only available if you have included the network show routines.)
Example 2: How to create 4096-byte mbufs using the BSP makefile facility.
1. In target/src/config/usrNetwork.c modify the clDescTbl to:
CL_DESC clDescTbl [] =
    {
    /*
    clusterSize     num         memArea     memSize
    -----------     ----        -------     -------
    */
    {64,            NUM_64,     NULL,       0},
    {128,           NUM_128,    NULL,       0},
    {256,           NUM_256,    NULL,       0},
    {512,           NUM_512,    NULL,       0},
    {1024,          NUM_1024,   NULL,       0},
    {2048,          NUM_2048,   NULL,       0},
    {4096,          NUM_4096,   NULL,       0}
    };
2. In target/proj/prjParams.h:
#define NUM_2048

#undef  NUM_4096
#define NUM_4096        25

#define NUM_CL_BLKS     (NUM_64 + NUM_128 + NUM_256 + NUM_512 + NUM_1024 + NUM_2048 + NUM_4096)
3. Rebuild the kernel and run the tests in step 6 above.
Details on Option 3: The Buffers Inside the Driver
The following routine, built as a loadable object module, displays the state of the buffer pool used by an END driver:

/*-----------------------START CODE------------------------*/
#include "vxWorks.h"
#include "netBufLib.h"
#include "end.h"
#include "muxLib.h"
#include "netShow.h"
#include "stdio.h"

extern void netPoolShow (NET_POOL_ID pNetPool);

/* Display the buffer pool of the END driver <devName>, unit <unit>. */

void endPoolShow
    (
    char * devName,     /* END device name, e.g. "dc" */
    int    unit         /* unit number */
    )
    {
    END_OBJ * pEnd;

    if ((pEnd = endFindByName (devName, unit)) != NULL)
        netPoolShow (pEnd->pNetPool);
    else
        printf ("Could not find device %s\n", devName);

    return;
    }
/*-----------------------END CODE------------------------*/
> ld < myPoolShow.o
value = 8380688 = 0x7fe110 = endPoolShow + 0x480
> sp (endPoolShow, "dc", 0)
task spawned: id = 0x55a180, name = t1
value = 5611904 = 0x55a180
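Here ld loads the compiled object module from the host, and sp spawns endPoolShow as a task for the END device named "dc", unit 0; substitute the device name and unit number of your own network interface.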
This will show the state of the clusters within the device driver.
Whether and how the buffering inside the driver can be modified is driver-dependent.