How do I tune the network buffering in VxWorks 5.4?

Note: Applies to NDDS 3.0 and VxWorks 5.4.

Introduction

The VxWorks TCP/IP stack uses a fixed number of buffers. The buffers and their sizes are configured in the kernel. This is different from non-real-time OSs, where buffers can be taken from main memory as needed.

In many NDDS applications on VxWorks, you have to modify the default values of this configuration to achieve optimal performance and to avoid running out of buffers. In certain cases, running out of buffers is a fatal event, which can stop the stack from sending or receiving.

Symptoms

The following are some of the symptoms that indicate that the number and/or size of the stack buffers is not configured correctly.

  • Packet loss:
    When a TCP/IP stack runs out of buffers, it will silently drop packets. You will notice that a best-effort subscription will start missing issues.
  • Poor performance of reliable protocols:
    When packet loss occurs, the RTPS reliable protocols will try to regain reliability by resending packets; this will cause increases in latency and decreased throughput.
  • Error messages indicating NDDS is unable to send:
    A typical error message is "NDDS Alarm: Unable to serialize fired message", which indicates that NDDS is unable to use the stack to send a local UDP message. This often indicates that the stack is out of buffers.
  • The stack stops working completely:
    This is the most drastic symptom: all communication going through the TCP/IP stack stops and the target seems dead. To see whether the problem is related to the stack, try to connect to the target over a communication line that does not use the TCP/IP stack (such as a serial connection to a target shell). In this error situation, the target will behave normally (there are usually no fatal error messages from the OS side) but the stack is locked up (it neither sends nor receives).
 

If your problem is related to buffer availability, these symptoms will go away if you send less data: either fewer packets (by decreasing the send rate of the publications or increasing the minimum separation of the best-effort subscriptions) or packets with less data (by decreasing the user-data size).

Some Background

Before going into detail, here is some background on the internals of buffer management in the VxWorks TCP/IP stack.

There are three important aspects:

  1. The mbufs and clusters used in the stack
  2. The amount of data that can be held by a socket
  3. The buffering inside the driver

The mbufs and clusters

VxWorks uses "mbufs" with associated "clusters" whenever data needs to be stored or passed around in the stack. The clusters come in pools of fixed-size clusters; the number of pools, the size of the clusters in each pool, and the number of clusters in each pool are all configurable.

VxWorks has two separate groups of mbufs/clusters. One is called the "network stack system pool." This pool stores management information, such as the state of sockets, and is usually not the cause of problems. The more important pool is the "network stack data pool": the mbufs and clusters from this pool contain the actual data that moves through the stack.

The data held by the socket

Each UDP socket can queue up to a certain amount of data. These queues are used if the thread servicing the socket (the NDDS receive thread) cannot keep up with the incoming flow of information. The queue is made up entirely of mbufs/clusters, so each receiving UDP socket can consume mbufs/clusters. The socket has a "receiveBufferSize" which limits the amount of incoming data that can be queued on the socket.
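
At the operating-system level, a socket's "receiveBufferSize" corresponds to the standard SO_RCVBUF socket option. NDDS sets this value for you through its dgram properties (see Option 1 below), so the following sketch is purely illustrative: it shows how the option could be read and changed on a plain UDP socket under VxWorks. The port number and sizes are made-up example values.

/*-----------------------START EXAMPLE CODE------------------------*/
#include "vxWorks.h"
#include "sockLib.h"
#include "inetLib.h"
#include "ioLib.h"
#include "sys/socket.h"
#include "string.h"
#include "stdio.h"

void rcvBufSizeDemo (void)
    {
    int                sockFd;
    int                size;
    int                optLen = sizeof (size);
    struct sockaddr_in addr;

    if ((sockFd = socket (AF_INET, SOCK_DGRAM, 0)) == ERROR)
        return;

    memset ((char *) &addr, 0, sizeof (addr));
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons (7400);          /* arbitrary example port */
    addr.sin_addr.s_addr = htonl (INADDR_ANY);
    bind (sockFd, (struct sockaddr *) &addr, sizeof (addr));

    /* read the current receive buffer size (the VxWorks default is 41k) */
    getsockopt (sockFd, SOL_SOCKET, SO_RCVBUF, (char *) &size, &optLen);
    printf ("current SO_RCVBUF = %d bytes\n", size);

    /* example only: limit the queue to roughly two 8 kB UDP packets */
    size = 2 * 8192;
    setsockopt (sockFd, SOL_SOCKET, SO_RCVBUF, (char *) &size, sizeof (size));

    close (sockFd);
    }
/*------------------------END EXAMPLE CODE-------------------------*/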

Buffering inside the device drivers

The device driver itself will also need internal buffers that are used to hold, for example, an incoming raw Ethernet packet. For improved efficiency, the so-called END drivers use the same mbuf structure to store this data.

General Approach

Before going into details, here is the general suggested policy for tuning these various parameters.

  1. To avoid running out of buffers in the stack, keep the "receiveBufferSizes" of the sockets LOW:

    The receiveBufferSize should be at least the size of the largest UDP packet being received. If the receiveBufferSize of a socket is small (e.g., exactly equal to the size of one UDP packet), the socket can only hold one pending packet: when a second packet arrives before the first has been read, it will be dropped.

    Ideally, the stack should be configured so that the space provided by the mbufs/clusters in the stack is larger than the total of the receiveBufferSizes of all sockets. In that case, the stack itself cannot run out of buffers. This is a sound policy; however, it is hard to enforce, since the stack is shared by many applications, not just NDDS.

  2. To avoid running out of buffers and to increase performance, increase and tune the mbufs/clusters used by the stack (a small worked example follows this list):

    The total number of mbufs/clusters should be large enough for the load the machine is supposed to handle. As illustrated above, for the case of NDDS reception, the total receiveBufferSize of all sockets is a bare minimum.

    In addition, to improve performance, the sizes of the actual UDP messages should be taken into account. When the stack needs to handle many small messages, it is useful to allocate many small clusters; if the stack runs out of clusters, it will start using larger clusters for small packets, which will waste a lot of memory.

    When large UDP packets are sent or received, it is useful to make sure there are clusters that are large enough to store the entire UDP packet. Otherwise, the stack will need to find and link several smaller clusters, which will take time.

  3. The driver needs sufficient buffers to handle incoming traffic. Unfortunately, the details of the behavior depend on the device driver.
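
To make rules 1 and 2 concrete, here is a small worked example (the numbers are invented for illustration): suppose an application opens four receiving UDP sockets, each with a receiveBufferSize of 32 KB, and the largest UDP packets are close to 4 KB. In the worst case the sockets can queue 4 x 32 KB = 128 KB, so the data pool should provide at least 128 KB of cluster space. Sizing most of that as 32 clusters of 4096 bytes lets each large packet fit in a single cluster, with a reserve of smaller clusters left over for the stack's own short packets (ARP, ICMP, acknowledgements).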

Details on Option 1: Tuning the Receive Buffer Sizes

To monitor the queues of the sockets, you can use the inetstatShow command. It will list all the sockets in the system and the state of the sockets.

The following illustrates what happens if a task does not service a receiving UDP socket:
 
-> inetstatShow
 Active Internet connections (including servers)
 PCB     Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
-------- ----- ------ ------  ------------------ ------------------ ------- 
...
807ca248 TCP        0      0  0.0.0.0.513        0.0.0.0.0          LISTEN
807ca5e4 UDP      888      0  0.0.0.0.49155      0.0.0.0.0         
807ca560 UDP        0      0  127.0.0.1.1024     127.0.0.1.17185   
807ca3d4 UDP        0      0  0.0.0.0.17185      0.0.0.0.0         
...
value = 1 = 0x1

-> inetstatShow
Active Internet connections (including servers)
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
-------- ----- ------ ------  ------------------ ------------------ -------
...
807ca248 TCP        0      0  0.0.0.0.513        0.0.0.0.0          LISTEN
807ca5e4 UDP      936      0  0.0.0.0.49155      0.0.0.0.0         
807ca560 UDP        0      0  127.0.0.1.1024     127.0.0.1.17185   
807ca3d4 UDP        0      0  0.0.0.0.17185      0.0.0.0.0         
...           

You can see the amount of data in the receive queue (the Recv-Q column) increasing. This would happen, for example, if the NDDS receive thread is blocked because the user's code takes too much time in the onIssueReceives callback of an immediate subscription.

The size of the receive queue can be configured in NDDS. Please see Section 10.4.4, "dgram: Datagram Properties," in the NDDS 3.0 User's Manual for instructions on how to do this. For reference: the default receive queue size of a UDP socket in VxWorks is 41k.

Details on Option 2: Tuning the Buffers in the Stack

To monitor the state of the buffer usage inside the stack, you can use "netStackDataPoolShow" or "mbufShow"; they give the same information.

If the stack itself has been running out of buffers, this will show up as an increasing "number of times failed to find space." If this output indicates that the stack sometimes runs out of buffers, you either have to reduce the socket receive queues (see above) or, better, increase the amount of buffering in the stack. As mentioned above, the kernel should ideally be configured with clusters of appropriate sizes to improve performance.
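
If you want to watch this over time, a small helper task can dump the data pool statistics periodically so that an increasing failure count stands out. The sketch below simply calls the standard show routine (available only if the network show routines are included in the kernel); the 10-second period and the task spawn parameters are arbitrary example choices, and the extern prototype is an assumption -- check it against your netShow version.

/*-----------------------START EXAMPLE CODE------------------------*/
#include "vxWorks.h"
#include "taskLib.h"
#include "sysLib.h"

/* shell-callable show routine from the netShow library; the prototype
 * below is assumed -- adjust it if it is already declared differently */
extern void netStackDataPoolShow (void);

/* periodically dump the network stack data pool so that an increasing
 * "number of times failed to find space" can be spotted */
void netPoolMonitor (void)
    {
    FOREVER
        {
        netStackDataPoolShow ();
        taskDelay (10 * sysClkRateGet ());   /* wait about 10 seconds */
        }
    }
/*------------------------END EXAMPLE CODE-------------------------*/

From a WindShell or target shell, this can be started with, for example, sp (netPoolMonitor).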

The following describes how to configure the VxWorks stack for the case where you are using the project facility (example 1) or the makefile approach (example 2).  The VxWorks manuals contain additional information.
 

Example 1: How to create 4096 byte mbufs in a kernel built with the Tornado II project facility.

1. In $WIND_BASE/target/h/netBufLib.h, add these lines after the similar lines for NUM_2048:

#ifndef NUM_4096
#define NUM_4096 25 /* no. 4096 byte clusters */ 
#endif  /* NUM_4096 */  

Modify the next lines so that you add in the NUM_4096 cluster: 

#ifndef NUM_CL_BLKS
#define NUM_CL_BLKS (NUM_64 + NUM_128 + NUM_256 + \
NUM_512 + NUM_1024 + NUM_2048 + NUM_4096)
#endif  /* NUM_CL_BLKS */   

2.  In target/config/comps/vxWorks/00network.cdf,  add the lines:

Parameter NUM_4096
{
   NAME Number of 4096 byte clusters for user data
   TYPE uint
   DEFAULT 25
}

and modify the lines:

Parameter NUM_CL_BLKS
{

    NAME Size of network memory pool for user data
    SYNOPSIS Total of all cluster sizes for shared user data
    TYPE uint
    DEFAULT NUM_64 + NUM_128 + NUM_256 + \
            NUM_512 + NUM_1024 + NUM_2048 + NUM_4096
}        

    and 

Component INCLUDE_NET_SETUP
{
   NAME network buffer initialization
   SYNOPSIS network buffer creation and device support
   CFG_PARAMS NUM_NET_MBLKS NUM_CL_BLKS \
              NUM_64 NUM_128 NUM_256 NUM_512 NUM_1024 NUM_2048 NUM_4096 \
              NUM_SYS_MBLKS NUM_SYS_CL_BLKS \
              NUM_SYS_64 NUM_SYS_128 NUM_SYS_256 NUM_SYS_512 \
              IP_MAX_UNITS
   CONFIGLETTES net/usrNetLib.c
   HDR_FILES netBufLib.h ipProto.h
}   

3.  In target/config/comps/src/net/usrNetLib.c modify the clDescTbl to:

CL_DESC clDescTbl [] =
    {
    /* 
    clusterSize num memArea memSize
    ----------- ---- ------- -------
    */
    {64, NUM_64, NULL, 0},
    {128, NUM_128, NULL, 0},
    {256, NUM_256, NULL, 0},
    {512, NUM_512, NULL, 0},
    {1024, NUM_1024, NULL, 0},
    {2048, NUM_2048, NULL, 0},
    {4096, NUM_4096, NULL, 0}
    };              
 

4. In the Workspace browser, go to the parameters menu for network components/basic network initialization/network buffer initialization. Select NUM_CL_BLKS and modify its value by adding " + NUM_4096)" to the end.

5. Rebuild your kernel from the Workspace browser.

6. Confirm that you have added the new memory by running mbufShow from a WindShell. There should be a line showing the availability of 4096 byte mbufs. (mbufShow is only available if you have included the network show routines in the kernel.)

Example 2: How to create 4096 byte mbufs using the BSP makefile facility.

1.  In target/src/config/usrNetwork.c modify the clDescTbl to:

CL_DESC clDescTbl [] =
    {
    /* 
    clusterSize num memArea memSize
    ----------- ---- ------- -------
    */
    {64, NUM_64, NULL, 0},
    {128, NUM_128, NULL, 0},
    {256, NUM_256, NULL, 0},
    {512, NUM_512, NULL, 0},
    {1024, NUM_1024, NULL, 0},
    {2048, NUM_2048, NULL, 0},
    {4096, NUM_4096, NULL, 0}
    };              

2.  In target/proj/prjParams.h:

(a) Add these lines after #define NUM_2048:

#undef  NUM_4096
#define NUM_4096 25

(b) Modify the NUM_CL_BLKS line so that it reads:

#define NUM_CL_BLKS (NUM_64 + NUM_128 + NUM_256 + NUM_512 + NUM_1024 + NUM_2048 + NUM_4096)

3. Rebuild the kernel and repeat the check described in step 6 of Example 1.

Details on Option 3: The Buffers Inside the Driver

Finally, it is possible that the problem lies in the device driver (the driver for your Ethernet card).
 
To monitor what happens inside an END-driver, it is useful to compile and link the following piece of code with your kernel:
/*-----------------------START CODE------------------------*/
#include "vxWorks.h"
#include "netBufLib.h"
#include "end.h"
#include "muxLib.h"
#include "netShow.h"
#include "stdio.h"

extern void netPoolShow (NET_POOL_ID pNetPool);

void endPoolShow
    (
    char * devName,     /* device name, e.g. "dc" */
    int    unit         /* unit number, e.g. 0    */
    )
    {
    END_OBJ * pEnd;

    if ((pEnd = endFindByName (devName, unit)) != NULL)
        netPoolShow (pEnd->pNetPool);
    else
        printf ("Could not find device %s\n", devName);
    }
/*-----------------------END CODE------------------------*/                      
Here is how this can be used:
> ld < myPoolShow.o
value = 8380688 = 0x7fe110 = endPoolShow + 0x480 
Now you can run endPoolShow on an interface (here the first unit of the "dc" driver):
> sp (endPoolShow, "dc", 0)
task spawned: id = 0x55a180, name = t1
value = 5611904 = 0x55a180  

This will show the state of the clusters within the device driver.

If and how the buffering inside the driver can be modified is driver-dependent.
