publishing moderately large (34KB) data from VxWorks DKM

We have a system consisting of a C-based RTI DDS program running as a VxWorks 6.7 DKM, which subscribes to and receives data packets of up to 2MB and, in turn, generates approximately 34KB of data and publishes it back to the original data source. For our current testing, the original data source is a Windows RTI DDS application (Java-based, running in Eclipse), while the target system is running on a PowerPC target board. They are connected via a 10/100 switch.

The problem is that the data being published by the VxWorks target is only sporadically being received by the Windows application, and it can take up to 20 seconds to be received.

In order to simplify things as much as possible, I have created a very simple system using rtiddsgen which just publishes data packets of anywhere from 5KB up to 64KB from a VxWorks DKM (in C) to a Windows subscriber (currently C++ in Visual Studio 2010). The only modification to the generated code is to embed a sequence number in the data for tracking purposes. I run into the exact same problem there (to accommodate the 64KB test, I use ASYNCHRONOUS_PUBLISH_MODE_QOS). When publishing 5KB data packets, the Windows application receives every single sample (tested up to 10Hz). When I try 30KB data packets, I only receive them sporadically, if at all.

On a related note, if I remove the switch and connect the VxWorks target board directly to the PC via a crossover cable, the delay in receiving the samples that do make it to the Windows application goes down to about 5 seconds. I have tried 3 different switches from 3 different manufacturers (NetGear, D-Link, and Zonet). On the Windows 7 PC, I am using a USB-to-Ethernet dongle with a static IP address. If I instead use the PC's built-in NIC (changed from DHCP to static IP), the delay goes to under 1 second (also using the crossover cable). But, in all cases, reception of the data is unreliable when using 30KB samples.

The current QOS XML file (used by both the Windows sub and the VxWorks pub) is included.  I have tried RELIABLE and KEEP_ALL.

 

Do you spot something we're doing wrong or have any suggestions?  Thanks 

 


<?xml version="1.0"?>
<dds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="C:/RTIx32/ndds.5.0.0/scripts/../resource/rtiddsgen/../qos_profiles_5.0.0/schema/rti_dds_qos_profiles.xsd"
     version="5.0.0">
    <qos_library name="InputOutput_Library">
        <qos_profile name="InputOutput_Profile" is_default_qos="true">
            <!--
            <participant_qos>
                <property>
                    <value>
                        <element>
                            <name>rti.monitor.library</name>
                            <value>rtimonitoring</value>
                        </element>
                        <element>
                            <name>rti.monitor.create_function_ptr</name>
                            <value>$(MONITORFUNC)</value>
                        </element>
                    </value>
                </property>
            </participant_qos>
            -->

            <datawriter_qos>
                <reliability>
                    <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
                </reliability>
                <history>
                    <kind>KEEP_LAST_HISTORY_QOS</kind>
                    <depth>1</depth>
                </history>
                <durability>
                    <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
                </durability>
                <publish_mode>
                    <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
                </publish_mode>
            </datawriter_qos>

            <datareader_qos>
                <reliability>
                    <kind>BEST_EFFORT_RELIABILITY_QOS</kind>
                </reliability>
                <history>
                    <kind>KEEP_LAST_HISTORY_QOS</kind>
                    <depth>1</depth>
                </history>
                <durability>
                    <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
                </durability>
            </datareader_qos>

            <participant_qos>
                <participant_name>
                    <name>InputOutput_HW_Test</name>
                </participant_name>
            </participant_qos>
        </qos_profile>

    </qos_library>
</dds>
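
For reference, the RELIABLE / KEEP_ALL variant mentioned above would presumably differ from the posted file only in the reliability and history kinds. The file actually used for that test was not posted, so the fragment below is a sketch under that assumption, not the tested configuration:

```xml
<!-- Sketch only: assumed RELIABLE / KEEP_ALL variant of the posted profile -->
<datawriter_qos>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <history>
        <kind>KEEP_ALL_HISTORY_QOS</kind>
    </history>
    <publish_mode>
        <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
    </publish_mode>
</datawriter_qos>

<datareader_qos>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <history>
        <kind>KEEP_ALL_HISTORY_QOS</kind>
    </history>
</datareader_qos>
```

With KEEP_ALL the depth element is not used, which is why it is left out here.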

Gerardo Pardo

Hi,

I think this may be happening because the default out-of-the-box settings are configured for a maximum message size of about 9KB. To send data larger than that you need to either change the settings or configure the DataWriter for asynchronous publishing (ASYNCHRONOUS_PUBLISH_MODE_QOS) so that DDS can fragment the data and reassemble the fragments for you. These settings are chosen so that they work on all operating systems. Amazingly, there are still a few real-time operating systems out there that cannot send UDP datagrams larger than 9KB...

Windows and VxWorks can be configured to send UDP datagrams up to the UDP maximum which is 64KB. Below I will give some pointers on how to do it. But first we should make sure this is really what is happening.

If you are hitting the 9KB limit, then you should be able to reproduce the error between two Windows computers. I am attaching a simple program that will test it. Unzip the attached files into a directory and run the command:

rtiddsgen -example  TestType.idl

to generate the makefiles. Ignore the warnings that say: 

File ... already exists and will not be replaced with updated content. If you would like to get a new ...

These occur because rtiddsgen notices that you already have TestType_publisher.cxx and TestType_subscriber.cxx and therefore does not overwrite them, which is what we want.

Once you have your makefiles or Windows projects, build them and you should get the TestType_publisher and TestType_subscriber executables.

Make sure you run the executables with your current working directory set to the directory that contains the USER_QOS_PROFILES.xml that I included in the ZIP file. That way the XML file will be read and used to configure the QoS.

If you run these executables with the out-of-the-box settings between two Windows computers (or VxWorks to Windows), you should see output similar to this:

Publisher output:

./objs/x64Darwin10gcc4.2.1/TestType_publisher 
Writing TestType, count = 0, payload_size= 0
Writing TestType, count = 1, payload_size= 1000
Writing TestType, count = 2, payload_size= 2000
Writing TestType, count = 3, payload_size= 3000
Writing TestType, count = 4, payload_size= 4000
Writing TestType, count = 5, payload_size= 5000
Writing TestType, count = 6, payload_size= 6000
Writing TestType, count = 7, payload_size= 7000
Writing TestType, count = 8, payload_size= 8000
Writing TestType, count = 9, payload_size= 9000
COMMENDSrWriterService_write:!write. Reliable large data requires asynchronous writer.
PRESPsWriter_writeInternal:!srw->write
write error 1
Writing TestType, count = 10, payload_size= 10000
COMMENDSrWriterService_write:!write. Reliable large data requires asynchronous writer.
PRESPsWriter_writeInternal:!srw->write
write error 1
Writing TestType, count = 11, payload_size= 11000
COMMENDSrWriterService_write:!write. Reliable large data requires asynchronous writer.

Subscriber output:

./objs/x64Darwin10gcc4.2.1/TestType_subscriber 
Waiting for data...
Received data:  count = 1, payload_size = 1000
Received data:  count = 2, payload_size = 2000
Received data:  count = 3, payload_size = 3000
Received data:  count = 4, payload_size = 4000
Received data:  count = 5, payload_size = 5000
Received data:  count = 6, payload_size = 6000
Received data:  count = 7, payload_size = 7000
Received data:  count = 8, payload_size = 8000

As you can see, the publisher gets errors whenever it tries to send data larger than 8KB, and the subscriber stops receiving the data.

To configure the UDP transport to be able to send data larger than 9KB you can follow the instructions in this thread: http://community.rti.com/content/forum-topic/transport-file-size-message

I prepared the USER_QOS_PROFILES.xml file that is in the zip file to also include these QoS settings. They are in the profile called TestType_LargeUDP. There is also a profile there called TestType_LargeDataWithFlowController, but you only need it if you have to send data larger than 64KB.
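
The ZIP contents are not reproduced in this thread, so as a rough guide only, a profile along the lines of TestType_LargeUDP might look like the sketch below. The profile and base names come from this post, and the property names and values are the ones quoted later in the thread as being taken from the sample file. The exact layout of the real file is an assumption.

```xml
<!-- Sketch only: approximate shape of a large-UDP profile, not the attached file -->
<qos_profile name="TestType_LargeUDP" base_name="TestType_Profile" is_default_qos="false">
    <participant_qos>
        <property>
            <value>
                <!-- Raise the UDPv4 transport's maximum message size from the ~9KB default -->
                <element>
                    <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
                    <value>65536</value>
                </element>
                <!-- Socket buffers sized to hold several large datagrams -->
                <element>
                    <name>dds.transport.UDPv4.builtin.send_socket_buffer_size</name>
                    <value>1000000</value>
                </element>
                <element>
                    <name>dds.transport.UDPv4.builtin.recv_socket_buffer_size</name>
                    <value>2000000</value>
                </element>
            </value>
        </property>
    </participant_qos>
</qos_profile>
```

Because the profile inherits from TestType_Profile through base_name, only the settings that differ need to be listed.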

If you can reproduce the above errors in your system, then you are running into the default UDP transport size configuration I mentioned. In that case, edit USER_QOS_PROFILES.xml and search for the line that says:

       <qos_profile is_default_qos="false" name="TestType_LargeUDP"  base_name="TestType_Profile">

Edit that line, replacing "false" with "true":

        <qos_profile is_default_qos="true" name="TestType_LargeUDP"  base_name="TestType_Profile">

After doing this, run the programs again and you should no longer see the errors once you go over 8KB.

./objs/x64Darwin10gcc4.2.1/TestType_publisher 
Writing TestType, count = 0, payload_size= 0
Writing TestType, count = 1, payload_size= 1000
Writing TestType, count = 2, payload_size= 2000
Writing TestType, count = 3, payload_size= 3000
Writing TestType, count = 4, payload_size= 4000
Writing TestType, count = 5, payload_size= 5000
Writing TestType, count = 6, payload_size= 6000
Writing TestType, count = 7, payload_size= 7000
Writing TestType, count = 8, payload_size= 8000
Writing TestType, count = 9, payload_size= 9000
Writing TestType, count = 10, payload_size= 10000
Writing TestType, count = 11, payload_size= 11000

 

./objs/x64Darwin10gcc4.2.1/TestType_subscriber 
Waiting for data...
Received data:  count = 1, payload_size = 1000
Received data:  count = 2, payload_size = 2000
Received data:  count = 3, payload_size = 3000
Received data:  count = 4, payload_size = 4000
Received data:  count = 5, payload_size = 5000
Received data:  count = 6, payload_size = 6000
Received data:  count = 7, payload_size = 7000
Received data:  count = 8, payload_size = 8000
Received data:  count = 9, payload_size = 9000
Received data:  count = 10, payload_size = 10000
Received data:  count = 11, payload_size = 11000

Gerardo 

 


Hi Gerardo,

Thanks for the test code and procedure.  Here is what happens when I run the VX publisher and PC subscriber (no changes to your initial files).

On the pub side (VX):

-> taskSpawn("TT_pub", 10, 0x8, 100000, publisher_main)
value = 68514192 = 0x4157190
-> Writing TestType, count = 0, payload_size= 0
Writing TestType, count = 1, payload_size= 1000
Writing TestType, count = 2, payload_size= 2000
Writing TestType, count = 3, payload_size= 3000
Writing TestType, count = 4, payload_size= 4000
Writing TestType, count = 5, payload_size= 5000
Writing TestType, count = 6, payload_size= 6000 (at this point, no more activity takes place and the task is in the "PEND" state)

On the sub side (PC):

Waiting for data...
Received data:  count = 1, payload_size = 1000
Received data:  count = 2, payload_size = 2000
Received data:  count = 3, payload_size = 3000
Received data:  count = 4, payload_size = 4000
Received data:  count = 5, payload_size = 5000 (and no more data is ever seen).

 

So, I next modified the XML file to enable "TestType_LargeUDP". Now, on the pub side (VX), it keeps on going all the way up to 59,000.

But, on the sub side (PC), it gets up to 9000 successfully and then throws errors:

    NOTE: I could only enable "TestType_LargeUDP" on the pub side in the XML file.  If I set default to "true" on the sub side (PC), then the code spewed:

    NDDS_Transport_UDPv4_receive_rEA:!precondition: buffer_in->length < (self)->property->message_size_max

Waiting for data...
Received data:  count = 1, payload_size = 1000
Received data:  count = 2, payload_size = 2000
Received data:  count = 3, payload_size = 3000
Received data:  count = 4, payload_size = 4000
Received data:  count = 5, payload_size = 5000
Received data:  count = 6, payload_size = 6000
Received data:  count = 7, payload_size = 7000
Received data:  count = 8, payload_size = 8000
Received data:  count = 9, payload_size = 9000
NDDS_Transport_UDPv4_receive_rEA:OS recvfrom() failure, error 0
NDDS_Transport_UDPv4_receive_rEA:OS recvfrom() failure, error 0
NDDS_Transport_UDPv4_receive_rEA:OS recvfrom() failure, error 0
NDDS_Transport_UDPv4_receive_rEA:OS recvfrom() failure, error 0

 

Enabling the flow controller (TestType_LargeDataWithFlowController) had no impact, as expected, since the test is not exceeding 60KB.

 

Regards,

Mark

Gerardo Pardo

Hello Mark,

Sorry. It is my mistake. There is one more DomainParticipant QoS setting you need to adjust that I forgot to mention. Somehow the defaults in my system were such that I did not see it when I tested the QoS profile I sent you.

The reason you are getting that error is that the buffer the DomainParticipant uses to receive data from the UDP transport is too small. This is different from the transport setting, and it needs to be adjusted so that it can hold at least the largest packet that can be received across all the enabled transports. That is, 65KB in your case.

This can be done by adding the following XML inside your <participant_qos> section:

<participant_qos>
    <!-- other QoS policies ...  --> 
 
    <receiver_pool>
        <buffer_size>65530</buffer_size>
    </receiver_pool> 
 
    <!-- other Qos policies ...  -->
</participant_qos>   

I have adjusted the USER_QOS_PROFILES.xml inside the ZIP file attached to my first answer to also do this for the TestType_LargeUDP profile. Give it a try and see if it gets rid of the error.

Gerardo


Hi Gerardo,

Thanks.  Adding the receiver_pool / buffer_size setting did the trick for sending data from the VX target back to the PC (and did not break sending data in the other direction which had been having no problem sending > 1MB from PC to the VX target).

Regards,

Mark


Hi Gerardo,

Once again, thanks for the XML logic to enable sending larger chunks of data from my VxWorks target to a PC.

However, I recently noticed that the RTI Analyzer was no longer showing the Domain Participant information (i.e. pubs/writers/subs/readers). I backtracked and discovered that this started happening only once I included the changes that you had suggested. In particular, the RTI Analyzer stops showing the participant information only if I specify a value for "dds.transport.UDPv4.builtin.parent.message_size_max".

I tried changing the value for this along with send_socket_buffer_size and recv_socket_buffer_size, but that had no effect.  Current values are as in the sample solution that you originally provided:

<participant_qos>
    <receiver_pool>
        <buffer_size>65530</buffer_size>
    </receiver_pool>
    <property>
        <value>
            <!-- UDP/IP transport configuration -->
            <element>
                <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
                <value>65536</value>
            </element>
            <element>
                <name>dds.transport.UDPv4.builtin.send_socket_buffer_size</name>
                <value>1000000</value>
            </element>
            <element>
                <name>dds.transport.UDPv4.builtin.recv_socket_buffer_size</name>
                <value>2000000</value>
            </element>
        </value>
    </property>
</participant_qos>

Do you have any idea why RTI Analyzer would respond to this setting in this fashion?

Thanks,

Mark

Fernando Garcia

Hi Mark,

You may also need to set Analyzer's QoS settings accordingly—I think that's why it is not discovering those entities.

To change Analyzer's QoS settings, stop your Spy Agent and click on "Configure". This will pop up a dialog, like the one in the picture, where you can configure, for each domain, the peer locators, discovery settings, transports, and other properties. You can also use your own XML QoS profiles by clicking on "Configure using XML QoS Profile".

How to change RTI Analyzer's QoS settings 

Once you are done configuring Analyzer's QoS settings, start your Spy Agent again.

Please, let me know if this fixes your issue.

Thanks,
Fernando Garcia


Hi Fernando,

Yes, that fixed the issue.

While our current configuration shares these settings across multiple domain participants (on three unique hw platforms), I am curious as to how the situation would have been handled if the communications from platform A to platform B used one configuration of these settings (e.g. the defaults) while the communications from platform A to platform C used yet another configuration of these settings (e.g. the settings described in this thread). How would the Analyzer be set up in that situation? I don't foresee that happening, but...

Thanks,

Mark

Gerardo Pardo

It should not be a problem. The settings do not have to be identical on all computers. To be able to receive, you just need dds.transport.UDPv4.builtin.parent.message_size_max and dds.transport.UDPv4.builtin.recv_socket_buffer_size to be large enough to handle the largest UDP message that you will receive.

Setting dds.transport.UDPv4.builtin.recv_socket_buffer_size to values larger than 64KB does not affect the ability to receive data correctly; it only affects performance in situations where messages arrive in bursts faster than the application can pull them from the socket. In that situation, a larger receive buffer lets the socket hold multiple UDP datagrams instead of dropping them and forcing a resend by the reliability protocol.

Given that UDP itself limits the maximum datagram size to roughly 64KB (65,535 bytes including headers), the settings I recommended earlier should work for Analyzer independently of how the other computers are set.
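
As a rough illustration of the receive-side requirement described above, a participant that only needs to receive large messages (such as the one Analyzer creates) might be configured along these lines. The profile name ReceiveLargeUDP is hypothetical, the values are the ones quoted earlier in this thread, and the receiver_pool entry is included on the assumption that the earlier buffer-size fix applies to any participant receiving datagrams of this size.

```xml
<!-- Sketch only: receive-side settings; "ReceiveLargeUDP" is a hypothetical name -->
<qos_profile name="ReceiveLargeUDP">
    <participant_qos>
        <!-- Participant receive buffer, large enough for the biggest incoming packet -->
        <receiver_pool>
            <buffer_size>65530</buffer_size>
        </receiver_pool>
        <property>
            <value>
                <!-- Largest message the UDPv4 transport will accept -->
                <element>
                    <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
                    <value>65536</value>
                </element>
                <!-- Extra socket buffering only helps with bursts; it is not required for correctness -->
                <element>
                    <name>dds.transport.UDPv4.builtin.recv_socket_buffer_size</name>
                    <value>2000000</value>
                </element>
            </value>
        </property>
    </participant_qos>
</qos_profile>
```

The send-side socket buffer is left at its default here because this participant is not expected to publish large samples.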

The out-of-the-box (OOB) setting of 9KB was selected because some older real-time operating system platforms do not support larger buffer sizes and we wanted to avoid OOB issues with these platforms. In hindsight this was probably not a good tradeoff. In fact the next release of our product will default to a larger receive buffer.

Gerardo

 

 

JoãoMSM

Hi,

I was trying to send a payload size of 2073600 with the provided example (transport_size_test.zip) and I'm getting the following outputs

On the pub side (linux):

DDS_OctetSeq_set_length:available space 60000 < 2073600
Writing TestType, count = 0, payload_size= 2073600
DDS_OctetSeq_set_length:available space 60000 < 2073600
Writing TestType, count = 1, payload_size= 2073600

On the sub side (PC):

Waiting for data...
Received data:  count = 1, payload_size = 0

I checked if the correct USER_QOS_PROFILES.xml was in the directory that contains the executable files.

Also I was trying to find the files/project of the large data use case  (http://community.rti.com/rti-doc/500/ndds/doc/html/api_java/group__LargeDataExampleModule.html) to help to mitigate the issue.

Can anyone provide me with some clues regarding this issue?

rip

const long MAX_BYTES=60000;
struct TestType {
    long count;
    sequence<octet, MAX_BYTES> payload;
};

Your payload does not agree with the size of the payload provided by the IDL.

Also the value (2MB) is going to be larger than the underlying transport (UDP) can support, so you will need to enable asynchronous writes -- the default code as supplied in that zip file uses the default QoS, which isn't the large data type QoS.  You'll need to replace the DDS_DATAWRITER_QOS_DEFAULT (DDS_DATAREADER_QOS_DEFAULT) with the correct two parameter qos_library, qos_profile names:

    /* To customize data writer QoS, use 
       the configuration file USER_QOS_PROFILES.xml */
    writer = publisher->create_datawriter_with_profile(
        topic, "TestType_Library", "TestType_LargeUDP", NULL /* listener */,
        DDS_STATUS_MASK_NONE);
    if (writer == NULL) {
        printf("create_datawriter error\n");
        publisher_shutdown(participant);
        return -1;
    }

(and similar for the _subscriber.cxx file)

 

JoãoMSM

Hi rip,

thanks for the tip, it helped a lot.

Do you mean using the publisher->create_datawriter_with_profile (http://community.rti.com/rti-doc/500/ndds/doc/html/api_cpp/classDDSPublisher.html#ab7898867e7255a6958eedde428507895)? It worked with this method.

Using the TestType_LargeUDP profile I am having the "COMMENDSrWriterService_write:!write." issue and when I change it to the TestType_LargeDataWithFlowController profile it works but I'm getting the following outputs:

On the pub side (linux):
Writing TestType, count = 0, payload_size= 2073600
Writing TestType, count = 1, payload_size= 2073600
Writing TestType, count = 2, payload_size= 2073600
Writing TestType, count = 3, payload_size= 2073600
Writing TestType, count = 4, payload_size= 2073600
Writing TestType, count = 5, payload_size= 2073600
Writing TestType, count = 6, payload_size= 2073600
Writing TestType, count = 7, payload_size= 2073600
Writing TestType, count = 8, payload_size= 2073600
Writing TestType, count = 9, payload_size= 2073600


On the sub side (PC):
Waiting for data...


Something missing?

Thanks in advance.

rip

Ah, good catch.  I've updated the original post to add the _with_profile.

If you are not getting communication between the reader and writer, it may be some other problem unrelated to DDS (you haven't said whether you have normal comms, without the high-octet count).

Does rtiddsping work between the publisher and subscriber nodes? If not, this is a problem related to your network topology or configuration (firewalls, switches not propagating multicast, etc). 

Does rtiddsspy show the D lines to indicate that it is receiving data? (I'd not recommend -printSample, since your samples are so large.)

Regards,

rip

JoãoMSM

Hi all,

First of all, sorry for the late reply.

I have never used rtiddsping, but with a little searching I found rtiddsping -subscriber and rtiddsping -publisher, which might be useful for my situation. Using them, I could verify that the data is being transferred correctly. Am I correct?

joao@joao-pc:~/Downloads/transport_size_test$ rtiddsping -subscriber

RTI Data Distribution Service Ping built with DDS version: 5.1.0 (Core: 1.7a.00, C: 1.7a.00, C++: 1.7a.00)
Copyright 2012 Real-Time Innovations, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NddsPing is listening for data, press CTRL+C to stop it.
Found 1 additional ping publishers(s).
Current publisher tally is: 1
Found 1 additional alive ping publishers(s).
Current alive publisher tally is: 1
NddsPing, issue received: 0000002
Detected Missed Sample(s) current: 2 cumulative: 2  (66.66)%
NddsPing, issue received: 0000003
NddsPing, issue received: 0000004

 joao@joao-pc:~/Downloads/transport_size_test$ rtiddsping -publisher

RTI Data Distribution Service Ping built with DDS version: 5.1.0 (Core: 1.7a.00, C: 1.7a.00, C++: 1.7a.00)
Copyright 2012 Real-Time Innovations, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sending data...   value: 0000000
Found 1 additional ping subscriber(s).
Current subscriber tally is: 1
Sending data...   value: 0000001
Found 1 additional ping subscriber(s).
Current subscriber tally is: 2
Sending data...   value: 0000002
Sending data...   value: 0000003
Sending data...   value: 0000004

Using rtiddsspy, I get the following output:

joao@joao-pc:~$ rtiddsspy -domainId 0

RTI Data Distribution Service Spy built with DDS version: 5.1.0 (Core: 1.7a.00, C: 1.7a.00, C++: 1.7a.00)
Copyright 2012 Real-Time Innovations, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NddsSpy is listening for data, press CTRL+C to stop it.

source_timestamp   Info  Src HostId  topic               type              
-----------------  ----  ----------  ------------------  ------------------  
1399983110.312574  R +N  C0A80172    Example hpf         hpf               
1399988989.104941  R +N  C0A80172    Example TestType    TestType          
1399988990.894492  W +N  C0A80172    Example TestType    TestType          
1399988991.895819  d +N  C0A80172    Example TestType    TestType          
1399988992.896139  d +M  C0A80172    Example TestType    TestType          
1399988993.897087  d +M  C0A80172    Example TestType    TestType          
DDS_TIME_INVALID   W ?M  C0A80172    Example TestType    TestType          
DDS_TIME_INVALID   d ?M  C0A80172    Example TestType    TestType 

Is everything working properly?
I have attached the source code, if you could try both processes in your machine I would appreciate it.
Thanks in advance.
