Performance of content filter


Hello there,

I have modified the "ndds.5.0.0\example\CPP\Hello_dynamic" example with a content filter on the subscriber side. The filter expression filters on a key field (I added an unsigned short key field in the dynamic type creation) and tests for equality, "(CEP = %0)", where the parameter is "0x180". Everything works as expected.

Without the content filter I get about 30,000 samples/second via UDPv4 (explicitly configured to use the network transport, not shared memory). When the content filtered topic is used for the data reader, I get only about 600 samples/second. That is a factor of 50 less!

I was hoping the content filter would be applied on the data writer side and the throughput would stay about the same.

What am I doing wrong here?

Is there any way to check if the filter is applied on the data writer?

I have added the following code to the HelloSubscriber.cpp:

  // Base pointer
  DDSTopicDescription* internal_topic = topic;

  // Set to true to enable content filter
  if (true) {

    // Sequence of parameters for the content filter expression
    DDS_StringSeq parameters(1);
    const char* param_list[] = { "0x0180" };
    parameters.from_array(param_list, 1);

    // Create the content filtered topic
    // The Content Filter Expression has one parameter:
    // - %0 -- CEP must be equal to %0.
    DDSContentFilteredTopic *cft = participant->create_contentfilteredtopic("ContentFilteredTopic", topic, "(CEP = %0)", parameters);
    if (cft == NULL) {
      cerr << "! Unable to create content filtered topic" << endl;
      return false;
    }

    // Replace topic with content filtered one
    internal_topic = cft;
  }

  HelloListener listener(verbose);
  DDSDataReader* dataReader = participant->create_datareader(internal_topic, DDS_DATAREADER_QOS_DEFAULT, &listener, DDS_STATUS_MASK_ALL);

The reduction in performance makes content filtering unusable for me.

Regards

Josef


Hi Josef,

Section 5.4.2 of the RTI Core Libraries and Utilities User's Manual specifies all the conditions that need to be satisfied in order to perform writer-side filtering. I have quoted the section below:

A DataWriter will automatically filter data samples for a DataReader if all of the following are
true; otherwise filtering is performed by the DataReader.
1. The DataWriter is filtering for no more than writer_resource_limits.max_remote_reader_filters DataReaders at the same time.

  • There is a resource limit on the DataWriter called writer_resource_limits.max_remote_reader_filters (see DATA_WRITER_RESOURCE_LIMITS QosPolicy (DDS Extension) (Section 6.5.4)). This value can be from 0-32. 0 means do not filter any DataReader and 32 (default value) means filter up to 32 DataReaders.
  • If a DataWriter is filtering max_remote_reader_filters DataReaders at the same time and a new filtered DataReader is created, then the newly created DataReader (max_remote_reader_filters + 1) is not filtered. Even if one of the first (max_remote_reader_filters) DataReaders is deleted, that already created DataReader (max_remote_reader_filters + 1) will still not be filtered. However, any subsequently created DataReaders will be filtered as long as the number of DataReaders currently being filtered is not more than writer_resource_limits.max_remote_reader_filters.

2. The DataReader is not subscribing to data using multicast.
3. There are no more than 4 matching DataReaders in the same locator (see Peer Descriptor Format (Section 14.2.1)).
4. The DataWriter has infinite liveliness. (See LIVELINESS QosPolicy (Section 6.5.13).)
5. The DataWriter is not using an Asynchronous Publisher. (That is, the DataWriter's PUBLISH_MODE QosPolicy (DDS Extension) (Section 6.5.18) kind is set to DDS_SYNCHRONOUS_PUBLISHER_MODE_QOS.) See Note below.
6. If you are using a custom filter (not the default one), it must be registered in the DomainParticipant of the DataWriter and the DataReader.

Notes:
Connext supports limited writer-side filtering if asynchronous publishing is enabled. The middleware will not send any sample to a destination if the sample is filtered out by all the DataReaders on that destination. However, if there is one DataReader to which the sample has to be sent, all the DataReaders on the destination will do reader-side filtering for the incoming sample.

In addition to filtering new samples, a DataWriter can also be configured to filter previously written samples stored in the DataWriter's queue for newly discovered DataReaders. To do so, use the refilter field in the DataWriter's HISTORY QosPolicy (Section 6.5.10).
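
For reference, here is where the resource limit from condition 1 lives in code. This is only a minimal sketch assuming the classic C++ API; 'publisher' stands for whatever DDSPublisher the DataWriter is created from, and 32 is already the default, so this only shows where the knob is:

  // Sketch: set the number of remote DataReaders the DataWriter may filter
  // for (writer-side filtering stops beyond this limit).
  DDS_DataWriterQos writer_qos;
  DDS_ReturnCode_t retcode = publisher->get_default_datawriter_qos(writer_qos);
  if (retcode != DDS_RETCODE_OK) {
    cerr << "! Unable to get default datawriter qos" << endl;
    return false;
  }
  writer_qos.writer_resource_limits.max_remote_reader_filters = 32;
  DDSDataWriter *writer =
      publisher->create_datawriter(topic, writer_qos, NULL, DDS_STATUS_MASK_NONE);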

To check whether filtering is being performed on the reader, you can monitor the value of filtered_sample_count in DDS_DataReaderProtocolStatus. You can use the get_datareader_protocol_status API to query the current value of DDS_DataReaderProtocolStatus.
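
For example, on the subscriber side something along these lines should work (a sketch assuming the classic C++ API; 'dataReader' is the reader created in your snippet above):

  // Sketch: query the reader protocol status and print how many samples
  // were filtered out locally, i.e. on the reader side.
  DDS_DataReaderProtocolStatus reader_status;
  DDS_ReturnCode_t retcode =
      dataReader->get_datareader_protocol_status(reader_status);
  if (retcode != DDS_RETCODE_OK) {
    cerr << "! Unable to get datareader protocol status" << endl;
  } else {
    cout << "Reader-side filtered samples: "
         << reader_status.filtered_sample_count << endl;
  }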

In order to understand the performance difference you are observing when using a content filter, I will need some more information about your test:

  1. How many data readers do you have with content filters installed?
  2. In your test, are you writing samples for the same instance or for different instances? If you are writing to different instances, can you specify how many samples you are writing per instance on average?
  3. Are you measuring the throughput on the reader side or on the writer side? If you are measuring throughput on the reader side, are you taking into account the samples that might have been filtered out and hence never received by the reader?

Also, it would be helpful if you could send me a reproducer so that I can look into this further.

Regards,

Roshan


Hi Josef:

I'd like to follow up on Roshan's question about whether your test is sending new instances every time, or multiple samples for a small set of instances.

When filtering on key fields ONLY (i.e., the filter does not reference anything other than key fields), once an instance has been filtered, the filter can cache the result (pass or fail). When subsequent samples of that instance need to be filtered, Connext DDS checks the cached result based on the instance handle and avoids the need to filter the actual sample.

Especially when using DynamicData, which deserializes the sample before filtering, saving the overhead of deserialization and filtering can result in a significant improvement in performance. But this improvement is only realized if the filter has seen that instance before. If the test is always sending new instances (such as writing an incrementing value in the key field) then the optimization is never invoked.

Filters that include non-key fields, or types that have so many key fields that every sample is its own instance, prevent this optimization from being utilized.
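
To make the distinction concrete, here is a rough sketch (the second expression references a non-key field named 'payload_size', which is purely hypothetical and not part of the Hello_dynamic type):

  // Key-only filter: results can be cached per instance handle.
  participant->create_contentfilteredtopic(
      "KeyOnlyCft", topic, "(CEP = %0)", parameters);

  // Filter that also references a (hypothetical) non-key field: every sample
  // must be deserialized and evaluated individually, so no caching applies.
  participant->create_contentfilteredtopic(
      "MixedCft", topic, "(CEP = %0) AND (payload_size > 512)", parameters);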

As you can see, both the design of the test and the design of the filter can significantly influence the performance results you get.

Regards,

Kevin


Hello roshan & KevinJ,

thank you for your detailed comments.

I am still investigating the performance issue, and the mysteries keep growing:

1) I now show the 'filtered_sample_count' on both the writer and the reader side. Interestingly, both counts remain zero even though the filter works perfectly. Sometimes the counter on the writer increments a little (e.g. to 15 for 500 samples sent), sometimes a little more, but most of the time both counters remain at zero.

2) The throughput depends on whether I start the subscriber or the publisher first. This is interesting and should not happen.

3) The throughput depends on the value I filter for (e.g. when I filter for 384 I get 600 samples/sec; when I filter for 640 I get 8000 samples/sec).

4) The throughput is highly dependent on the QoS. I have used different QoS files from the examples, mainly reliable.xml, high_throughput.xml and low_latency.xml. I started with reliable.xml, which shows the described behaviour. The low_latency.xml is not as extreme but overall slower (about 1500 samples/sec).

Some information for my scenario:

I have used the Hello_dynamic sample, which sends samples back-to-back as fast as possible with a payload of 1 KB. This sample generates an enormous load on the communication. I extended the dynamic data type with an unsigned short key field which alternates between two values (384 and 640) on every write. This is the only key field, so half of the samples have 384 and the other half have 640 as the key. The instance handle optimization should work here.
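
Roughly, the publishing loop alternates the key like this (a simplified sketch, not the exact code from the attachment; 'data', 'dynamicWriter' and 'num_samples' stand for the DDS_DynamicData sample, the DDSDynamicDataWriter and the test length used in the example):

  // Simplified sketch: alternate the CEP key between the two instances
  // (384 and 640) and write back-to-back.
  DDS_UnsignedShort cep_values[] = { 384, 640 };
  DDS_ReturnCode_t retcode;
  for (int i = 0; i < num_samples; ++i) {
    retcode = data->set_ushort("CEP",
        DDS_DYNAMIC_DATA_MEMBER_ID_UNSPECIFIED, cep_values[i % 2]);
    if (retcode != DDS_RETCODE_OK) {
      cerr << "! Unable to set CEP, error=" << retcode << endl;
      break;
    }
    retcode = dynamicWriter->write(*data, DDS_HANDLE_NIL);
    if (retcode != DDS_RETCODE_OK) {
      cerr << "! Write error=" << retcode << endl;
      break;
    }
  }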

The application is started in either 'pub' or 'sub' mode, and depending on the mode it creates a single writer or a single reader. I have not yet started more than two applications (one pub, one sub) as this would again increase my variations. There is enough mystery for now.

There are too many variations to test every combination.

Please note that I'm using the standard sample distributed by RTI, slightly modified with an added key field and a filter. The QoS XML files are also the ones from the example\QoS directory.

Regards

Josef 


Hello there,

I have attached the modified sample. It uses a makefile instead of a solution, so you will need a command line to compile.

Also the added timing code is specific to MS-Windows (QueryPerformance...).

Your results will vary, but the effects can be seen with this example (at least on my machine).

Regards

Josef


Hi Josef,

I was looking through the reproducer you attached. One thing I noticed is that you have added "CEP" as a NONKEY_MEMBER. So the type you are using is not a keyed type, and hence the optimization will not take effect. If you want the member to be used as a key, use the following snippet:

structTc->add_member("CEP", -1, factory->get_primitive_tc(DDS_TK_USHORT),
                     DDS_TYPECODE_KEY_MEMBER, ex);
if (ex != DDS_NO_EXCEPTION_CODE) {
    std::cerr << "! Unable to add member to struct typecode, error=" << ex << std::endl;
    goto done;
}

Can you give that a try and see if you are getting better results? In the meantime I am also trying to reproduce the behavior you are observing.

 

Regards,

Roshan


Hello Roshan,

In my original program, CEP is a key member. Please see this short snippet:

  structTc->add_member("CEP", -1, factory->get_primitive_tc(DDS_TK_USHORT), DDS_TYPECODE_KEY_MEMBER, ex);

This was an oversight when I created the reproducer. My real program wraps most of the typecode handling in a C++ wrapper with smart pointers and automatic cleanup, which I did not want to publish here. So I created the reproducer from the sample program and added the relevant parts.

The attached sample contains the fix.

Interestingly, using CEP as a non-key member does not really influence the observed behaviour.

Regards

Josef


Hi Josef,

I tried compiling the application you provided but ran into some issues. So instead I updated the shipped Hello_dynamic example to include the changes you made for using a CFT and ran some tests in-house.

In my tests I was able to verify that when using a Content Filtered Topic whose filter expression is based only on key fields, there is about a 20% degradation in throughput. Note that I am not able to reproduce the 50x degradation that you have reported.

Also note that with some QoS modifications I am even able to achieve ~0% degradation in throughput.

I noticed that in your test you were writing to 2 different instances, but the filter expression was matching only one of them. So when measuring throughput on the subscriber you were only counting 50% of the samples that were originally being written; for example, if the publisher writes 30,000 samples/sec and only the CEP=384 instance passes the filter, the reader can receive at most 15,000 samples/sec. As a result, the throughput reported by the subscriber application was not reflective of the actual throughput the publishing application was able to achieve. In my test I have changed this so that the filter expression matches all the instances that are being written.

I am attaching the updated Hello_dynamic code so that you can run the same test and verify if you see a similar behavior. I have added 2 new parameters to the application:

  • -instanceVal <value>: This is the value that will be assigned to CEP (the key field) when publishing. On the subscriber, if the content filter is being used, the expression will be CEP = <value>. If this option is not provided, the publisher will alternate between CEP=384 and CEP=640, and the subscriber, if using the content filter, will use the expression (CEP=384) or (CEP=640).
  • -useCft: This option enables the use of the content filter in the subscriber application. It is not needed in the publisher application.

Also note that with the USER_QOS_PROFILES.xml that I have provided here, I expect ~0% degradation in throughput.

If needed, do modify the application to reproduce the original problem you were reporting and send it back to me.

Regards,

Roshan
