Hello there,
I have modified the "ndds.5.0.0\example\CPP\Hello_dynamic" example with a content filter on the subscriber side. The filter expression filters on a key field (I added an unsigned short key field in the dynamic type creation) and compares for equality "(CEP = %0)" where the parameter is "0x180". Everything works as expected.
Without the content filter I get about 30000 samples/second via UDPv4 (explicitly configured to use the network transport and not shared memory). When the content filtered topic is used for the data reader, I get only about 600 samples/second. That is a factor of 50 less!
I was hoping the content filter would be applied on the data writer side and the throughput would stay about the same.
What am I doing wrong here?
Is there any way to check if the filter is applied on the data writer?
I have added the following code to the HelloSubscriber.cpp:
// Base pointer
DDSTopicDescription* internal_topic = topic;

// Set to true to enable the content filter
if (true) {
    // Sequence of parameters for the content filter expression
    DDS_StringSeq parameters(1);
    const char* param_list[] = { "0x0180" };
    parameters.from_array(param_list, 1);

    // Create the content filtered topic.
    // The content filter expression has one parameter:
    //   - %0 -- CEP must be equal to %0.
    DDSContentFilteredTopic* cft = participant->create_contentfilteredtopic(
            "ContentFilteredTopic", topic, "(CEP = %0)", parameters);
    if (cft == NULL) {
        cerr << "! Unable to create content filtered topic" << endl;
        return false;
    }

    // Replace the topic with the content filtered one
    internal_topic = cft;
}
HelloListener listener(verbose);
DDSDataReader* dataReader = participant->create_datareader(
        internal_topic, DDS_DATAREADER_QOS_DEFAULT, &listener,
        DDS_STATUS_MASK_ALL);
The reduction in performance makes content filtering unusable for me.
Regards
Josef
Hi Josef,
Section 5.4.2 of the RTI Core Libraries and Utilities User's Manual specifies all the conditions that need to be satisfied in order to perform writer-side filtering. I have listed the section below:
To check whether the filtering is being performed on the reader, you can monitor the value of filtered_sample_count in DDS_DataReaderProtocolStatus. You can use the get_datareader_protocol_status() API to query the current value of DDS_DataReaderProtocolStatus.
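A minimal sketch of that query (classic C++ API), assuming dataReader is the DDSDataReader created in your snippet:

DDS_DataReaderProtocolStatus status;
if (dataReader->get_datareader_protocol_status(status) == DDS_RETCODE_OK) {
    // If this count stays at zero while samples are being dropped by the
    // filter, the filtering is happening on the writer side rather than here.
    cout << "filtered_sample_count: " << status.filtered_sample_count << endl;
}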
In order to understand the performance difference you are observing when using the content filter, I will need some more information about your test:
Also, it would be helpful if you could send me a reproducer so I can look into this further.
Regards,
Roshan
Hi Josef:
I'd like to follow up on Roshan's question about whether your test is sending new instances every time, or multiple samples for a small set of instances.
When filtering on key fields ONLY (i.e., the filter does not filter on anything other than key fields), once an instance has been filtered, the filter can cache the result (pass or fail). When subsequent samples of that instance need to be filtered, Connext DDS checks the cached result based on the instance handle and avoids the need to filter the actual sample.
Especially when using DynamicData, which deserializes the sample before filtering, saving the overhead of deserialization and filtering can result in a significant improvement in performance. But this improvement is only realized if the filter has seen that instance before. If the test is always sending new instances (such as writing an incrementing value in the key field) then the optimization is never invoked.
Filters that include non-key fields, or types that have so many key fields that every sample is its own instance, prevent this optimization from being utilized.
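To make the distinction concrete, here is a hypothetical sketch of two write patterns that interact very differently with this cache (the names data and writer are illustrative, assuming a DDS_DynamicData sample and a DDSDynamicDataWriter):

// Pattern A: two instances, revisited on every write. After the first
// sample of each instance, the cached filter result is reused.
for (int i = 0; i < 1000; ++i) {
    data->set_ushort("CEP", DDS_DYNAMIC_DATA_MEMBER_ID_UNSPECIFIED,
                     (DDS_UnsignedShort)((i % 2 == 0) ? 1 : 2));
    writer->write(*data, DDS_HANDLE_NIL);
}

// Pattern B: a new instance on every write (incrementing key). Every
// sample must be deserialized and evaluated; the cache never helps.
for (int i = 0; i < 1000; ++i) {
    data->set_ushort("CEP", DDS_DYNAMIC_DATA_MEMBER_ID_UNSPECIFIED,
                     (DDS_UnsignedShort)i);
    writer->write(*data, DDS_HANDLE_NIL);
}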
As you can see, both the design of the test and the design of the filter can significantly influence the performance results you get.
Regards,
Kevin
Hello roshan & KevinJ,
thank you for your detailed comments.
I am still investigating the performance issue, and the mysteries keep growing:
1) I now show the 'filtered_sample_count' on both the writer and reader side. Interestingly, both counts remain zero even though the filter works perfectly. Sometimes the counter of the writer increments a little (e.g. to 15 for 500 samples sent), sometimes a little more, but most of the time both counters remain at zero.
2) The throughput depends on whether I start the subscriber or the publisher first. This is interesting and should not happen.
3) The throughput depends on the value I filter for (e.g. when I filter for 384 I get 600 samples/sec; when I filter for 640 I get 8000 samples/sec).
4) The throughput is highly dependent on the QoS. I have used different QoS files from the examples, mainly reliable.xml, high_throughput.xml and low_latency.xml. I started with reliable.xml, which shows the described behaviour. The low_latency.xml is not as extreme but overall slower (about 1500 samples/sec).
Some information about my scenario:
I have used the Hello_dynamic sample, which sends samples with a 1 KB payload back-to-back as fast as possible. This sample generates an enormous load on the communication. I extended the dynamic data type with an unsigned short key field which alternates between two values (384 and 640) on every write. This is the only key field, so half of the samples have 384 and the other half have 640 as the key. The instance handle optimization should work here.
The application is started in either 'pub' or 'sub' mode and, depending on the mode, creates a single writer or a single reader. I have not yet started more than two applications (one pub, one sub) as this would again increase my variations. There is enough mystery for now.
There are too many variations to test every combination.
Please note that I'm using the standard sample distributed by RTI, slightly modified with an added key field and a filter. The QoS XML files are also the ones in the example\QoS directory.
Regards
Josef
Hello there,
I have attached the modified sample. It uses a makefile instead of a solution, so you will need a command line to compile.
Also the added timing code is specific to MS-Windows (QueryPerformance...).
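For reference, the timing is along these lines (a sketch; the actual code is in the attachment):

#include <windows.h>

// Returns elapsed wall-clock seconds between two QueryPerformanceCounter
// readings taken around the measured region.
double elapsed_seconds(const LARGE_INTEGER& start, const LARGE_INTEGER& stop) {
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    return (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart;
}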
Your results will vary, but the effects can be seen with this example (at least on my machine).
Regards
Josef
Hi Josef,
I was looking through the reproducer you have attached. One thing I noticed is that you have added "CEP" as a NONKEY_MEMBER. So the type you are using is not a keyed type, and hence the optimization will not take effect. If you want the member to be used as a key, use the following snippet:
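// Add the member with DDS_TYPECODE_KEY_MEMBER instead of
// DDS_TYPECODE_NONKEY_MEMBER:
structTc->add_member("CEP", -1, factory->get_primitive_tc(DDS_TK_USHORT),
                     DDS_TYPECODE_KEY_MEMBER, ex);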
Can you give that a try and see if you are getting better results? In the meantime I am also trying to reproduce the behavior you are observing.
Regards,
Roshan
Hello Roshan,
in my original program CEP is a key member. Please see this short snippet:
structTc->add_member("CEP", -1, factory->get_primitive_tc(DDS_TK_USHORT), DDS_TYPECODE_KEY_MEMBER, ex);
When I created the reproducer, this was an oversight. My real program wraps most of the typecode handling in a C++ wrapper with smart pointers and automatic cleanup, which I did not want to publish here. So I created the reproducer from the sample program and added the relevant parts.
The attached sample contains the fix.
Interestingly, using CEP as a non-key member does not really influence the observed behaviour.
Regards
Josef
Hi Josef,
I tried compiling the application you provided but ran into issues with it. So instead I updated the shipped Hello_dynamic example to include the changes you made for using the CFT and ran some tests in-house.
In my tests I was able to verify that, when using a Content Filtered Topic whose filter expression is based only on key fields, there is about a 20% degradation in throughput. Note that I am not able to reproduce the 50x degradation that you reported.
Also note that with some QoS modifications I am able to achieve ~0% degradation in throughput.
I noticed that in your test you were writing to 2 different instances, but the filter expression was matching only one of them. So when measuring the throughput on the subscriber, you were only counting 50% of the samples that were actually written, and the throughput reported by the subscriber application did not reflect the throughput the publishing application was achieving. In my test I changed this so that the filter expression matches all the instances being written.
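For example (a hypothetical sketch reusing the names from your earlier snippet), the filter can match both instances by passing both key values as parameters:

// Match both instances (384 and 640) so the subscriber counts every
// published sample when measuring throughput.
DDS_StringSeq parameters(2);
const char* param_list[] = { "384", "640" };
parameters.from_array(param_list, 2);
DDSContentFilteredTopic* cft = participant->create_contentfilteredtopic(
        "ContentFilteredTopic", topic, "(CEP = %0) OR (CEP = %1)", parameters);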
I am attaching the updated Hello_dynamic code so that you can run the same test and verify whether you see similar behavior. I have added 2 new parameters to the application:
Also note that with the USER_QOS_PROFILES.xml I have provided here, I expect ~0% degradation in throughput.
If required, modify the application to reproduce the original problem you were reporting and send it back to me.
Regards,
Roshan