We use RTI DDS to distribute stock, futures, and options quote data to clients. The number of clients will be in the millions.
1. How should we design the topics? We currently use each symbol as a topic. There are hundreds of thousands of symbols, so there could be hundreds of thousands of topics. Would performance be affected if we use that many topics? What is the limit on the number of topics in a domain?
2. Suppose we use stocks as one topic, futures as another, and options as a third, with each symbol as a key. One publisher distributes stock data, a second distributes futures data, and a third distributes options data. Then how can we send IBM data only to the subscribers that subscribe to IBM, and GOOGLE data only to other subscribers? Should we use the filter function? As far as we know, filtering will affect the performance of the system. Is that right?
3. In a domain there are several publishers; in our case we want three or more. Each publisher distributes the same data, which also means the same topic. We want each client (subscriber) to connect to only one publisher at a time, without needing to connect to the others. How can we do this? We think it would be wasteful for each subscriber to stay connected to all the publishers the whole time. Is that right? Is there a limit on the number of subscribers that can connect to one publisher? Is there a limit on the number of subscribers in a domain?
Hi,
1) From point 2, I assume you want to define 3 topics: stock, future, and option. Are these topics related? Could you provide an example of the data type that would define each of these topics so we can help you improve the solution? Maybe the information you want to share can be grouped into one simple topic.
2) For the scenario you describe, you would need to create one publisher per topic. To receive the information published by each of these publishers, you would need to define 3 subscribers. Each subscriber would need to create a content filtered topic to receive data from only a certain company (e.g., if you want IBM stocks, you would define a content filtered topic that allows the data reader to receive only IBM data). The impact of filtering on the performance of your system will depend on the scenario. If you want to avoid filtering, one thing you could do is define a topic per company; you would then only need to create Google or IBM publishers/subscribers.
3) Subscribers only receive information on the topic they are subscribed to. In your scenario, you would need to create 3 subscribers, none of which would receive information from the other topics. There are no limitations on the number of subscribers that can interact in a domain. There is an implicit limitation on the number of participants due to the ports that can be used on a machine, but as far as I know participants can create as many publishers and subscribers as they need.
Hi,
Given that you have a very large number of symbols, you should definitely not map each symbol to a different DDS Topic. This would not scale because of the resources all those DataWriters and DataReaders would consume. It would also be harder to program: having different Topics would require you to manage a separate DataWriter/DataReader to publish/subscribe to each Topic. So if an application publishes many symbols it would need that many DataWriters, and if it subscribes to many symbols it would have to create that many DataReaders, set the corresponding listeners or Conditions, etc. In contrast, if you use the 'simpleTopic' you describe, then you can publish any symbol with a single DataWriter, and you can subscribe to a collection of symbols using a single DataReader and a ContentFilteredTopic to which all the symbols of interest have been added.
So, as suggested by Juan Martin in point (2), the right mapping in this situation is to use fewer Topics, which map more naturally to the different data types, to include the symbol information in the data type with those members marked as 'key', and to use a ContentFilteredTopic in each DataReader to control what data it gets. This is basically what you show in the 'simpleTopic' struct.
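To make the mapping concrete, here is a minimal sketch in plain Python. It does not use the RTI API; `Reader` and `KeyedTopicWriter` are hypothetical stand-ins for a DataReader with a content filter and a DataWriter on a single keyed Topic, and the prices are made-up values.

```python
class Reader:
    """Hypothetical stand-in for a DataReader with a symbol filter."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = set(symbols)  # symbols this reader is interested in
        self.received = []


class KeyedTopicWriter:
    """Hypothetical stand-in for one DataWriter on a single keyed Topic."""

    def __init__(self):
        self.readers = []

    def match(self, reader):
        self.readers.append(reader)

    def write(self, symbol, price):
        # The symbol acts as the key; a sample is delivered only to
        # readers whose filter includes that symbol.
        for reader in self.readers:
            if symbol in reader.symbols:
                reader.received.append((symbol, price))


writer = KeyedTopicWriter()
alice = Reader("alice", ["IBM"])
bob = Reader("bob", ["GOOGLE", "IBM"])
writer.match(alice)
writer.match(bob)

writer.write("IBM", 190.5)
writer.write("GOOGLE", 140.0)
```

In this model each application needs one writer or one reader per Topic no matter how many symbols it handles, whereas a Topic per symbol would need one entity per symbol.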
Regarding performance: our latest release, Connext DDS 5.0.0, has significantly improved the scalability and performance of content filtering, and there is no longer a limit on the number of DataReader filters that a DataWriter can manage. However, you will need to modify the DataWriter QoS, specifically the DataWriterResourceLimitsQosPolicy, because the out-of-the-box setting limits the DataWriter to writer-side filtering for only 32 DataReaders.
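The effect of that limit can be illustrated with a small plain-Python model (not the RTI API). It assumes that readers beyond the limit are simply sent every sample and filter on their own side, and that the limit applies in matching order; both are simplifications. To my knowledge the corresponding QoS field is named `max_remote_reader_filters`, but please verify that against the manual for your release.

```python
class FilteringWriter:
    """Hypothetical model of writer-side filtering with a reader limit."""

    def __init__(self, max_remote_reader_filters=32):
        # None models an unlimited number of writer-side filters.
        self.max_filters = max_remote_reader_filters
        self.readers = []  # one set of symbols per matched reader

    def match(self, symbols):
        self.readers.append(set(symbols))

    def samples_sent(self, symbol):
        """Count how many readers are sent one sample of `symbol`."""
        sent = 0
        for index, symbols in enumerate(self.readers):
            filtered_by_writer = (
                self.max_filters is None or index < self.max_filters
            )
            # Readers the writer cannot filter for receive everything.
            if not filtered_by_writer or symbol in symbols:
                sent += 1
        return sent


default_writer = FilteringWriter()  # models the default limit of 32
unlimited_writer = FilteringWriter(max_remote_reader_filters=None)
for i in range(100):
    # Each of the 100 readers wants exactly one distinct symbol.
    default_writer.match([f"SYM{i}"])
    unlimited_writer.match([f"SYM{i}"])

print(default_writer.samples_sent("SYM0"))    # 69
print(unlimited_writer.samples_sent("SYM0"))  # 1
```

With the default, 68 of the 100 readers receive a sample they will discard; with the limit raised, only the one interested reader is sent it.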
For your specific use case of filtering on stock symbols, I would recommend the STRINGMATCH filter. It was developed with this use case in mind and really simplifies setting up these kinds of filters. However, to use it you will need to modify your IDL slightly so that a single member is the key.
This is described in section 5.4.7 titled "STRINGMATCH Filter Expression Notation" of the RTI Connext DDS Users Manual.
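As a rough illustration of the matching semantics, the sketch below emulates a comma-separated list of symbol patterns in plain Python. It assumes the filter accepts such a list with shell-style wildcards and uses Python's `fnmatch` as an approximation; `stringmatch` and `add_symbol` are hypothetical helper names, not RTI functions, and the manual section cited above is the authoritative description.

```python
from fnmatch import fnmatchcase


def stringmatch(value, pattern_list):
    """Return True if `value` matches any pattern in a comma-separated list."""
    patterns = [p.strip() for p in pattern_list.split(",") if p.strip()]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def add_symbol(pattern_list, symbol):
    """Return a new pattern list with `symbol` appended."""
    return symbol if not pattern_list else pattern_list + "," + symbol


interest = ""
interest = add_symbol(interest, "IBM")
interest = add_symbol(interest, "GOOGLE")

print(interest)                       # IBM,GOOGLE
print(stringmatch("IBM", interest))   # True
print(stringmatch("MSFT", interest))  # False
print(stringmatch("GOOGLE", "G*"))    # True
```

The point of a single string key is visible here: subscribing to one more symbol only means appending one more entry to the list.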
I would also recommend you take a look at using MultiChannel DataWriters. See Chapter 18 titled "Multi-channel DataWriters" of the RTI Connext DDS Users Manual. This feature was also developed to assist the distribution of real-time stock/option data to large numbers of consumers.
Regarding your third question, I want to offer a clarification because the language can be a bit confusing. In DDS the entity that declares the intent to publish data and writes it is the DataWriter. Similarly, the DataReader declares the intent to subscribe and actually receives the data. The discovery and matching that DDS performs is done per DataWriter and per DataReader, so each DataWriter knows which DataReaders it should be sending data to. DDS also has entities called Publisher and Subscriber, but a DDS Publisher is simply a convenience grouping of DataWriters that facilitates their configuration and manages local shared resources like threads. The same applies to the Subscriber; it is just a local convenience grouping of DataReaders.
With this in mind, your last question (number 3) is really about DDS DataWriters and DataReaders. If you use the mapping suggested above you will only need one DataWriter for Stocks, maybe another for Options, etc.: one per data type. A given application would then have only one DataReader for Stocks, with a content filter specifying the symbols it wants.
The fact that a 'Stock' DataWriter matches all the 'Stock' DataReaders, even if that particular DataWriter does not publish any symbol of interest to them, does introduce some inefficiencies. But the ContentFilters, specifically the STRINGMATCH one, have been developed for this. And if you know the symbols a particular DataWriter will publish, you can use Multi-channel DataWriters to declare this, which would pre-match only the DataReaders that have overlapping interest.
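The idea of declaring which symbols go where can be sketched as a mapping from symbol patterns to channels. This is a plain-Python illustration, not the RTI configuration format; the two patterns and the multicast addresses are made-up assumptions, and `channel_for` and `groups_to_join` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# Hypothetical channel table: symbol pattern -> multicast address.
CHANNELS = [
    ("[A-M]*", "239.255.0.1"),
    ("[N-Z]*", "239.255.0.2"),
]


def channel_for(symbol):
    """Return the address of the first channel whose pattern matches."""
    for pattern, address in CHANNELS:
        if fnmatchcase(symbol, pattern):
            return address
    return None


def groups_to_join(symbols):
    """Return the addresses a reader needs for its symbols of interest."""
    return sorted({channel_for(s) for s in symbols} - {None})


print(channel_for("IBM"))                 # 239.255.0.1
print(groups_to_join(["IBM", "GOOGLE"]))  # ['239.255.0.1']
print(groups_to_join(["IBM", "ORCL"]))    # ['239.255.0.1', '239.255.0.2']
```

A reader interested only in IBM and GOOGLE never has to receive traffic for the second channel, which is the kind of pre-matching described above.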
I hope this is clear. If not, feel free to ask more questions.
Gerardo
Hi Gerardo, thanks for your reply.
We use RTI DDS over a WAN. The DataReaders are located on the Internet all over the world, and some may be behind a LAN. So we may not be able to take advantage of all the features of RTI DDS, such as MultiChannel DataWriters. Is that right?
Although there is now no limit on the number of DataReader filters that a DataWriter can manage, how many can RTI DDS handle while keeping a good performance balance? We expect more than 10,000, with the DataWriter still operating at low latency. Is that feasible?
Hello,
The questions you are asking are not so easy to answer without a deeper understanding of your specific deployment scenario. For example you state that "We use RTI DDS on WAN, the DataReaders are located on Internet all over the world, maybe some locate on behind the LAN".
As you correctly state, you cannot use multicast over the WAN because, generally speaking, the open Internet does not route multicast packets. Moreover, you will face issues like firewalls and NATs, and you will need special configuration of the RTI DDS transports to get through them. However, depending on your setup, there are many things you could do. For example, if you had many consumers on the same LAN (even if they are separated from the producers by a WAN), you could use the RTI DDS Routing Service as a gateway/proxy for all the consumers in that LAN, so that the Routing Service receives a single message from the producers and relays it to the appropriate consumers. In addition to saving bandwidth and CPU on the producer side, this setup would also allow you to use multicast and MultiChannel DataWriters from the Routing Service to the consumers on the LAN.
Even if your consumers are fully dispersed across different LANs, it may still make sense to create an overlay network of Routing Services to scale the distribution in a tree fashion, so that messages do not have to flow peer-to-peer, end-to-end. There are many other features that you can also use to help scalability, like the recently introduced "Collaborative DataWriters" feature.
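A back-of-the-envelope sketch of why gateways and a tree help, in plain Python. The LAN names, consumer counts, and fan-out of 100 are made-up numbers for illustration; real capacity depends on the deployment.

```python
def direct_sends(consumers_per_lan):
    """Messages the producer sends per sample with no gateways."""
    return sum(consumers_per_lan.values())


def gateway_sends(consumers_per_lan):
    """Messages the producer sends per sample with one gateway per LAN."""
    return len(consumers_per_lan)


def tree_depth(consumers, fan_out):
    """Relay levels needed if each node forwards to `fan_out` others."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    depth, reach = 0, 1
    while reach < consumers:
        reach *= fan_out
        depth += 1
    return depth


# Hypothetical deployment: consumers grouped by LAN.
lans = {"tokyo": 500, "london": 300, "home-user": 1}

print(direct_sends(lans))          # 801
print(gateway_sends(lans))         # 3
print(tree_depth(1_000_000, 100))  # 3
```

Under these assumptions, a million consumers can be reached in three relay levels with no node sending more than 100 copies of a sample.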
Your deployment seems sufficiently large and complex that I would be concerned about its feasibility. I would really recommend you get some training from our Consulting & Professional Services group. They have a lot of experience in deploying large systems and can train you quickly on all the technology and techniques you can use. Perhaps more importantly, they can save you a lot of time and frustration by helping you avoid following the wrong path, only to hit some sort of scalability limit when you try to test and deploy. Is getting this kind of service feasible for you?
Regards,
Gerardo