I'm attempting to run a large number of participants (via individual C++ exes), each containing one subscriber with a data reader for a content filtered topic.
I'm finding that I can start approx 130 of them and then get the following error:
RTIOsapiThread_new:OS pthread_create() failure, error 0: Success, attr sched policy 0, schedparam priority 0, inheritsched 1, detach state 1, scope 0, stack size 67108864
COMMENDActiveFacade_addReceiverThread:!create rR030066c5501
COMMENDLocalReaderRO_init:!create multicast entryPort
COMMENDSrReaderService_createReader:!init ro
PRESPsService_enableLocalEndpointWithCursor:!create Reader in srrService
PRESPsService_enableLocalEndpoint:!enable local endpoint
DDSDataReader_impl::createI:ERROR: Failed to auto-enable entity
DDSFactoryPluginSupport::createDataReader:Error: Failed to create DataReader
DDS_FactoryXmlPlugin_createDataReadersWithNamesI:!create DataReader
DDS_FactoryXmlPlugin_createDataReaders:!create DataReaders from XML DataReader "::MyParticipantLibrary::MyParticipant::MySubscriber::MyReader"
Note that I'm using multicast and the participants are defined in an XML file and created via XML Application Creation.
I have also modified the following QoS settings to ensure that I can create enough participants in the same domain (0):
domain_id_gain = 2000
participant_id_gain = 2
I'm assuming it's port related based on the reference to "create multicast entryPort", but I've worked through the port numbering algorithm and can't see that I should be getting any problems. When I start the first participant I see 4 ports being used (7400, 7401, 7410 & 7411), the second participant uses another 4 (7400, 7401, 7412 & 7413) and so on....
Any ideas?
Hi,
I do not think this problem is port related. I believe that the problem is that pthread_create() is failing. This is what the first line in the error message is saying.
This log message should be reporting the reason for the error. But strangely it is printing that pthread_create() returned "0" (which would be a success). However, inspecting the code, I can see that the only way it can get to print that message is if pthread_create() returned a non-zero value.
Each DDS DomainParticipant creates a number of threads. Some of these threads are created to serve the different receive ports. So in particular the DomainParticipant tries to create a thread to listen to the multicast address and this is failing.
Since we cannot see the error value returned by pthread_create() I am not sure what is causing the failure. But one thing I would suspect is memory.
What platform are you running on? Depending on the platform, each thread may get a default stack size, which on some Linux systems is about 8MB. In that case, given that each participant creates on the order of half a dozen threads, it would end up requiring about 50MB of stack space per DomainParticipant. To create 130 participants it would need more than 6GB of memory, which may exceed what you have available.
If that were indeed the reason for the failure, then you could work around it by specifying a smaller stack size. You can do this using the DDS_ThreadSettings_t, which appears in several QoS policies. See the detailed description section in: http://community.rti.com/rti-doc/510/ndds.5.1.0/doc/html/api_cpp/structDDS__ThreadSettings__t.html
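Since you are using XML Application Creation, the stack sizes could be overridden directly in your XML. This is only a sketch (the library and profile names are placeholders; the element names follow the participant QoS mapping of DDS_ThreadSettings_t in Connext 5.1):

```xml
<qos_library name="MyQosLibrary">
  <qos_profile name="SmallStacks" is_default_qos="true">
    <participant_qos>
      <receiver_pool>
        <thread>
          <!-- e.g. 512 KB per receive thread instead of the OS default -->
          <stack_size>524288</stack_size>
        </thread>
      </receiver_pool>
      <event>
        <thread>
          <stack_size>524288</stack_size>
        </thread>
      </event>
    </participant_qos>
  </qos_profile>
</qos_library>
```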
Gerardo
Thanks for your response Gerardo.
A bit more background information....
I'm running on a Linux platform with the following:
Memory available = approx 23GB
Max stack size (based on ulimit -s) = 64MB (and I note that you can see this value, in bytes, being passed into the pthread_create call in the error message: 67108864)
As per your suggestion, I've tried modifying the stack_size participant QoS setting for both <event> and <receiver_pool>, but it does not seem to affect how many processes I can run. I still get the same error message (except that I can see the reduced stack size being passed into pthread_create, so at least I know the value is getting through!). I've tried a range of values, both smaller (32K) and larger (80MB), with the same results. I can still only get approximately 130 processes/participants to run (remember that I'm running each participant as a separate executable).
As we seem to have ample memory, I'm wondering if there is another limit I'm exceeding. Perhaps running out of file descriptors or something similar?
Julie
Well, it turns out I was exceeding the number of max user processes/threads allowed.
The default max value was 1024, so by the time you account for the number of threads each executable starts, plus the other processes I was running (e.g. shells), it exceeded the limit.
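For anyone hitting the same wall, these are the limits worth checking on Linux before blaming memory or ports (a sketch; the /proc path is a standard Linux entry):

```shell
ulimit -a                          # all per-shell soft limits
ulimit -u                          # "max user processes" -- on Linux this counts threads too
ulimit -n                          # max open file descriptors
cat /proc/sys/kernel/threads-max   # system-wide thread cap
ps -eLf | wc -l                    # rough count of threads currently running
```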
I increased the max to 10000 (ulimit -u 10000) and it now works!
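Note that ulimit -u only raises the limit for the current shell and its children. To make it survive new logins on a typical PAM-based Linux system, I believe the nproc limit can be set in /etc/security/limits.conf along these lines (a sketch; "*" applies to all users and could be replaced with a specific user name):

```
# /etc/security/limits.conf
#<domain>  <type>  <item>  <value>
*          soft    nproc   10000
*          hard    nproc   10000
```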