Failed to create DataReader error

Joined: 08/04/2014
Posts: 6

I'm attempting to run a large number of participants, each in its own C++ executable and each containing one subscriber with a DataReader for a content-filtered topic.

I'm finding that I can start approximately 130 of them, and then I get the following error:

RTIOsapiThread_new:OS pthread_create() failure, error 0: Success, attr sched policy 0, schedparam priority 0, inheritsched 1, detach state 1, scope 0, stack size 67108864
COMMENDActiveFacade_addReceiverThread:!create rR030066c5501
COMMENDLocalReaderRO_init:!create multicast entryPort
COMMENDSrReaderService_createReader:!init ro
PRESPsService_enableLocalEndpointWithCursor:!create Reader in srrService
PRESPsService_enableLocalEndpoint:!enable local endpoint
DDSDataReader_impl::createl:ERROR: Failed to auto-enable entity
DDSFactoryPluginSupport::createDataReader:Error: Failed to create DataReader
DDS_FactoryXmlPlugin_createDataReadersWithNamesl:!create DataReader
DDS_FactoryXmlPlugin_createDataReaders:!create DataReaders from XML DataReader "::MyParticipantLibrary::MyParticipant::MySubscriber::MyReader"

Note that I'm using multicast and the participants are defined in an XML file and created via XML Application Creation.

I have also modified the following QoS settings to ensure that I can create enough participants in the same domain (0):

domain_id_gain = 2000
participant_id_gain = 2

I'm assuming it's port-related, based on the reference to "create multicast entryPort", but I've worked through the port numbering algorithm and can't see why I should be getting any problems. When I start the first participant I see 4 ports being used (7400, 7401, 7410 and 7411), the second participant uses 4 as well (7400, 7401, 7412 and 7413), and so on.
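
The observed numbers match the RTPS well-known port formula. The minimal C++ sketch below reproduces it, assuming the RTPS-specification defaults for the port base (7400) and the four port offsets (0, 10, 1, 11); only domain_id_gain and participant_id_gain come from the post, and the names RtpsPorts and compute_ports() are hypothetical.

```cpp
#include <cassert>

// Hypothetical container for the four ports a participant opens.
struct RtpsPorts {
    int metatraffic_multicast;  // discovery traffic, shared by the whole domain
    int user_multicast;         // user data, shared by the whole domain
    int metatraffic_unicast;    // discovery traffic, unique per participant
    int user_unicast;           // user data, unique per participant
};

// Assumed RTPS defaults: port base 7400 and offsets d0=0, d1=10, d2=1, d3=11.
constexpr int kPortBase = 7400;
constexpr int kBuiltinMulticastOffset = 0;  // d0
constexpr int kBuiltinUnicastOffset = 10;   // d1
constexpr int kUserMulticastOffset = 1;     // d2
constexpr int kUserUnicastOffset = 11;      // d3

RtpsPorts compute_ports(int domain_id, int participant_id,
                        int domain_id_gain, int participant_id_gain) {
    assert(domain_id >= 0 && participant_id >= 0);
    const int domain_base = kPortBase + domain_id_gain * domain_id;
    const int participant_step = participant_id_gain * participant_id;
    return RtpsPorts{
        domain_base + kBuiltinMulticastOffset,
        domain_base + kUserMulticastOffset,
        domain_base + kBuiltinUnicastOffset + participant_step,
        domain_base + kUserUnicastOffset + participant_step,
    };
}
```

With domain 0, domain_id_gain = 2000 and participant_id_gain = 2, participant 0 gets 7400/7401/7410/7411, participant 1 gets 7412/7413 for its unicast ports, and participant 129 (the 130th) tops out at 7669, well inside the 2000-port window the domain has before the next domain's range at 9400.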

Any ideas?

Gerardo Pardo
Joined: 06/02/2010
Posts: 602

Hi,

I do not think this problem is port-related. I believe the problem is that pthread_create() is failing; that is what the first line of the error message is saying.

That log message should report the reason for the error, but strangely it prints that pthread_create() returned "0" (which would indicate success). However, inspecting the code, I can see that the only way to reach that message is for pthread_create() to return a non-zero value.
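
POSIX pthread functions report failure through their return value and do not set errno. One possible explanation for the contradictory "error 0: Success" (an assumption, since the library source is not shown here) is that errno, rather than the returned code, was formatted into the message. The minimal C++ sketch below, using a hypothetical helper create_detached_thread(), shows the pattern of reporting the returned code; the detached state and explicit stack size mirror the "detach state" and "stack size" fields in the log.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Trivial thread body; the thread exits immediately.
static void* idle_thread(void*) { return nullptr; }

// Hypothetical helper: start a detached thread with an explicit stack size.
// Returns 0 on success or the pthread error number (e.g. EAGAIN, EINVAL).
int create_detached_thread(std::size_t stack_size) {
    assert(stack_size > 0);

    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }

    rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0) {
        rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    }
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, idle_thread, nullptr);
    }
    if (rc != 0) {
        // The pthread functions return the error number and leave errno alone,
        // so format rc itself rather than errno.
        std::fprintf(stderr, "thread creation failed: error %d: %s\n",
                     rc, std::strerror(rc));
    }

    pthread_attr_destroy(&attr);
    return rc;
}
```

On Linux, pthread_create() returns EAGAIN when a system limit on threads is hit, including the per-user RLIMIT_NPROC soft limit, so the numeric code alone would narrow down the cause.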

Each DDS DomainParticipant creates a number of threads, some of which serve the different receive ports. In particular, the DomainParticipant tries to create a thread to listen on the multicast address, and that is what is failing.

Since we cannot see the error value returned by pthread_create() I am not sure what is causing the failure. But one thing I would suspect is memory.

What platform are you running on? Depending on the platform, each thread may get a default stack size, which on some Linux machines is about 8MB. In that case, given that each participant creates on the order of half a dozen threads, it would need about 50MB of stack space per DomainParticipant. Creating 130 participants would then need more than 6GB of memory, which may exceed what you have available.
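
The estimate is plain multiplication. The sketch below makes it explicit, assuming six threads per participant and an 8MB stack per thread (treated here as 8 MiB); the helper name stack_mib_needed() is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough upper bound on stack space reserved by a set of participants, in MiB.
// All inputs are estimates from the discussion, not measured values.
std::uint64_t stack_mib_needed(std::uint64_t participants,
                               std::uint64_t threads_per_participant,
                               std::uint64_t stack_mib_per_thread) {
    assert(threads_per_participant > 0);
    return participants * threads_per_participant * stack_mib_per_thread;
}
```

With those assumptions one participant needs 6 × 8 = 48 MiB (the "about 50MB"), and 130 participants need 6240 MiB, just over 6 GiB. This figure is reserved address space; Linux typically commits stack pages only as they are touched.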

If that were indeed the reason for the failure, you could work around it by specifying a smaller stack size. You can do this using the ThreadSettings_t structure, which appears in several QoS policies. See the detailed description section in: http://community.rti.com/rti-doc/510/ndds.5.1.0/doc/html/api_cpp/structDDS__ThreadSettings__t.html

Gerardo

Thanks for your response Gerardo.

A bit more background information:

I'm running on a Linux platform with the following:

Memory available: approximately 23GB

Max stack size (from ulimit -s): 64MB. Note that you can see this value, in bytes (67108864), being passed to the pthread_create() call in the error message.

As you suggested, I've tried setting the stack_size participant QoS settings for both <event> and <receiver_pool> to smaller values, but it does not seem to affect how many processes I can run. I still get the same error message, except that I can see the reduced stack size being passed to pthread_create(), so at least I know the value is getting through. I've tried a range of values, both smaller (32K) and larger (80MB), with the same result: I can still run only approximately 130 processes/participants (remember that each participant runs as a separate executable).

As we seem to have ample memory, I'm wondering if there is another limit I'm exceeding. Perhaps running out of file descriptors or something similar?
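
One way to check such limits from inside a process is getrlimit(). The Linux-specific sketch below (the helper names print_limit() and print_relevant_limits() are hypothetical) prints RLIMIT_NPROC, the limit that `ulimit -u` reports and which on Linux counts threads as well as processes for the user, and RLIMIT_NOFILE, the per-process file-descriptor limit behind `ulimit -n`.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstdio>

// Print the soft and hard values of one resource limit.
// Returns false if getrlimit() fails.
bool print_limit(const char* name, int resource) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        std::perror("getrlimit");  // getrlimit() does set errno on failure
        return false;
    }
    auto show = [](rlim_t value) {
        if (value == RLIM_INFINITY) {
            std::printf("unlimited");
        } else {
            std::printf("%llu", static_cast<unsigned long long>(value));
        }
    };
    std::printf("%s: soft=", name);
    show(rl.rlim_cur);
    std::printf(" hard=");
    show(rl.rlim_max);
    std::printf("\n");
    return true;
}

// RLIMIT_NPROC: tasks (processes and threads) per real user ID on Linux.
// RLIMIT_NOFILE: open file descriptors per process.
bool print_relevant_limits() {
    bool ok = print_limit("RLIMIT_NPROC (ulimit -u)", RLIMIT_NPROC);
    ok = print_limit("RLIMIT_NOFILE (ulimit -n)", RLIMIT_NOFILE) && ok;
    return ok;
}
```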

Julie

Well, it turns out I was exceeding the number of max user processes/threads allowed.

The default maximum was 1024, so once you account for the threads each executable starts, plus the other processes I was running (e.g. shells), I was exceeding the limit.
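
The budget can be sketched as a division. The helper max_processes() below is hypothetical, and the example values (7 tasks per executable, i.e. the main thread plus roughly the half-dozen DDS threads estimated earlier, and 100 other tasks for the user) are illustrative assumptions, not measurements from this system.

```cpp
#include <cassert>

// How many identical processes fit under a per-user task limit.
// On Linux, RLIMIT_NPROC (ulimit -u) counts every thread of the user,
// so each process costs its main thread plus any threads it starts.
//   nproc_limit        per-user limit, e.g. the default of 1024
//   tasks_per_process  main thread + threads started by the process (estimate)
//   other_tasks        shells and other programs already running as that user
long max_processes(long nproc_limit, long tasks_per_process, long other_tasks) {
    assert(tasks_per_process > 0);
    if (other_tasks >= nproc_limit) {
        return 0;
    }
    return (nproc_limit - other_tasks) / tasks_per_process;
}
```

With those assumed values, a limit of 1024 leaves room for 132 executables, while a limit of 10000 leaves room for 1414.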

I increased the max to 10000 (ulimit -u 10000) and it now works!
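
`ulimit -u` applies to the shell and the programs it launches. If changing the launching shell is inconvenient, a process can raise its own soft RLIMIT_NPROC with setrlimit(), although an unprivileged process cannot go above the hard limit. A minimal sketch, using a hypothetical helper raise_nproc_soft_limit() and the target of 10000 from the post:

```cpp
#include <sys/resource.h>
#include <cassert>

// Raise the soft RLIMIT_NPROC toward `target`, never above the hard limit and
// never lowering a soft limit that is already higher (or unlimited).
// Returns 0 on success, or -1 with errno set by getrlimit()/setrlimit().
int raise_nproc_soft_limit(rlim_t target) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        return -1;
    }

    rlim_t wanted = target;
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max) {
        wanted = rl.rlim_max;  // unprivileged processes cannot exceed the hard limit
    }
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted) {
        return 0;  // already high enough; nothing to change
    }

    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

The natural place to call it would be early in main(), before any DomainParticipant is created; whether this is preferable to setting the limit in the launching shell is a deployment choice.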