Extremely Slow functions in the RTI Connext Python Connector


I am integrating ROS2 with our tool using the RTI DDS Python Connector. It is the best available API for our use case and has produced very good results so far.

As we dig deeper into the performance side, we have found some problems that I need to address.

Our application is time-critical, which is the reason we opted for DDS.

endTime = icos.getEndTime()
print("endtime: ", endTime)
microsteps = int(endTime * 100.0)
print("microsteps: ", microsteps)
sim_stepsize = icos.getModelDefsSpecial()['TimeStep']

connector = rti.Connector("MyParticipantLibrary::Zero",
                          filepath + "/../ROS2_openadx_PubSub.xml")
outputDDS = connector.getOutput("MyPublisher::VehicleObjectWriter")
inputDDS = connector.getInput("MySubscriber::VehicleControlReader")
starttime = time.clock()
times = []

for x in range(microsteps + 1):
    tstep = float(x) / 100.0
    times.append(time.clock())
    a = icos.getScalarInput("id", tstep)
    b = icos.getScalarInput("object_id", tstep)
    c = icos.getScalarInput("x", tstep)
    d = icos.getScalarInput("y", tstep)
    e = icos.getScalarInput("v_rad", tstep)
    f = icos.getScalarInput("a_rad", tstep)

    # print("timestep : ", tstep)
    # print("x : ", x)
    times.append(time.clock())

    outputDDS.instance.setNumber("id_", a)
    times.append(time.clock())
    outputDDS.instance.setNumber("objects_[1].id_", b)
    times.append(time.clock())
    outputDDS.instance.setNumber("objects_[1].x_", c)
    times.append(time.clock())
    outputDDS.instance.setNumber("objects_[1].y_", d)
    times.append(time.clock())
    outputDDS.instance.setNumber("objects_[1].v_rad_", e)
    times.append(time.clock())
    outputDDS.instance.setNumber("objects_[1].a_rad_", f)
    times.append(time.clock())
    outputDDS.write()
    times.append(time.clock())
    inputDDS.take()
    times.append(time.clock())

    numOfSamples = inputDDS.samples.getLength()
    for j in range(1, numOfSamples + 1):
        if inputDDS.infos.isValid(j):
            # This gives you a dictionary
            sample = inputDDS.samples.getDictionary(j)
            acc_ped = sample['accelerator_pedal_']
            brk_ped = sample['brake_pedal_']
            steering = sample['steering_wheel_']
            gear = sample['gear_']

            toPrint = (" accel_pedal: " + repr(acc_ped) +
                       " brake_pedal: " + repr(brk_ped) +
                       " steering_wheel: " + repr(steering) +
                       " gear: " + repr(gear))
            print(toPrint)

            icos.postScalarOutput("acc_ped", tstep, acc_ped)
            icos.postScalarOutput("brk_ped", tstep, brk_ped)
            icos.postScalarOutput("steering", tstep, steering)
            icos.postScalarOutput("gear", tstep, gear)

    times.append(time.clock())

    # loop variable renamed from x to k to avoid shadowing the outer loop
    for k in range(len(times)):
        print("Timestep: " + str(k) + ": " + str(times[k] - times[0]))
    times = []

    # toPrint = "id: " + repr(a) + " x: " + repr(b) + " y: " + repr(c) + " v_rad: " + repr(d)
    toPrint = "id_ : " + repr(a)
    print(toPrint)

Now, as you can see, I have called times.append(time.clock()) multiple times. I added these calls because the simulation was running extremely slowly and I wanted to find where the problem lies. To my surprise, I found that the problem is in the setNumber function when it writes data into a nested object.
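A side note on the timing calls themselves: time.clock() was deprecated in Python 3.3 and removed in 3.8, so time.perf_counter() is the safer choice for this kind of checkpoint timing. A minimal sketch of the same pattern (the measured step here is just a stand-in):

```python
import time

# perf_counter() is a monotonic clock intended for interval timing;
# it replaces the deprecated time.clock().
times = [time.perf_counter()]

sum(range(100_000))  # stand-in for the step being measured
times.append(time.perf_counter())

for x in range(len(times)):
    print("Timestep: " + str(x) + ": " + str(times[x] - times[0]))
```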

Here are the results for two iterations:

endtime: 9999999.0
microsteps: 999999900
Timestep: 0: 0.0
Timestep: 1: 0.0011698700124171055
Timestep: 2: 0.0012413458610560557
Timestep: 3: 0.15906731310980113
Timestep: 4: 0.3018129703138191
Timestep: 5: 0.4538803910020659
Timestep: 6: 0.5918397190561541
Timestep: 7: 0.7305710592100825
Timestep: 8: 0.7307603243398968
Timestep: 9: 0.7307884041375764
Timestep: 10: 0.7308037203908562
id_ : 0.0
accel_pedal: 0 brake_pedal: 1 steering_wheel: 0 gear: 1
Timestep: 0: 0.0
Timestep: 1: 0.0009145991244208229
Timestep: 2: 0.000947419667163163
Timestep: 3: 0.1395704520300417
Timestep: 4: 0.28383570036303174
Timestep: 5: 0.41934333387426437
Timestep: 6: 0.5653145210844636
Timestep: 7: 0.7018961156887663
Timestep: 8: 0.7020864748366723
Timestep: 9: 0.7021090845438948
Timestep: 10: 0.704974317925304
id_ : 0.02
accel_pedal: 0 brake_pedal: 1 steering_wheel: 0 gear: 1

Now, if you look at it carefully, Timestep 1 is the time it takes to establish communication with our tool, i.e. about 1.1 ms, which is fine.

Next, it takes less than 1 ms (Timestep 2 minus Timestep 1) to execute

outputDDS.instance.setNumber("id_", a)

since id_ is only an int32, i.e. a singular top-level member.

The strange part comes next: it takes more than 100 ms to write data into a member of a nested object.

This creates a huge problem for us; we cannot work with such a slow system, as it needs to run at or faster than real time.

Now the question is: why?

  • What I can think of is that the same function executes quickly on a singular member but takes a long time on nested objects. Could it be that it takes time to parse the member name, e.g. objects_[1].x_, then look the member up and write the data to it?
  • What could be a possible solution?

 

Gianpiero:

 

Hello Aakash,

Thank you for writing to us about your findings and for using the RTI Connector for Python. The issue you are running into is related to two known issues: 1) the way DynamicData (which Connector uses) works, and 2) an issue in accessing arrays of primitive types (more here).

Regarding 1):

The Dynamic Data API needs to 'bind' in order to access a complex member (more here) and to 'unbind' once it is done. When you are accessing a top-level member like "id_", no 'bind' is required, hence there is no unbind.

When you are accessing "objects_[1].x_", besides the parsing of the string, the Dynamic Data API has to bind to the complex member (the array), then to the complex structure that contains "x_", then return the value and, of course, unbind. Then it has to do the same again to access "objects_[1].y_", and so on.

So it can be a bit slow. The good news is that we are going to release a faster version of the underlying DynamicData that should speed up everything, including Connector.
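To get a feel for the string-handling part of that cost (separate from the bind/unbind itself), here is a purely illustrative sketch in plain Python, not the actual Connector internals, of what resolving a flattened member name involves on every setNumber call:

```python
import re

# Matches one path step: a member name with an optional [index].
MEMBER_RE = re.compile(r"([A-Za-z_]\w*)(?:\[(\d+)\])?")

def parse_member_path(path):
    """Split 'objects_[1].x_' into [('objects_', 1), ('x_', None)].
    Each resolved step corresponds to a bind/unbind in DynamicData."""
    steps = []
    for part in path.split("."):
        m = MEMBER_RE.fullmatch(part)
        if m is None:
            raise ValueError("bad member path: " + path)
        name, idx = m.group(1), m.group(2)
        steps.append((name, int(idx) if idx is not None else None))
    return steps

print(parse_member_path("objects_[1].x_"))
# -> [('objects_', 1), ('x_', None)]
```

Paying this parse plus the bind/unbind pair six times per sample is the price of the field-by-field approach.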

Regarding 2):

Today we have two ways to get/set data in and out of Connector: the way you are using (field by field), or by calling get/setDictionary(). You should try the second way and see if that is fast enough in your case. It might be, but I doubt it, because there is still a lot of string parsing involved. The good news is that we are implementing a third way that should help use cases like arrays/sequences with a lot of primitives: it is a way to get a native pointer to the underlying DynamicData native object and then call set_<primitive>_array in one go.

The first thing I would ask you to do is to try setDictionary(sample) to see if that helps. See more documentation here.
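For reference, setDictionary() generally expects a nested dictionary that mirrors the type, rather than flattened 'objects_[1].x_' keys. A sketch using the field names from this thread (the exact shape depends on your XML type definition, so verify against it):

```python
# Nested dict mirroring a type whose objects_ member is a sequence of
# structs; field names are taken from the code earlier in this thread.
sample = {
    "id_": 0,
    "objects_": [
        {"id_": 1, "x_": 2.0, "y_": 3.0, "v_rad_": 4.0, "a_rad_": 5.0},
    ],
}

# With a Connector output, this would then be:
# outputDDS.instance.setDictionary(sample)
# outputDDS.write()
```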

Let me know how that goes. If that does not work, I may be able to share an engineering build with you that exposes getNative(), and you can try it that way.

I hope this helps! 

Best,
  Gianpiero

Aakash:

Hi Gianpiero,

Thank you for the quick response. I tried the same thing with the setDictionary method, and the results were even more surprising (unless I did something wrong).

Here is the code; as you recommended, I created a dictionary of the same object type:

outputDDS = connector.getOutput("MyPublisher::VehicleObjectWriter")

starttime = time.clock()
times = []

for i in range(5000):
    times.append(time.clock())
    sample = dict([('id_', i), ('objects_[1].id_', 1),
                   ('objects_[1].x_', 2), ('objects_[1].y_', 3),
                   ('objects_[1].v_rad_', 4), ('objects_[1].a_rad_', 5)])
    times.append(time.clock())
    outputDDS.instance.setDictionary(sample)
    times.append(time.clock())
    outputDDS.write()
    times.append(time.clock())

    for x in range(len(times)):
        print("Timestep: " + str(x) + ": " + str(times[x] - times[0]))
    times = []

Now, look at the result:

Timestep: 0: 0.0
Timestep: 1: 8.3874536821104e-06
Timestep: 2: 0.7527564637182457
Timestep: 3: 0.7528567484905317
Timestep: 0: 0.0
Timestep: 1: 5.8347503876232665e-06
Timestep: 2: 0.7258527943531285
Timestep: 3: 0.725920623326384
Timestep: 0: 0.0
Timestep: 1: 8.022781782912602e-06
Timestep: 2: 0.7557172348680308
Timestep: 3: 0.7557857931850844

It takes almost 750 ms to execute the setDictionary() call, which is way too much.

Could you please look into it?


Gianpiero:

Hello Aakash,

Thanks for trying. I created a reproducer to test your scenario and debug it. You can find both the XML and the Python file here.

Unfortunately, I wasn't able to reproduce your results. I got this instead:

Timestep 0: 0.0
Timestep 1: 6.899999999999962e-05
Timestep 2: 0.0001769999999999966
Timestep 3: 0.00020799999999999985
Timestep 4: 0.0002299999999999941
Timestep 5: 0.0002520000000000022
Timestep 6: 0.00027399999999999647
Timestep 7: 0.00033400000000000096

or in ms: 

0.06899999999999960000
0.10799999999999600000
0.03100000000000300000
0.02199999999999500000
0.02200000000000800000
0.02199999999999400000
0.06000000000000400000

So I don't see the big difference that you are observing. Can you please share your reproducer with me?

Best,
    Gianpiero

Aakash:

I guess I found where the problem is. In your example, the maximum sequence length is 10. Since I use an unbounded sequence, RTI interprets it as MAX_INT32, so sequenceMaxLength is 2147483647.

When I change this in your XML, the results are the same as mine. You can try with this XML configuration.

Gianpiero:

Hello Aakash,

Thanks for your file: I was able to reproduce the issue using it. If I use MAX_INT32 I get this:

0.05700000000000140000
0.18000000000000000000
0.10899999999999800000
0.08800000000000400000
0.08900000000000600000
0.08699999999999000000
0.08700000000000400000

 

While if I limit the size to 10 I get this:

0.05700000000000140000
0.11799999999999300000
0.02900000000000100000
0.02399999999999700000
0.02300000000000900000
0.02299999999999500000
0.07999999999999700000

I dug around and asked some people, and I found that this behavior is due to a bug in the underlying implementation of DynamicData used by Connector. For future reference, that is bug #CORE-8358 in our tracking system. EDIT: Basically, when we bind to a sequence element of a complex member, all the elements are allocated.
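A rough analogy in plain Python (not the actual DynamicData code) for why that bug hurts so badly with an unbounded sequence: if binding eagerly allocates every element up to the bound, the cost of touching one element grows with the bound, not with the amount of data:

```python
import time

def eager_bind(bound):
    # Stand-in for a bind that allocates storage for every element
    # up to the sequence bound, as described for CORE-8358.
    return [0.0] * bound

for bound in (10, 100_000, 10_000_000):
    t0 = time.perf_counter()
    eager_bind(bound)
    print(bound, time.perf_counter() - t0)
```

With a bound of 2147483647, the same pattern would be catastrophic, which matches the hundreds of milliseconds per write observed above.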

As I mentioned before, we are rewriting the implementation of Dynamic Data, so this bug will be fixed in the next release, but unfortunately I don't have an easy workaround.

Can you change the sequence to a smaller size? Are you trying to have Connector talk to a native application that uses compiled types? If so, you can give the Connector sequence a smaller size and then set a property to ensure that the two types match despite the difference in sequence length. The property is dds.type_consistency.ignore_sequence_bounds. More info here.
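In the Connector XML, the bound is the sequenceMaxLength attribute on the member. A hypothetical fragment (type and member names follow the ones in this thread, so adjust to your actual type definition):

```xml
<!-- Bounding the sequence to 10 elements avoids the slow path -->
<struct name="VehicleObject">
    <member name="id_" type="int32"/>
    <member name="objects_" type="nonBasic"
            nonBasicTypeName="Object" sequenceMaxLength="10"/>
</struct>
```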

Best,
  Gianpiero

Aakash:

Hi Gianpiero,

I am trying to connect the RTI Connext DDS Python Connector with ROS2 and my simulation tool. In ROS2 I use an unbounded sequence, which is interpreted as MAX_INT32, i.e. 2147483647. I am not sure whether I can reduce the sequence size and still communicate with ROS2. Could you please check internally if that is possible?

Also, approximately how long would it take to fix this bug?

Gianpiero:

Hello, 

Yes. You should be able to reduce the sequence size in the Connector XML and still communicate with ROS2 if you ALSO set the property dds.type_consistency.ignore_sequence_bounds: "When set to 1, sequences and strings in a DataReader type can have a maximum length smaller than that of the DataWriter type. When the length of the sequence in a particular sample is larger than the maximum length, that sample is discarded."
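For reference, properties like this go in the property QoS policy; one way this might look in an XML QoS profile for the DataReader (placement per the Extensible Types addendum; verify against your version's documentation):

```xml
<datareader_qos>
    <property>
        <value>
            <element>
                <name>dds.type_consistency.ignore_sequence_bounds</name>
                <value>1</value>
            </element>
        </value>
    </property>
</datareader_qos>
```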

Find the full documentation here: https://community.rti.com/static/documentation/connext-dds/5.2.3/doc/manuals/connext_dds/getting_started_extras/html_files/RTI_ConnextDDS_CoreLibraries_GettingStarted_ExtensibleTypesAddendum/index.htm#ExtensibleTypesAddendum/Type_Consistency_Enforcement.htm

As I said before, the bug will likely be fixed in the next release, because we are rewriting Dynamic Data. If you want to know when the next release is going to be, or if you want a specific bug fixed, you should contact your local representative or ask support.

Let me know if setting the smaller size in the Connector and using that QoS property solves your issue for now.

Best,
  Gianpiero

Aakash:

Hi,

I observed a very strange thing. I tried the same configuration on two different laptops and, to my surprise, the results are totally different. On one, I got the results already posted above; on the other, the timing is perfectly fine, I mean less than a few milliseconds. What could be the catch here?

Gianpiero:

Hello,

I am not sure why you are observing that behavior. Are you using the same architecture? It should be doing malloc in the same way.

Best,
  Gianpiero

Aakash:

Yes, I am using exactly the same architecture. To verify this, I copied the files from one PC to the other and observed the different behaviour.