Performance implications for using XTypes

Joined: 03/06/2014
Posts: 6

We are looking into using the DynamicData / DynamicType interfaces specified by the DDS-XTypes standard. The spec mentions that "dynamic language bindings" "may be lower performance to use than plain data objects". Before we dive deeper into trying it and building prototypes, we would like to ask for an assessment of the performance implications. We are interested in a rough estimate rather than exact numbers.

Is the performance impact mostly limited to the discovery phase or does it affect the handling of every single message?

What overhead should we expect for each message handled / serialized / transferred?

Thanks!

Joined: 06/02/2010
Posts: 601

Hi Dirk,

Your question is really about the DynamicData aspect of XTYPES. XTYPES itself does not force the use of DynamicData, so if you use XTYPES to get the flexibility of evolving your type system but define the data types in IDL/XML and generate the serialization/deserialization code, there should be no performance impact. Well, almost: if your type is "Mutable" rather than the default "Extensible", then the encapsulation is a bit more verbose. But it is the minimum necessary to support modifying data types without breaking interoperability.

Regarding the use of DynamicData, it is hard to answer this question generically because it largely depends on the particular vendor implementation as well as the complexity of the data type itself.

The impact is per message. There is very little impact at discovery time.

Assume you have types such as this:

struct RobotPosition {
    float x;
    float y;
    float z;
};

struct RobotVelocity {
    float vx;
    float vy;
    float vz;
};

struct RobotState {
    RobotPosition pos;
    RobotVelocity vel;
};

If you generate code from the above IDL types, the serialization/deserialization code accesses each field of the data type via direct member access and then calls functions (which we, RTI, typically optimize into macros) to copy the bytes into a stream.

So we will generate something that ends up being (in pseudo-code):

#include <string.h>  // memcpy

// struct RobotPosition, RobotVelocity and RobotState are the C types
// generated from the IDL above.

int Serialize_RobotPosition(char *buffer, int bufferSize, const struct RobotPosition *data)
{
    int numBytes = 0;

    // checks for buffer boundaries, errors, etc. are omitted
    memcpy(buffer + numBytes, &data->x, 4); numBytes += 4;
    memcpy(buffer + numBytes, &data->y, 4); numBytes += 4;
    memcpy(buffer + numBytes, &data->z, 4); numBytes += 4;

    return numBytes;
}

int Serialize_RobotVelocity(char *buffer, int bufferSize, const struct RobotVelocity *data)
{
    int numBytes = 0;

    // checks for buffer boundaries, errors, etc. are omitted
    memcpy(buffer + numBytes, &data->vx, 4); numBytes += 4;
    memcpy(buffer + numBytes, &data->vy, 4); numBytes += 4;
    memcpy(buffer + numBytes, &data->vz, 4); numBytes += 4;

    return numBytes;
}

int Serialize_RobotState(char *buffer, int bufferSize, const struct RobotState *data)
{
    int numBytes = 0;

    // checks for buffer boundaries, errors, etc. are omitted
    // each nested struct is written right after the bytes already produced
    numBytes += Serialize_RobotPosition(buffer + numBytes, bufferSize - numBytes, &data->pos);
    numBytes += Serialize_RobotVelocity(buffer + numBytes, bufferSize - numBytes, &data->vel);

    return numBytes;
}

This omits some details: aside from error checking, it also does not handle the fact that a float may not be 4 bytes on certain platforms. The point is to illustrate that the end result is pretty close to optimal. You can see the actual code when you run rtiddsgen; it is all placed in the <Typename>Support.c file.

However if you use the dynamic data API, then there is no code generated, so:

(1) The typecode has to be interpreted to determine what needs to be serialized and the type of the element

(2) The access to the actual data value is also a function call rather than a direct member access as in:

int Serialize_DynamicData(char *buffer, int bufferSize, DynamicData *data)
{
   int numBytes = 0;

   // walk the members described by the type and serialize each one
   // according to its kind
   for (int i = 0; i < data->get_type()->get_member_count(); ++i) {
       switch (data->get_type()->get_member_kind(i)) {
       case STRUCT_TYPE: {
           // nested struct: recurse, writing after the bytes already produced
           DynamicData *nestedData = data->get_member_as_dynamic_data(i);
           numBytes += Serialize_DynamicData(buffer + numBytes, bufferSize - numBytes, nestedData);
           break;
       }
       case FLOAT_KIND: {
           float floatValue = data->get_member_value_as_float(i);
           memcpy(buffer + numBytes, &floatValue, 4); numBytes += 4;
           break;
       }
       case INT_KIND:
           ... // other member types
       }
   }

   return numBytes;
}

 

So there are more function calls, plus loops iterating over each member with switches to determine the member kind so that the appropriate function can be called, etc.

The above is not exactly how it is done, but it reflects the fact that things that were generated before are now "interpreted".
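To make the "interpreted" idea concrete, here is a minimal self-contained sketch in C. It is not how RTI implements DynamicData: the names MemberKind, MemberDesc, TypeDesc and serialize_interpreted are all hypothetical, and a small table of member kinds and offsets stands in for the typecode. Bounds checks are omitted, as in the pseudo-code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type description, standing in for a typecode (not an RTI API). */
typedef enum { KIND_FLOAT, KIND_STRUCT } MemberKind;

typedef struct TypeDesc TypeDesc;

typedef struct {
    MemberKind kind;
    size_t offset;          /* byte offset of the member inside its struct */
    const TypeDesc *nested; /* member description when kind == KIND_STRUCT */
} MemberDesc;

struct TypeDesc {
    size_t memberCount;
    const MemberDesc *members;
};

typedef struct { float x, y, z; } RobotPosition;
typedef struct { float vx, vy, vz; } RobotVelocity;
typedef struct { RobotPosition pos; RobotVelocity vel; } RobotState;

const MemberDesc positionMembers[] = {
    { KIND_FLOAT, offsetof(RobotPosition, x), NULL },
    { KIND_FLOAT, offsetof(RobotPosition, y), NULL },
    { KIND_FLOAT, offsetof(RobotPosition, z), NULL },
};
const TypeDesc positionType = { 3, positionMembers };

const MemberDesc velocityMembers[] = {
    { KIND_FLOAT, offsetof(RobotVelocity, vx), NULL },
    { KIND_FLOAT, offsetof(RobotVelocity, vy), NULL },
    { KIND_FLOAT, offsetof(RobotVelocity, vz), NULL },
};
const TypeDesc velocityType = { 3, velocityMembers };

const MemberDesc stateMembers[] = {
    { KIND_STRUCT, offsetof(RobotState, pos), &positionType },
    { KIND_STRUCT, offsetof(RobotState, vel), &velocityType },
};
const TypeDesc stateType = { 2, stateMembers };

/* Walks the description at run time: one loop iteration and one switch
 * per member, plus a recursive call per nested struct. */
size_t serialize_interpreted(char *buffer, const TypeDesc *type, const void *data)
{
    size_t numBytes = 0;
    assert(buffer != NULL && type != NULL && data != NULL);

    for (size_t i = 0; i < type->memberCount; ++i) {
        const MemberDesc *m = &type->members[i];
        const char *field = (const char *)data + m->offset;

        switch (m->kind) {
        case KIND_STRUCT:
            numBytes += serialize_interpreted(buffer + numBytes, m->nested, field);
            break;
        case KIND_FLOAT:
            memcpy(buffer + numBytes, field, sizeof(float));
            numBytes += sizeof(float);
            break;
        }
    }
    return numBytes;
}
```

Calling serialize_interpreted(buffer, &stateType, &state) produces the same bytes as the generated-style functions would, but it gets there through two recursive calls, eight loop iterations and eight switch dispatches instead of six straight-line copies.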

The actual performance impact of the extra function calls will depend on the complexity of the type. If the type is deeply nested and has a lot of primitive fields, then the impact will be bigger. If the type is shallower and the main types are strings and arrays/sequences of primitive elements, then the impact will be smaller. This is because arrays of primitive elements are optimized so that a single function is called to serialize the whole array rather than iterating over each member.

I think getting a more precise answer would require a test with some concrete data types...
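The array optimization can be illustrated with a small sketch, assuming the elements are contiguous floats that need no per-element conversion (for example, no byte swapping). Both function names are hypothetical and bounds checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One copy call per element: the per-call overhead is paid `count` times. */
size_t serialize_floats_per_element(char *buffer, const float *values, size_t count)
{
    size_t numBytes = 0;
    assert(buffer != NULL && values != NULL);

    for (size_t i = 0; i < count; ++i) {
        memcpy(buffer + numBytes, &values[i], sizeof(float));
        numBytes += sizeof(float);
    }
    return numBytes;
}

/* One copy call for the whole array: the per-call overhead is paid once. */
size_t serialize_floats_bulk(char *buffer, const float *values, size_t count)
{
    assert(buffer != NULL && values != NULL);

    memcpy(buffer, values, count * sizeof(float));
    return count * sizeof(float);
}
```

Both functions write identical bytes. The bulk version is the shape that stays cheap even when the type is interpreted, since the interpreter dispatches once per array rather than once per element.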

Gerardo

 

Joined: 03/06/2014
Posts: 6

Hi Gerardo,

thank you for the thorough answer.

After prototyping some stuff with DynamicData I noticed that the API does not provide a method for loaning any of the member data or getting the pointers to contiguous memory.

Did I miss that in the API or is that another case where using the DynamicData interface will cost us performance?

If it is currently not available, is it planned to add this kind of accessor in the future?

Thanks!

Joined: 01/15/2013
Posts: 94

Hi Dirk,

For loaning a member, would the API DDS_DynamicData_bind_complex_member() be what you're looking for? With this API you get access to a complex field (e.g. a struct) inside a DynamicData object. Keep in mind that this method has to be used together with DDS_DynamicData_unbind_complex_member() as you go down and back up the DynamicData object tree.

Hope this helps.

Thanks,

Juanlu

Joined: 03/06/2014
Posts: 6

Hi Juanlu,

I was thinking about the loan_contiguous / unloan functions to efficiently set, for example, an OctetSeq value.

The DynamicData interface doesn't seem to provide those and therefore I have to copy the bytes one more time which I would like to avoid.

Thanks

Joined: 01/15/2013
Posts: 94

Hi Dirk,

The sequence-based set/get methods for primitive types in the DynamicData API take the contiguous buffer of a sequence and set it directly in the DynamicData sample. For example, to set an octet sequence, you could use DDS_DynamicData_set_octet_seq(). Have you seen this family of methods?

Thanks,

Juanlu

Joined: 03/06/2014
Posts: 6

Hi Juanlu,

yes, I have seen these methods and use them for now.

But they imply that the underlying data is copied. I was wondering if there is a way to avoid that extra copy similar to:

* DataReaders providing `take` / `return_loan`

* Sequences providing `loan_contiguous` / `unloan`

Thanks

Joined: 01/15/2013
Posts: 94

Hi Dirk,

Yes, the set methods end up being a memcpy call. Although they are efficient, they're not more efficient than a loan.
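The difference between a copying setter and a loan can be sketched as follows. This is only an illustration under stated assumptions: ByteSeq and the byteseq_* functions are hypothetical and are not the RTI DDS_OctetSeq or DynamicData APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sequence type (not the RTI DDS_OctetSeq). */
typedef struct {
    unsigned char *data;
    size_t length;
    int ownsData; /* 1 if data was allocated by the sequence, 0 if loaned */
} ByteSeq;

/* Copying setter: allocates storage and duplicates the caller's bytes.
 * Returns 0 on success, -1 if the allocation fails. */
int byteseq_set_copy(ByteSeq *seq, const unsigned char *src, size_t length)
{
    assert(seq != NULL && src != NULL);

    unsigned char *copy = malloc(length ? length : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, src, length);

    if (seq->ownsData) {
        free(seq->data);
    }
    seq->data = copy;
    seq->length = length;
    seq->ownsData = 1;
    return 0;
}

/* Loan: the sequence only points at the caller's buffer, no bytes are copied.
 * The caller must keep the buffer alive until byteseq_unloan() is called. */
void byteseq_loan(ByteSeq *seq, unsigned char *buffer, size_t length)
{
    assert(seq != NULL && buffer != NULL);

    if (seq->ownsData) {
        free(seq->data);
    }
    seq->data = buffer;
    seq->length = length;
    seq->ownsData = 0;
}

/* Ends a loan; the buffer itself still belongs to the caller. */
void byteseq_unloan(ByteSeq *seq)
{
    assert(seq != NULL && !seq->ownsData);

    seq->data = NULL;
    seq->length = 0;
}
```

The copying setter costs one memcpy of the whole payload, which is cheap for small values but grows with the payload size. The loan costs a pointer assignment regardless of size, at the price of the caller having to manage the buffer lifetime.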

I don't think there's an API for what you're looking for. And as far as I know, there are no plans to add these APIs any time soon.

Sorry for not being very helpful.

Thanks,

Juanlu