How do I send strings with non-English characters (UTF-8, Unicode, etc.) over DDS?

Note: This solution applies to RTI Data Distribution Service 4.x and above.

Regular strings in RTI Connext DDS consist of single-byte characters. Hence, only US-ASCII characters can be represented in a string. Characters for other languages, such as Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Thaana, as well as the accented letters of most Latin-derived alphabets, require more than one byte (two bytes each in UTF-8).

To represent these characters correctly, RTI Connext supports wide strings as defined in the IDL specification. A wide string is a sequence of wide characters. The wide characters supported by Connext are four bytes long, so each one can hold not only a two-byte UTF-16 code unit but also any UTF-32 code point.

In your IDL, declare a wide string the same way as a regular string, but with the keyword “wstring”. For example:

struct myStruct {
    wstring<100> myWideString;
};

The generated code varies depending on the language:

  • For C/C++, the field myWideString is represented as DDS_Wchar * myWideString.
  • For C++/CLI, the field myWideString is represented as System::String^ myWideString.
  • For Java, the field myWideString is represented as String myWideString.

Comments

The statement "Hence, only US-ASCII characters can be represented in a string" seems misguided. UTF-8 and many other character encodings can be stored in byte arrays, and that is what the DDS type "string" is for. Could this page be updated to explicitly indicate how to encode/decode strings based on the character encoding of the platform/programming language?

Hi Ben,

As you can see in this link to our documentation, this issue has been resolved in our latest release, RTI Connext DDS 5.3.0. Upgrading to this version should allow you to send UTF-8 strings.

-Fran