How do I send strings with non-English characters (UTF-8, unicode, etc.) over DDS?
Note: This solution applies to RTI Data Distribution Service 4.x and above.
Regular strings in RTI Connext DDS consist of single-byte characters. Hence, only US-ASCII characters can be represented in a string. Characters for other languages, such as Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna, as well as almost all Latin-derived alphabets, require more than one byte.
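To make the byte counts concrete, here is a minimal sketch in plain Java. It is not Connext API: the class name Utf8Width and the helper utf8Length are hypothetical, and it assumes UTF-8 as the multi-byte encoding. It reports how many bytes a few sample characters occupy once encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of RTI Connext).
final class Utf8Width {

    // Returns the number of bytes needed to encode the text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Latin 'A', Greek alpha, Cyrillic zhe, Hebrew alef (written as escapes
        // so the source file itself stays US-ASCII).
        String[] samples = {"A", "\u03b1", "\u0416", "\u05d0"};
        for (String s : samples) {
            String codePoint = Integer.toHexString(s.codePointAt(0)).toUpperCase();
            System.out.println("U+" + codePoint + " needs " + utf8Length(s) + " byte(s) in UTF-8");
        }
    }
}
```

Only the US-ASCII sample fits in one byte; the Greek, Cyrillic and Hebrew samples each take two.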
To correctly represent these types of characters, RTI Connext supports wide strings as defined in the IDL specification. Wide strings are strings consisting of wide characters. The wide characters supported by Connext are four bytes long, so they are large enough to store not only two-byte Unicode/UTF-16 characters but also UTF-32 characters.
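The difference between two-byte and four-byte units can be sketched in plain Java. This is an illustration only, not Connext API: the class WideCharSketch and its methods are hypothetical, and the int[] array merely stands in for a sequence of four-byte wide characters. A character outside the Basic Multilingual Plane needs two UTF-16 units but fits in a single 32-bit value.

```java
// Hypothetical illustration class (not part of RTI Connext).
final class WideCharSketch {

    // Converts a Java String (UTF-16 internally) to one 32-bit value per character.
    static int[] toCodePoints(String text) {
        return text.codePoints().toArray();
    }

    // Rebuilds a Java String from 32-bit code points.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // Greek alpha (U+03B1) followed by U+1F600, which lies outside the BMP
        // and is therefore written as a UTF-16 surrogate pair.
        String text = "\u03b1\uD83D\uDE00";
        int[] wide = toCodePoints(text);

        System.out.println("UTF-16 units: " + text.length());
        System.out.println("32-bit units: " + wide.length);
        System.out.println("Round trip ok: " + text.equals(fromCodePoints(wide)));
    }
}
```

Note that String.length() counts UTF-16 units, so it can differ from the number of four-byte characters in the same text.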
In your IDL, wide strings are declared similarly to regular strings, but using the keyword “wstring”. For example:
struct myStruct {
    wstring<100> myWideString;
};
The generated code varies depending on the language:
- For C/C++, the field myWideString is represented as DDS_Wchar * myWideString.
- For C++/CLI, the field myWideString is represented as System::String^ myWideString.
- For Java, the field myWideString is represented as String myWideString.
Comments
ben.hochstedler...
Mon, 10/27/2014 - 11:22
UTF-8 and string
The statement "Hence, only US-ASCII characters can be represented in a string" seems misguided. UTF-8 and many other character encodings can be stored in byte arrays, and that is what the DDS type "string" is for. Could this page be updated to explicitly indicate how to encode/decode strings based on the character encoding of the platform/programming language?
Francisco Porcel
Fri, 09/29/2017 - 04:57
Hi Ben,
As you can see in this link to our documentation, this issue has been resolved in our latest release, RTI Connext DDS 5.3.0. Upgrading to this version should allow you to send UTF-8 strings.
-Fran