Syntax for DDS Queries and Filters

A subset of the WHERE clause in SQL is used in several parts of the specification:

The filter_expression in the com.rti.dds.topic.ContentFilteredTopic
The query_expression in the com.rti.dds.subscription.QueryCondition
<<extension>> The filter_expression in the com.rti.dds.subscription.TopicQuerySelection
<<extension>> The filter_expression in the com.rti.dds.infrastructure.ChannelSettings_t

Those expressions may use a subset of SQL, extended with the possibility to use program variables in the SQL expression. The allowed SQL expressions are defined with the BNF-grammar below.

The following notational conventions are made:

NonTerminals are typeset in italics.
'Terminals' are quoted and typeset in a fixed width font. They are written in upper case in most cases in the BNF-grammar below, but should be case insensitive.
TOKENS are typeset in bold.
The notation (element // ',') represents a non-empty comma-separated list of elements.

SQL grammar in BNF

FilterExpression ::= Condition
Condition   ::= Predicate
            |   Condition 'AND' Condition
            |   Condition 'OR' Condition
            |   'NOT' Condition
            |   '(' Condition ')'
Predicate   ::= ComparisonPredicate
            |   BetweenPredicate
ComparisonPredicate ::= ComparisonTerm RelOp ComparisonTerm
ComparisonTerm      ::= FieldIdentifier
                    | Parameter
BetweenPredicate    ::= FieldIdentifier 'BETWEEN' Range
                    |   FieldIdentifier 'NOT BETWEEN' Range
FieldIdentifier      ::= FIELDNAME
                    | IDENTIFIER
RelOp     ::= '=' | '>' | '>=' | '<' | '<=' | '<>' | 'LIKE' | 'MATCH'
Range     ::= Parameter 'AND' Parameter
Parameter ::= INTEGERVALUE
          |   CHARVALUE
          |   FLOATVALUE
          |   STRING
          |   ENUMERATEDVALUE
          |   BOOLEANVALUE
          |   NULLVALUE
          |   PARAMETER

Token expression

The syntax and meaning of the tokens used in the SQL grammar is described as follows:

IDENTIFIER - An identifier for a FIELDNAME, and is defined as any series of characters 'a', ..., 'z', 'A', ..., 'Z', '0', ..., '9', '_' but may not start with a digit.

Formal notation:

      IDENTIFIER: LETTER (PART_LETTER)*
      where LETTER: [ "A"-"Z","_","a"-"z" ]
            PART_LETTER: [ "A"-"Z","_","a"-"z","0"-"9" ]

FIELDNAME - A fieldname is a reference to a field in the data structure. The dot '.' is used to navigate through nested structures. The number of dots that may be used in a FIELDNAME is unlimited. The FIELDNAME can refer to fields at any depth in the data structure. The names of the field are those specified in the IDL definition of the corresponding structure, which may or may not match the fieldnames that appear on the language-specific (e.g., C/C++, Java) mapping of the structure. To reference to the n+1 element in an array or sequence, use the notation '[n]', where n is a natural number (zero included). FIELDNAME must resolve to a primitive IDL type; that is either boolean, octet, (unsigned) short, (unsigned) long, (unsigned) long long, float double, char, wchar, string, wstring, or enum.

Formal notation:

      FIELDNAME: FieldNamePart ( "." FieldNamePart )*
      where FieldNamePart : IDENTIFIER ( "[" Index "]" )*
            Index> : (["0"-"9"])+
                   | ["0x","0X"](["0"-"9", "A"-"F", "a"-"f"])+

Primitive IDL types referenced by FIELDNAME are treated as different types in Predicate according to the following table:

Predicate Data Type	IDL Type
BOOLEANVALUE	boolean
INTEGERVALUE	octet, (unsigned) short, (unsigned) long, (unsigned) long long
FLOATVALUE	float, double
CHARVALUE	char, wchar
STRING	string, wstring
ENUMERATEDVALUE	enum

TOPICNAME - A topic name is an identifier for a topic, and is defined as any series of characters 'a', ..., 'z', 'A', ..., 'Z', '0', ..., '9', '_' but may not start with a digit.

Formal notation:

       TOPICNAME : IDENTIFIER

INTEGERVALUE - Any series of digits, optionally preceded by a plus or minus sign, representing a decimal integer value within the range of the system. 64-bit integers (int64 and uint64) must be followed by either l or L, otherwise the value is treated as a 32-bit integer. A hexadecimal number is preceded by 0x and must be a valid hexadecimal expression.

Formal notation:

       INTEGERVALUE : (["+","-"])? (["0"-"9"])+ [("L","l")]?
                    | (["+","-"])? ["0x","0X"](["0"-"9", "A"-"F", "a"-"f"])+ [("L","l")]?

CHARVALUE - A single character enclosed between single quotes.

Formal notation:

       CHARVALUE : "'" (~["'"])? "'"

FLOATVALUE - Any series of digits, optionally preceded by a plus or minus sign and optionally including a floating point ('.'). A power-of-ten expression may be postfixed, which has the syntax en or En, where n is a number, optionally preceded by a plus or minus sign.

Formal notation:

      FLOATVALUE : (["+","-"])? (["0"-"9"])* (".")? (["0"-"9"])+ (EXPONENT)?
      where EXPONENT: ["e","E"] (["+","-"])? (["0"-"9"])+

STRING - Any series of characters encapsulated in single quotes, except the single quote itself.

Formal notation:

      STRING : "'" (~["'"])* "'"

ENUMERATEDVALUE - An enumerated value is a reference to a value declared within an enumeration. Enumerated values consist of the name of the enumeration label enclosed in single quotes. The name used for the enumeration label must correspond to the label names specified in the IDL definition of the enumeration.

Formal notation:

      ENUMERATEDVALUE : "'" ["A" - "Z", "a" - "z"] ["A" - "Z", "a" - "z", "_", "0" - "9"]* "'"

BOOLEANVALUE - Can either be 'TRUE' or 'FALSE', case insensitive.

Formal notation (case insensitive):

      BOOLEANVALUE : ["TRUE","FALSE"]

NULLVALUE - Can be null, and is case insensitive.

Formal notation (case insensitive):

      NULLVALUE : "null"

PARAMETER - A parameter is of the form %n, where n represents a natural number (zero included) smaller than 100. It refers to the n + 1 ^th argument in the given context. Argument can only in primitive type value format. It cannot be a FIELDNAME.

Formal notation:

      PARAMETER : "%" (["0"-"9"])+

String Parameters

Strings used as parameter values must contain the enclosing quotation marks (') within the parameter value, and not place the quotation marks within the expression statement. For example, the following expression is legal:

" symbol MATCH %0 " with parameter 0 = " 'IBM' "

whereas the following expression will not compile:

" symbol MATCH '%0' " with parameter 0 = " IBM "

Type compatability in Predicate

Only certain combination of type comparisons are valid in Predicate. The following table marked all the compatible pairs with 'YES':

	BOOLEANVALUE	INTEGERVALUE	FLOATVALUE	CHARVALUE	STRING	ENUMERATEDVALUE
BOOLEAN	YES
INTEGERVALUE		YES	YES
FLOATVALUE		YES	YES
CHARVALUE				YES	YES	YES
STRING				YES	YES(*1)	YES
ENUMERATEDVALUE		YES		YES(*2)	YES(*2)	YES(*3)

(*1) See SQL Extension: Regular Expression Matching
(*2) Because the formal notation of the Enumeration values, they are compatible with string and char literals, but they are not compatible with string or char variables, i.e., "MyEnum='EnumValue'" would be correct, but "MyEnum=MyString" is not allowed.
(*3) Only for same type Enums.

SQL Extension: Regular Expression Matching

The relational operator MATCH may only be used with string fields. The right-hand operator is a string pattern. A string pattern specifies a template that the left-hand field value must match. The characters ,/\?*[]-^!% have special meanings unless they are escaped by the escape character "\". MATCH is case-sensitive. The pattern allows limited "wild card" matching under the following rules: <TABLE> <TR> <TD>Character</TD> <TD>Meaning</TD></TR> <TR> <TD>,</TD> <TD>"," separates a list of alternate patterns. The field string is matched if it matches one or more of the patterns.</TD> </TR> <TR> <TD>/</TD> <TD>"/" in the pattern string matches a / in the field string. This character is used to separate a sequence of mandatory substrings.</TD> </TR> <TR> <TD>?</TD> <TD>"?" in the pattern string matches any single non-special characters in the field string.</TD> </TR> <TR> <TD>*</TD> <TD>"*" in the pattern string matches 0 or more non-special characters in field string.</TD> </TR> <TR> <TD>[charlist]</TD> <TD>Matches any one of the characters from the list of characters in charlist.</TD> </TR> <TR> <TD>[s-e]</TD> <TD>Matches any character any character from s to e, inclusive.</TD> </TR> <TR> <TD>%</TD> <TD>"%" is used to designate filter expressions parameters.</TD> </TR> <TR> <TD>[!charlist] or [^charlist]</TD> <TD>Matches any characters not in charlist (not supported).</TD> </TR> <TR> <TD>[!s-e] or [^s-e]</TD> <TD>Matches any characters not in the interval [s-e] (not supported).</TD> </TR> <TR> <TD>\\</TD> <TD>Escape character for special characters.</TD> </TR> </TABLE> The syntax is similar to the POSIX fnmatch syntax (1003.2-1992 section B.6). The MATCH syntax is also similar to the 'subject' strings of TIBCO Rendezvous. Note: To use special characters as regular characters in regular expressions, you must escape them using the character "". For example, 'A[' is considered a malformed expression and the result is undefined.

Character Encoding

The default encoding for IDL strings is UTF-8. RTI Connext offers ISO 8859-1 as an alternative encoding for IDL strings.

In order to configure ISO 8859-1 as the encoding for filtering of IDL strings, you can set the DomainParticipant property dds.domain_participant.filtering_character_encoding to ISO-8859:

The possible values for dds.domain_participant.filtering_character_encoding are:

UTF-8 (default value)
ISO-8859-1

Unicode Normalization

Unicode supports multiple ways to encode some characters, most notably accented characters. A composed character in Unicode can often have a number of different ways of representing the character. For example:

Precomposed Ḽ is represented by \u1e3c
Composed Ḽ = L + ^ is represented by \u004c + \u032d

The lexical comparison of the two characters above will return false. In order to do the correct comparison the characters need to be normalized, that is, reduced to the same character composition.

When the character encoding for filtering of IDL strings is UTF-8, the Unicode normalization behavior can be controlled using a DomainParticipant property called dds.domain_participant.filtering_unicode_normalization.

The possible values of the normalization property are:

OFF: Disables normalization
NFD: Canonical Decomposition
NFC (default value): Canonical Decomposition, followed by Canonical Composition
NFK: Compatibility Decomposition, followed by Canonical Composition
NFKC_Casefold: Casefold followed by NFKC normalization

Because normalization may affect performance, and it is enabled by default, the property allows disabling the normalization process per DomainParticipant using the value OFF. However, you should be aware that doing this may lead to unexpected behavior.

Examples

Assuming Topic "Location" has as an associated type a structure with fields "flight_id, x, y, z", and Topic "FlightPlan" has as fields "flight_id, source, destination". The following are examples of using these expressions.

Example of a filter_expression (for com.rti.dds.topic.ContentFilteredTopic) or a query_expression (for com.rti.dds.subscription.QueryCondition):

```
 "z < 1000 AND x < 23" 
```

Examples of a filter_expression using MATCH (for com.rti.dds.topic.ContentFilteredTopic) operator:

"symbol MATCH 'NASDAQ/GOOG'"
"symbol MATCH 'NASDAQ/[A-M]*'"