Friday, August 22, 2014

CQL Syntax cassandra - Cassandra Query Language

Preamble

This document describes the Cassandra Query Language (CQL) version 3. CQL v3 is not backward compatible with CQL v2 and differs from it in numerous ways. Note that this document describes the latest version of the language; the changes section lists the differences between the successive versions of CQL v3.
CQL v3 offers a model very close to SQL in the sense that data is put in tables containing rows of columns. For that reason, when used in this document, these terms (tables, rows and columns) have the same definition as they have in SQL. Note, however, that they do not refer to the concepts of rows and columns found in the internal implementation of Cassandra or in the Thrift and CQL v2 APIs.

Conventions

To aid in specifying the CQL syntax, we will use the following conventions in this document:
  • Language rules will be given in a BNF-like notation:
<start> ::= TERMINAL <non-terminal1> <non-terminal1>
  • Nonterminal symbols will have <angle brackets>.
  • As additional shortcut notations to BNF, we’ll use traditional regular expression symbols (?, + and *) to signify that a given symbol is optional and/or can be repeated. We’ll also allow parentheses to group symbols and the [<characters>] notation to represent any one of <characters>.
  • The grammar is provided for documentation purposes and leaves some minor details out. For instance, a trailing comma after the last column definition in a CREATE TABLE statement is optional but supported if present, even though the grammar in this document suggests it is not supported.
  • Sample code will be provided in a code block:
SELECT sample_usage FROM cql;
  • References to keywords or pieces of CQL code in running text will be shown in a fixed-width font.

Identifiers and keywords

The CQL language uses identifiers (or names) to identify tables, columns and other objects. An identifier is a token matching the regular expression [a-zA-Z][a-zA-Z0-9_]*.
A number of such identifiers, like SELECT or WITH, are keywords. They have a fixed meaning for the language and most are reserved. The list of those keywords can be found in Appendix A.
Identifiers and (unquoted) keywords are case insensitive. Thus SELECT is the same as select or sElEcT, and myId is the same as myid or MYID, for instance. A convention often used (in particular by the samples in this documentation) is to use upper case for keywords and lower case for other identifiers.
There is a second kind of identifier, called a quoted identifier, defined by enclosing an arbitrary sequence of characters in double-quotes ("). Quoted identifiers are never keywords. Thus "select" is not a reserved keyword and can be used to refer to a column, while select would raise a parse error. Also, unlike unquoted identifiers and keywords, quoted identifiers are case sensitive ("My Quoted Id" is different from "my quoted id"). A fully lowercase quoted identifier that matches [a-zA-Z][a-zA-Z0-9_]* is equivalent to the unquoted identifier obtained by removing the double-quotes (so "myid" is equivalent to myid and to myId but different from "myId"). Inside a quoted identifier, the double-quote character can be repeated to escape it, so "foo "" bar" is a valid identifier.
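The case rules above can be sketched in a few lines of Python. This is only an illustration, not how Cassandra itself resolves names: normalize_identifier is a hypothetical helper, it assumes unquoted identifiers start with a letter, and it does not reject reserved keywords.

```python
import re

# Assumption: an unquoted identifier is a letter followed by letters, digits or underscores.
UNQUOTED = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def normalize_identifier(token: str) -> str:
    """Return the name an identifier token refers to (hypothetical helper)."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        inner = token[1:-1]
        # Any double-quote left after removing the doubled ones is unescaped.
        if '"' in inner.replace('""', ''):
            raise ValueError(f"unescaped double-quote in {token!r}")
        # Quoted identifiers keep their case; "" stands for one literal ".
        return inner.replace('""', '"')
    if UNQUOTED.fullmatch(token):
        # Unquoted identifiers are case insensitive, so fold them to lower case.
        return token.lower()
    raise ValueError(f"not a valid identifier: {token!r}")
```

With this sketch, myId, MYID and "myid" all normalize to the same name, while "myId" stays distinct.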

Constants

CQL defines the following kinds of constants: strings, integers, floats, booleans, uuids and blobs:
  • A string constant is an arbitrary sequence of characters enclosed by single-quotes ('). One can include a single-quote in a string by repeating it, e.g. 'It''s raining today'. Those are not to be confused with quoted identifiers, which use double-quotes.
  • An integer constant is defined by '-'?[0-9]+.
  • A float constant is defined by '-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?. On top of that, NaN and Infinity are also float constants.
  • A boolean constant is either true or false up to case-insensitivity (i.e. True is a valid boolean constant).
  • A UUID constant is defined by hex{8}-hex{4}-hex{4}-hex{4}-hex{12} where hex is a hexadecimal character, i.e. [0-9a-fA-F], and {4} is the number of such characters.
  • A blob constant is a hexadecimal number defined by 0[xX](hex)+ where hex is a hexadecimal character, i.e. [0-9a-fA-F].
For how these constants are typed, see the data types section.
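As a rough illustration of the definitions above, the following Python sketch classifies a token by trying each pattern in turn. The classify_constant helper and the ordering of the patterns are assumptions made for this example, and the exponent part of the float pattern is read as one or more digits.

```python
import re

HEX = r"[0-9a-fA-F]"

# Order matters: "42" matches both the integer and float patterns,
# so the more specific integer pattern is tried first.
PATTERNS = [
    ("string", re.compile(r"'(?:[^']|'')*'")),
    ("uuid", re.compile(rf"{HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}")),
    ("blob", re.compile(rf"0[xX]{HEX}+")),
    ("boolean", re.compile(r"true|false", re.IGNORECASE)),
    ("integer", re.compile(r"-?[0-9]+")),
    ("float", re.compile(r"-?[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?|NaN|Infinity")),
]


def classify_constant(text: str):
    """Return the kind of constant `text` is, or None if it matches no pattern."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return None
```

For example, classify_constant("'It''s raining today'") yields "string" and classify_constant("0xCAFE") yields "blob".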

Comments

A comment in CQL is a line beginning with either double dashes (--) or a double slash (//).
Multi-line comments are also supported through enclosure within /* and */ (but nesting is not supported).
-- This is a comment
// This is a comment too
/* This is
   a multi-line comment */
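To make the comment rules concrete, here is a deliberately simplified Python sketch that removes both comment styles from a script. strip_comments is a hypothetical helper; it ignores string literals and quoted identifiers, so comment markers inside them would be mishandled.

```python
import re

# Non-greedy match: a block comment ends at the first */, so nesting is not supported.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Drop /* ... */ blocks, then lines that begin with -- or //."""
    without_blocks = BLOCK_COMMENT.sub("", source)
    kept = [
        line
        for line in without_blocks.splitlines()
        if not line.lstrip().startswith(("--", "//"))
    ]
    return "\n".join(kept)
```

A real lexer would tokenize string constants first so that text such as '--' inside a string is left alone.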

Statements

CQL consists of statements. As in SQL, these statements can be divided into three categories:
  • Data definition statements, which set and change the way data is stored.
  • Data manipulation statements, which change data.
  • Queries, which look up data.
All statements end with a semicolon (;) but that semicolon can be omitted when dealing with a single statement. The supported statements are described in the following sections. When describing the grammar of said statements, we will reuse the non-terminal symbols defined below:
<identifier> ::= any quoted or unquoted identifier, excluding reserved keywords
 <tablename> ::= (<identifier> '.')? <identifier>

    <string> ::= a string constant
   <integer> ::= an integer constant
     <float> ::= a float constant
    <number> ::= <integer> | <float>
      <uuid> ::= a uuid constant
   <boolean> ::= a boolean constant
       <hex> ::= a blob constant

  <constant> ::= <string>
               | <number>
               | <uuid>
               | <boolean>
               | <hex>
  <variable> ::= '?'
               | ':' <identifier>
      <term> ::= <constant>
               | <collection-literal>
               | <variable>
               | <function> '(' (<term> (',' <term>)*)? ')'

  <collection-literal> ::= <map-literal>
                         | <set-literal>
                         | <list-literal>
         <map-literal> ::= '{' ( <term> ':' <term> ( ',' <term> ':' <term> )* )? '}'
         <set-literal> ::= '{' ( <term> ( ',' <term> )* )? '}'
        <list-literal> ::= '[' ( <term> ( ',' <term> )* )? ']'

    <function> ::= <identifier>

  <properties> ::= <property> (AND <property>)*
    <property> ::= <identifier> '=' ( <identifier> | <constant> | <map-literal> )

Please note that not every possible production of the grammar above will be valid in practice. Most notably, <variable> and nested <collection-literal> are currently not allowed inside <collection-literal>.
A <variable> can be either anonymous (a question mark (?)) or named (an identifier preceded by :). Both declare a bind variable for prepared statements. The only difference between an anonymous and a named variable is that a named one is easier to refer to (how exactly depends on the client driver used).
The <properties> production is used by statements that create and alter keyspaces and tables. Each <property> is either a simple one, in which case it just has a value, or a map one, in which case its value is a map grouping sub-options. The following sections refer to one or the other as the kind (simple or map) of the property.
<tablename> is used to identify a table. It is an identifier representing the table name, optionally preceded by a keyspace name. The keyspace name, if provided, makes it possible to identify a table in a keyspace other than the currently active one (the currently active keyspace is set through the USE statement).
For supported <function>, see the section on functions.
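The <collection-literal> productions above can be illustrated by rendering Python values as literal text. This is a sketch under stated assumptions: scalar_literal and collection_literal are hypothetical helpers, only booleans, integers and strings are handled as scalars, and nested collections are rejected to mirror the restriction noted above.

```python
def scalar_literal(value) -> str:
    """Render a bool, int or str as a CQL constant (hypothetical helper)."""
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # A single-quote inside a string constant is escaped by doubling it.
        return "'" + value.replace("'", "''") + "'"
    raise ValueError(f"unsupported or nested value: {value!r}")


def collection_literal(value) -> str:
    """Render a dict, set or list following the map/set/list literal productions."""
    if isinstance(value, dict):
        pairs = (f"{scalar_literal(k)}: {scalar_literal(v)}" for k, v in value.items())
        return "{" + ", ".join(pairs) + "}"
    if isinstance(value, (set, frozenset)):
        # Sorted only to make the output deterministic.
        return "{" + ", ".join(sorted(scalar_literal(v) for v in value)) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(scalar_literal(v) for v in value) + "]"
    raise ValueError(f"not a collection: {value!r}")
```

In real applications, values are normally passed as bind variables through the driver rather than formatted into the statement text.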

Prepared Statement

CQL supports prepared statements. A prepared statement is an optimization that allows a query to be parsed only once but executed multiple times with different concrete values.
In a statement, each time a column value is expected (in the data manipulation and query statements), a <variable> (see above) can be used instead. A statement with bind variables must then be prepared. Once it has been prepared, it can be executed by providing concrete values for the bind variables. The exact procedure to prepare a statement and execute a prepared statement depends on the CQL driver used and is beyond the scope of this document.
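As an illustration of the two kinds of bind variables, the Python sketch below lists the markers found in a statement, in order. It is not how any particular driver works: bind_markers is a hypothetical helper built on a simplified tokenizer that only knows about string constants, quoted identifiers and the two marker forms.

```python
import re

# String constants and quoted identifiers are matched first so that a ? or :name
# appearing inside them is skipped rather than counted as a bind variable.
TOKEN = re.compile(
    r"'(?:[^']|'')*'"          # string constant
    r'|"(?:[^"]|"")*"'         # quoted identifier
    r"|\?"                     # anonymous bind variable
    r"|:[a-zA-Z][a-zA-Z0-9_]*" # named bind variable
)


def bind_markers(statement: str) -> list:
    """Return None for each anonymous marker and the lower-cased name for each named one."""
    markers = []
    for match in TOKEN.finditer(statement):
        text = match.group()
        if text == "?":
            markers.append(None)
        elif text.startswith(":"):
            markers.append(text[1:].lower())
    return markers
```

For instance, a statement ending in VALUES (?, ?) yields two anonymous markers, whereas WHERE id = :user_id yields the single name user_id that a driver could expose for binding by name.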
