Preamble
This document describes the Cassandra Query Language (CQL) version 3. CQL v3 is not backward compatible with CQL v2 and differs from it in numerous ways. Note that this document describes the latest version of the language. However, the changes section provides the diff between the different versions of CQL v3.
CQL v3 offers a model very close to SQL in the sense that data is put in tables containing rows of columns. For that reason, when used in this document, these terms (tables, rows and columns) have the same definition as they have in SQL. Note, however, that they do not refer to the concept of rows and columns found in the internal implementation of Cassandra and in the Thrift and CQL v2 APIs.
Conventions
To aid in specifying the CQL syntax, we will use the following conventions in this document:
- Language rules will be given in a BNF-like notation:
<start> ::= TERMINAL <non-terminal1> <non-terminal1>
- Nonterminal symbols will have <angle brackets>.
- As additional shortcut notations to BNF, we'll use traditional regular expression symbols (?, + and *) to signify that a given symbol is optional and/or can be repeated. We'll also allow parentheses to group symbols and the [<characters>] notation to represent any one of <characters>.
- The grammar is provided for documentation purposes and leaves some minor details out. For instance, the comma after the last column definition in a CREATE TABLE statement is optional but supported if present, even though the grammar provided in this document suggests it is not supported.
- Sample code will be provided in a code block:
SELECT sample_usage FROM cql;
- References to keywords or pieces of CQL code in running text will be shown in a fixed-width font.
Identifiers and keywords
The CQL language uses identifiers (or names) to identify tables, columns and other objects. An identifier is a token matching the regular expression [a-zA-Z0-9_]*.
A number of such identifiers, like SELECT or WITH, are keywords. They have a fixed meaning for the language and most are reserved. The list of those keywords can be found in Appendix A.
Identifiers and (unquoted) keywords are case insensitive. Thus SELECT is the same as select or sElEcT, and myId is the same as myid or MYID, for instance. A convention often used (in particular by the samples of this documentation) is to use upper case for keywords and lower case for other identifiers.
There is a second kind of identifier, called a quoted identifier, defined by enclosing an arbitrary sequence of characters in double quotes ("). Quoted identifiers are never keywords. Thus "select" is not a reserved keyword and can be used to refer to a column, while select would raise a parse error. Also, contrary to unquoted identifiers and keywords, quoted identifiers are case sensitive ("My Quoted Id" is different from "my quoted id"). A fully lowercase quoted identifier that matches [a-zA-Z0-9_]* is equivalent to the unquoted identifier obtained by removing the double quotes (so "myid" is equivalent to myid and to myId but different from "myId"). Inside a quoted identifier, the double-quote character can be repeated to escape it, so "foo "" bar" is a valid identifier.
Constants
CQL defines the following kinds of constants: strings, integers, floats, booleans, uuids and blobs:
- A string constant is an arbitrary sequence of characters enclosed in single quotes ('). One can include a single quote in a string by repeating it, e.g. 'It''s raining today'. Those are not to be confused with quoted identifiers, which use double quotes.
- An integer constant is defined by '-'?[0-9]+.
- A float constant is defined by '-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?. On top of that, NaN and Infinity are also float constants.
- A boolean constant is either true or false, up to case-insensitivity (i.e. True is a valid boolean constant).
- A UUID constant is defined by hex{8}-hex{4}-hex{4}-hex{4}-hex{12} where hex is a hexadecimal character, i.e. [0-9a-fA-F], and {4} is the number of such characters.
- A blob constant is a hexadecimal number defined by 0[xX](hex)+ where hex is a hexadecimal character, i.e. [0-9a-fA-F].
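The constant definitions above can be checked mechanically. The following is a minimal Python sketch, not part of CQL or of any driver: the name classify_constant is hypothetical, the patterns are transcribed from the list above, and it assumes that NaN and Infinity are matched with exactly that capitalization. Integers are tested before floats because every integer also matches the float pattern.

```python
import re

HEX = r"[0-9a-fA-F]"

# Patterns transcribed from the definitions above. Order matters: every
# integer also matches the float pattern, so "integer" is tried first.
PATTERNS = [
    # Single-quoted string; a doubled '' stands for one quote character.
    ("string", re.compile(r"'(?:[^']|'')*'")),
    ("integer", re.compile(r"-?[0-9]+")),
    # Assumption: NaN and Infinity are matched case-sensitively.
    ("float", re.compile(r"-?[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?|NaN|Infinity")),
    ("boolean", re.compile(r"true|false", re.IGNORECASE)),
    ("uuid", re.compile(rf"{HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}")),
    ("blob", re.compile(rf"0[xX]{HEX}+")),
]


def classify_constant(token):
    """Return the kind of CQL constant `token` is, or None if it is not one."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return None
```

For instance, classify_constant("0xCAFE") reports a blob, while classify_constant("myid") returns None because an identifier is not a constant.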
Comments
A comment in CQL is a line beginning with either double dashes (--) or a double slash (//).
Multi-line comments are also supported through enclosure within /* and */ (but nesting is not supported).
-- This is a comment
// This is a comment too
/* This is
   a multi-line comment */
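To make the comment rules concrete, here is a small Python sketch that removes comments from a piece of CQL text. It is an illustration only: strip_comments is a hypothetical helper, and it assumes that a -- or // marker starts a comment wherever it appears outside a string constant or quoted identifier, not only at the beginning of a line. Quoted spans are matched first so that comment markers inside them are preserved.

```python
import re

STRING = r"'(?:[^']|'')*'"         # string constant, '' escapes a quote
QUOTED_ID = r'"(?:[^"]|"")*"'      # quoted identifier, "" escapes a quote
BLOCK_COMMENT = r"/\*.*?\*/"       # multi-line comment, nesting not supported
LINE_COMMENT = r"(?:--|//)[^\n]*"  # single-line comment up to the end of line

# Alternatives are tried left to right at each position, so quoted spans
# win over comment markers that happen to appear inside them.
TOKEN = re.compile(
    f"(?P<keep>{STRING}|{QUOTED_ID})|{BLOCK_COMMENT}|{LINE_COMMENT}",
    re.DOTALL,
)


def strip_comments(cql):
    """Return `cql` with each comment replaced by a single space."""
    return TOKEN.sub(lambda m: m.group("keep") or " ", cql)
```

Replacing each comment with a space rather than with nothing keeps the tokens on either side of a /* ... */ comment from running together.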
Statements
CQL consists of statements. As in SQL, these statements can be divided into 3 categories:
- Data definition statements, which set and change the way data is stored.
- Data manipulation statements, which change data.
- Queries, to look up data.
All statements end with a semicolon (;) but that semicolon can be omitted when dealing with a single statement. The supported statements are described in the following sections. When describing the grammar of said statements, we will reuse the non-terminal symbols defined below:
<identifier> ::= any quoted or unquoted identifier, excluding reserved keywords
<tablename> ::= (<identifier> '.')? <identifier>
<string> ::= a string constant
<integer> ::= an integer constant
<float> ::= a float constant
<number> ::= <integer> | <float>
<uuid> ::= a uuid constant
<boolean> ::= a boolean constant
<hex> ::= a blob constant
<constant> ::= <string> | <number> | <uuid> | <boolean> | <hex>
<variable> ::= '?' | ':' <identifier>
<term> ::= <constant> | <collection-literal> | <variable> | <function> '(' (<term> (',' <term>)*)? ')'
<collection-literal> ::= <map-literal> | <set-literal> | <list-literal>
<map-literal> ::= '{' ( <term> ':' <term> ( ',' <term> ':' <term> )* )? '}'
<set-literal> ::= '{' ( <term> ( ',' <term> )* )? '}'
<list-literal> ::= '[' ( <term> ( ',' <term> )* )? ']'
<function> ::= <ident>
<properties> ::= <property> (AND <property>)*
<property> ::= <identifier> '=' ( <identifier> | <constant> | <map-literal> )
Please note that not every possible production of the grammar above will be valid in practice. Most notably, <variable> and nested <collection-literal> are currently not allowed inside <collection-literal>.
A <variable> can be either anonymous (a question mark (?)) or named (an identifier preceded by :). Both declare bind variables for prepared statements. The only difference between an anonymous and a named variable is that a named one will be easier to refer to (how exactly depends on the client driver used).
The <properties> production is used by statements that create and alter keyspaces and tables. Each <property> is either a simple one, in which case it just has a value, or a map one, in which case its value is a map grouping sub-options. The following will refer to one or the other as the kind (simple or map) of the property.
A <tablename> will be used to identify a table. This is an identifier representing the table name that can be preceded by a keyspace name. The keyspace name, if provided, allows identifying a table in a keyspace other than the currently active one (the currently active keyspace is set through the USE statement).
For supported <function>, see the section on functions.
Prepared Statement
CQL supports prepared statements. Prepared statements are an optimization that allows a query to be parsed only once but executed multiple times with different concrete values.
In a statement, each time a column value is expected (in the data manipulation and query statements), a <variable> (see above) can be used instead. A statement with bind variables must then be prepared. Once it has been prepared, it can be executed by providing concrete values for the bind variables. The exact procedure to prepare a statement and execute a prepared statement depends on the CQL driver used and is beyond the scope of this document.
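As an illustration of what counts as a bind variable, the Python sketch below lists the anonymous and named markers found in a statement. It is not how any driver prepares statements: bind_variables is a hypothetical helper, the table and column names in the usage note are made up, and the sketch assumes a named marker starts with a letter. It skips string constants and quoted identifiers but does not parse collection literals, so a map literal such as {'a':true} would be misread as containing a marker.

```python
import re

STRING = r"'(?:[^']|'')*'"     # string constant, skipped
QUOTED_ID = r'"(?:[^"]|"")*"'  # quoted identifier, skipped
# Anonymous marker (?) or named marker (':' followed by an unquoted name).
# Assumption: the name starts with a letter.
MARKER = r"\?|:(?P<name>[a-zA-Z][a-zA-Z0-9_]*)"

TOKEN = re.compile(f"{STRING}|{QUOTED_ID}|(?P<marker>{MARKER})")


def bind_variables(cql):
    """Return the markers in order: None for '?', the name for ':name'."""
    found = []
    for match in TOKEN.finditer(cql):
        if match.group("marker") is not None:
            found.append(match.group("name"))
    return found
```

With the made-up statement INSERT INTO users (id, name, note) VALUES (?, :name, 'what? :no'), the function returns one anonymous marker followed by the named marker name; the text inside the string constant is ignored.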