Pages

Friday, August 22, 2014

Data Manipulation - Cassandra Query Language

INSERT

Syntax:
<insertStatement> ::= INSERT INTO <tablename>
                             '(' <identifier> ( ',' <identifier> )* ')'
                      VALUES '(' <term-or-literal> ( ',' <term-or-literal> )* ')'
                      ( IF NOT EXISTS )?
                      ( USING <option> ( AND <option> )* )?

<term-or-literal> ::= <term>
                    | <collection-literal>

<option> ::= TIMESTAMP <integer>
           | TTL <integer>

Sample:
INSERT INTO NerdMovies (movie, director, main_actor, year)
                VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
USING TTL 86400;
The INSERT statement writes one or more columns for a given row in a table. Note that since a row is identified by its PRIMARY KEY, at least the columns composing it must be specified.
Note that unlike in SQL, INSERT does not check the prior existence of the row by default: the row is created if none existed before, and updated otherwise. Furthermore, there is no mean to know which of creation or update happened.
It is however possible to use the IF NOT EXISTS condition to only insert if the row does not exist prior to the insertion. But please note that using IF NOT EXISTS will incur a non negligible performance cost (internally, Paxos will be used) so this should be used sparingly.
All updates for an INSERT are applied atomically and in isolation.
Please refer to the UPDATE section for information on the <option> available and to the collections section for use of <collection-literal>. Also note that INSERT does not support counters, while UPDATE does.

UPDATE

Syntax:
<update-stmt> ::= UPDATE <tablename>
                  ( USING <option> ( AND <option> )* )?
                  SET <assignment> ( ',' <assignment> )*
                  WHERE <where-clause>
                  ( IF <condition> ( AND condition )* )?

<assignment> ::= <identifier> '=' <term>
               | <identifier> '=' <identifier> ('+' | '-') (<int-term> | <set-literal> | <list-literal>)
               | <identifier> '=' <identifier> '+' <map-literal>
               | <identifier> '[' <term> ']' '=' <term>

<condition> ::= <identifier> '=' <term>
              | <identifier> '[' <term> ']' '=' <term>

<where-clause> ::= <relation> ( AND <relation> )*

<relation> ::= <identifier> '=' <term>
             | <identifier> IN '(' ( <term> ( ',' <term> )* )? ')'
             | <identifier> IN '?'

<option> ::= TIMESTAMP <integer>
           | TTL <integer>

Sample:
UPDATE NerdMovies USING TTL 400
SET director = 'Joss Whedon',
    main_actor = 'Nathan Fillion',
    year = 2005
WHERE movie = 'Serenity';

UPDATE UserActions SET total = total + 2 WHERE user = B70DE1D0-9908-4AE3-BE34-5573E5B09F14 AND action = 'click';

The UPDATE statement writes one or more columns for a given row in a table. The <where-clause> is used to select the row to update and must include all columns composing the PRIMARY KEY (the IN relation is only supported for the last column of the partition key). Other columns values are specified through <assignment> after the SET keyword.
Note that unlike in SQL, UPDATE does not check the prior existence of the row by default: the row is created if none existed before, and updated otherwise. Furthermore, there is no mean to know which of creation or update happened.
It is however possible to use the conditions on some columns through IF, in which case the row will not be updated unless such condition are met. But please note that using IF conditions will incur a non negligible performance cost (internally, Paxos will be used) so this should be used sparingly.
In an UPDATE statement, all updates within the same partition key are applied atomically and in isolation.
The c = c + 3 form of <assignment> is used to increment/decrement counters. The identifier after the ‘=’ sign must be the same than the one before the ‘=’ sign (Only increment/decrement is supported on counters, not the assignment of a specific value).
The id = id + <collection-literal> and id[value1] = value2 forms of <assignment> are for collections. Please refer to the relevant section for more details.

<options>

The UPDATE and INSERT statements allows to specify the following options for the insertion:
  • TIMESTAMP: sets the timestamp for the operation. If not specified, the current time of the insertion (in microseconds) is used. This is usually a suitable default.
  • TTL: allows to specify an optional Time To Live (in seconds) for the inserted values. If set, the inserted values are automatically removed from the database after the specified time. Note that the TTL concerns the inserted values, not the column themselves. This means that any subsequent update of the column will also reset the TTL (to whatever TTL is specified in that update). By default, values never expire. A TTL of 0 or a negative one is equivalent to no TTL.

DELETE

Syntax:
<delete-stmt> ::= DELETE ( <selection> ( ',' <selection> )* )?
                  FROM <tablename>
                  ( USING TIMESTAMP <integer>)?
                  WHERE <where-clause>
                  ( IF ( EXISTS | ( <condition> ( AND <condition> )*) ) )?

<selection> ::= <identifier> ( '[' <term> ']' )?

<where-clause> ::= <relation> ( AND <relation> )*

<relation> ::= <identifier> '=' <term>
             | <identifier> IN '(' ( <term> ( ',' <term> )* )? ')'
             | <identifier> IN '?'

<condition> ::= <identifier> '=' <term>
              | <identifier> '[' <term> ']' '=' <term>

Sample:
DELETE FROM NerdMovies USING TIMESTAMP 1240003134 WHERE movie = 'Serenity';

DELETE phone FROM Users WHERE userid IN (C73DE1D3-AF08-40F3-B124-3FF3E5109F22, B70DE1D0-9908-4AE3-BE34-5573E5B09F14);

The DELETE statement deletes columns and rows. If column names are provided directly after the DELETE keyword, only those columns are deleted from the row indicated by the <where-clause> (the id[value]syntax in <selection> is for collection, please refer to the collection section for more details). Otherwise whole rows are removed. The <where-clause> allows to specify the key for the row(s) to delete (the INrelation is only supported for the last column of the partition key).
DELETE supports the TIMESTAMP options with the same semantic that in the UPDATE statement.
In a DELETE statement, all deletions within the same partition key are applied atomically and in isolation.
DELETE operation application can be conditioned using IF like for UPDATE and INSERT. But please not that as for the later, this will incur a non negligible performance cost (internally, Paxos will be used) and so should be used sparingly.

BATCH

Syntax:
<batch-stmt> ::= BEGIN ( UNLOGGED | COUNTER ) BATCH
                 ( USING <option> ( AND <option> )* )?
                    <modification-stmt> ( ';' <modification-stmt> )*
                 APPLY BATCH

<modification-stmt> ::= <insert-stmt>
                      | <update-stmt>
                      | <delete-stmt>

<option> ::= TIMESTAMP <integer>

Sample:
BEGIN BATCH
  INSERT INTO users (userid, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
  UPDATE users SET password = 'ps22dhds' WHERE userid = 'user3';
  INSERT INTO users (userid, password) VALUES ('user4', 'ch@ngem3c');
  DELETE name FROM users WHERE userid = 'user1';
APPLY BATCH;
The BATCH statement group multiple modification statements (insertions/updates and deletions) into a single statement. It serves several purposes:
  1. It saves network round-trips between the client and the server (and sometimes between the server coordinator and the replicas) when batching multiple updates.
  2. All updates in a BATCH belonging to a given partition key are performed in isolation.
  3. By default, all operations in the batch are performed atomically. See the notes on UNLOGGED for more details.
Note that:
  • BATCH statements may only contain UPDATEINSERT and DELETE statements.
  • Batches are not a full analogue for SQL transactions.
  • If a timestamp is not specified for each operation, then all operations will be applied with the same timestamp. Due to Cassandra’s conflict resolution procedure in the case of timestamp ties, operations may be applied in an order that is different from the order they are listed in the BATCH statement. To force a particular operation ordering, you must specify per-operation timestamps.

UNLOGGED

By default, Cassandra uses a batch log to ensure all operations in a batch are applied atomically. (Note that the operations are still only isolated within a single partition.)
There is a performance penalty for batch atomicity when a batch spans multiple partitions. If you do not want to incur this penalty, you can tell Cassandra to skip the batchlog with the UNLOGGED option. If theUNLOGGED option is used, operations are only atomic within a single partition.

COUNTER

Use the COUNTER option for batched counter updates. Unlike other updates in Cassandra, counter updates are not idempotent.

<option>

BATCH supports both the TIMESTAMP option, with similar semantic to the one described in the UPDATE statement (the timestamp applies to all the statement inside the batch). However, if used, TIMESTAMP must not be used in the statements within the batch.

No comments:

Post a Comment