100 Questions About Cassandra

Fast facts about Cassandra

  • A wide-column, fault-tolerant NoSQL database
  • An AP system (sacrifices consistency for availability and partition tolerance)
  • Easy to scale horizontally (no master)
  • No joins or subqueries; aggregation support is limited

Q & A

What is Cassandra’s Replication Strategy?

A replication strategy defines how replicas are placed across the nodes of a cluster.
There are two main replication strategies:

  • SimpleStrategy: for a single datacenter
  • NetworkTopologyStrategy: for multiple datacenters
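
As a rough illustration of how SimpleStrategy places replicas, the sketch below walks a token ring clockwise from the node that owns a token and takes the following nodes until the replication factor is met. The ring layout, the node names, and the `simple_strategy_replicas` helper are hypothetical and only serve the example; a real cluster also deals with virtual nodes, and NetworkTopologyStrategy additionally takes racks and datacenters into account.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node) pairs sorted by token.
RING = [(-6000, "node-a"), (-2000, "node-b"), (2000, "node-c"), (6000, "node-d")]


def simple_strategy_replicas(token, replication_factor, ring=RING):
    """Return the nodes that would hold replicas for `token`.

    The first replica is the node whose token is the first one >= `token`
    (wrapping around the ring); the rest are the next nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


print(simple_strategy_replicas(100, 3))   # ['node-c', 'node-d', 'node-a']
print(simple_strategy_replicas(7000, 2))  # wraps around: ['node-a', 'node-b']
```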

What is a Cassandra Consistency Level?

The consistency level is the minimum number of Cassandra nodes that must acknowledge a read or write operation before the operation is considered successful.

  • Write Consistency: ALL, ANY, ONE, TWO, THREE, QUORUM, EACH_QUORUM, LOCAL_ONE, LOCAL_QUORUM
  • Read Consistency: ALL, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM
quorum = (sum_of_replication_factors / 2) + 1, with the division rounded down

https://teddyma.gitbooks.io/learncassandra/content/replication/turnable_consistency.html
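
The quorum formula above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for the example, and the overlap rule it tests (read nodes plus write nodes greater than the replication factor) is the usual rule of thumb for getting consistent reads, not something Cassandra exposes under that name.

```python
def quorum(sum_of_replication_factors):
    """Quorum size: floor(sum_of_replication_factors / 2) + 1."""
    return sum_of_replication_factors // 2 + 1


def read_sees_latest_write(read_nodes, write_nodes, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_nodes + write_nodes > replication_factor


for rf in (1, 3, 5, 6):
    print(f"RF={rf}: quorum={quorum(rf)}")

# QUORUM reads plus QUORUM writes with RF=3 overlap on at least one replica.
print(read_sees_latest_write(quorum(3), quorum(3), 3))  # True
# ONE reads plus ONE writes with RF=3 may miss the latest write.
print(read_sees_latest_write(1, 1, 3))  # False
```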

What is Cassandra’s compaction strategy?

To improve read performance and reclaim disk space, Cassandra periodically runs compaction, which merges multiple old SSTables into a new consolidated SSTable and then uses the new file in their place.

  • SizeTieredCompactionStrategy: for write-intensive workloads
  • LeveledCompactionStrategy: for read-intensive workloads
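
The core idea of compaction, merging several SSTables and keeping only the newest version of each entry, can be sketched as follows. This is a toy model under stated assumptions: each SSTable is a plain dict of key to (write timestamp, value), and the `compact` helper is hypothetical. It ignores tombstones, TTLs, and the way each strategy chooses which SSTables to merge.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest value per key.

    Each SSTable is modeled as a dict of key -> (write_timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Keep the output in sorted order, as an SSTable is a sorted file.
    return dict(sorted(merged.items()))


old_tables = [
    {"alice": (1, "v1"), "bob": (2, "v1")},
    {"alice": (5, "v2")},
    {"carol": (3, "v1"), "bob": (4, "v2")},
]
print(compact(old_tables))
# {'alice': (5, 'v2'), 'bob': (4, 'v2'), 'carol': (3, 'v1')}
```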

Partition key and clustering key

The partition key plays a role similar to the primary key in a relational database, but it identifies a partition rather than a single row: its hash (the token) decides which nodes store the record.

The clustering key determines the sort order of rows within a partition.
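
A small in-memory model may help picture the two roles: the partition key selects a bucket, and the clustering key keeps rows ordered inside that bucket. The `table` structure, the `insert` helper, and the sensor data are invented for the example and say nothing about how Cassandra stores data on disk.

```python
from bisect import insort
from collections import defaultdict

# partition key -> list of (clustering_key, value), kept sorted by clustering key
table = defaultdict(list)


def insert(partition_key, clustering_key, value):
    """Place a row in its partition, keeping rows ordered by clustering key."""
    insort(table[partition_key], (clustering_key, value))


insert("sensor-1", "2024-01-03", 21.5)
insert("sensor-1", "2024-01-01", 19.0)
insert("sensor-2", "2024-01-02", 30.1)
insert("sensor-1", "2024-01-02", 20.2)

print(table["sensor-1"])
# [('2024-01-01', 19.0), ('2024-01-02', 20.2), ('2024-01-03', 21.5)]
```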

Compound key

A compound key is a primary key with multiple columns in which only the first column is the partition key; the remaining columns are clustering keys.

PRIMARY KEY (p1, c1, c2, c3)

Composite key

A composite key is a partition key that consists of multiple columns.
When you query, you need to specify every column of the partition key.

PRIMARY KEY ((p1, p2), c1, c2)
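
To make the difference between the two PRIMARY KEY forms concrete, the sketch below splits a clause into its partition columns and clustering columns. The `split_primary_key` helper is hypothetical and only handles the simple shapes shown above; it is not a CQL parser.

```python
import re


def split_primary_key(clause):
    """Split a PRIMARY KEY clause into (partition_columns, clustering_columns)."""
    match = re.fullmatch(r"\s*PRIMARY\s+KEY\s*\((.*)\)\s*", clause, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PRIMARY KEY clause: {clause!r}")
    body = match.group(1).strip()

    def columns(text):
        return [name.strip() for name in text.split(",") if name.strip()]

    if body.startswith("("):
        # Composite partition key: everything inside the inner parentheses.
        close = body.index(")")
        partition = columns(body[1:close])
        clustering = columns(body[close + 1:])
    else:
        # Single-column partition key: the first column only.
        first, *rest = columns(body)
        partition, clustering = [first], rest
    return partition, clustering


print(split_primary_key("PRIMARY KEY (p1, c1, c2, c3)"))
# (['p1'], ['c1', 'c2', 'c3'])
print(split_primary_key("PRIMARY KEY ((p1, p2), c1, c2)"))
# (['p1', 'p2'], ['c1', 'c2'])
```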

What are some of Cassandra’s limitations?

  • A single column value should be kept at or below 1 MB (the maximum is 2 GB)
  • A partition should preferably hold fewer than 100,000 rows and stay under 100 MB on disk

How does Cassandra use bloom filters?

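Cassandra uses bloom filters to check whether a partition key might exist in an SSTable without having to read the SSTable's contents. A bloom filter can return false positives but never false negatives, so a negative answer lets Cassandra skip that SSTable entirely.

Each SSTable has its own bloom filter, which is built when a memtable is flushed to disk as that SSTable.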
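
For readers unfamiliar with the data structure, here is a minimal bloom filter sketch. The bit-array size, the number of hashes, and the use of salted SHA-256 are assumptions chosen for readability; they are not the parameters or hash function Cassandra uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(key))


sstable_filter = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(partition_key)

print(sstable_filter.might_contain("user:2"))   # True
print(sstable_filter.might_contain("user:99"))  # usually False; false positives are possible
```

A read only needs to touch the SSTables whose filter answers "possibly present" for the requested partition key.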

What are seed nodes in a Cassandra cluster setup?

Seeds are used during startup to discover the cluster. New nodes also consult seeds on bootstrap to learn about the other nodes in the ring. When you add a new node to the ring, you need to specify at least one live seed for it to contact. Once a node has joined the ring, it learns about the other nodes, so it does not need a seed on subsequent boots.

How to add a new node to a single datacenter cluster?

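  • Calculate tokens for the new node. The script below generates evenly spaced tokens for Murmur3Partitioner; set number_of_tokens to the total number of nodes in the cluster (the value 4 here is only an example).
python3 -c "number_of_tokens = 4; print([str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)])"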
  • Install Cassandra on the new node with a properly configured cassandra.yaml.

  • Use nodetool move to assign the new token to it.

  • Run nodetool cleanup on the previously existing nodes to remove keys that no longer belong to them.

https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddRplSingleTokenNodes.html
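
The token calculation from the first step can also be written as a small function, which makes it easy to compare the token layout before and after adding a node. The `murmur3_tokens` name and the node counts are hypothetical; the arithmetic is the same as in the one-liner above.

```python
def murmur3_tokens(number_of_tokens):
    """Evenly spaced tokens across the Murmur3Partitioner range, starting at -2**63."""
    return [(2**64 // number_of_tokens) * i - 2**63 for i in range(number_of_tokens)]


before = murmur3_tokens(3)  # a three-node cluster
after = murmur3_tokens(4)   # the same cluster with one node added

print(before)
print(after)
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```

Comparing the two lists shows that, apart from the first token, every position shifts when the node count changes, which is why existing nodes may need new tokens followed by a cleanup.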