When experimenting with Cassandra I've observed that Cassandra writes to the following files:
/.../cassandra/commitlog/CommitLog-<id>.log
/.../cassandra/data/Keyspace1/Standard1-1-Data.db
/.../cassandra/data/Keyspace1/Standard1-1-Filter.db
/.../cassandra/data/Keyspace1/Standard1-1-Index.db
/.../cassandra/data/system/LocationInfo-1-Data.db
/.../cassandra/data/system/LocationInfo-1-Filter.db
/.../cassandra/data/system/LocationInfo-1-Index.db
/.../cassandra/data/system/LocationInfo-2-Data.db
/.../cassandra/data/system/LocationInfo-2-Filter.db
/.../cassandra/data/system/LocationInfo-2-Index.db
/.../cassandra/data/system/LocationInfo-3-Data.db
/.../cassandra/data/system/LocationInfo-3-Filter.db
/.../cassandra/data/system/LocationInfo-3-Index.db
/.../cassandra/system.log
The general structure seems to be:
/.../cassandra/commitlog/CommitLog-ID.log
/.../cassandra/data/KEYSPACE/COLUMN_FAMILY-N-Data.db
/.../cassandra/data/KEYSPACE/COLUMN_FAMILY-N-Filter.db
/.../cassandra/data/KEYSPACE/COLUMN_FAMILY-N-Index.db
/.../cassandra/system.log
What is the Cassandra file structure? More specifically, how are the data
, commitlog
directories used, and what is the structure of the files in the data
directory (Data
/Filter
/Index
)?
A write to a Cassandra node first hits the CommitLog (sequential). (Then Cassandra stores values to column-family specific, in-memory data structures called Memtables. The Memtables are flushed to disk whenever one of the configurable thresholds is exceeded. (1, datasize in memtable. 2, # of objects reach certain limit, 3, lifetime of a memtable expires.))
The data folder contains a subfolder for each keyspace. Each subfolder contains three kind of files:
Cassandra File Format in detail
Each ColumnFamily(Eg. object) in separated sstable files