To better understand Cassandra, I set up a four-node cluster using the latest released version. The four nodes were brought up in sequence using almost entirely default settings and seem to be communicating properly.
I then created a schema as follows:
CREATE KEYSPACE first WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '1'
};
I then created a simple table with 5 columns and added ~100K rows of data. All well and good. The data is available from every client, so I assumed it was spread evenly across the nodes.
Now I'm looking into a backup strategy and have started experimenting with snapshots. After running nodetool snapshot on each machine, I wanted to see what it created. On the first machine, /var/lib/cassandra/data/first is empty. The second machine is the same, and so is the third. Only on the fourth machine do I see files in the data folder and a snapshot directory.
Running nodetool ring shows that each node owns roughly 25%, but the load is heavily biased towards the one node that seems to have ended up with all the data.
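To make the expectation behind that 25% figure concrete, here is a rough sketch of how I understand ownership with a replication factor of 1: each row is stored on exactly one node, chosen by hashing its key onto a token ring. This is only a toy model, not Cassandra's code. The node names, the evenly spaced tokens, the 2^64 ring size, and the use of MD5 in place of the real partitioner are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 64  # hypothetical token space for this toy model

# Hypothetical nodes with evenly spaced tokens, so each owns 25% of the ring.
NODE_TOKENS = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]
TOKENS = [token for token, _ in NODE_TOKENS]


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5 is a stand-in, not Cassandra's partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(key: str) -> str:
    """Return the single node that stores the key when the replication factor is 1."""
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the first node past the end of the ring.
    index = bisect_left(TOKENS, token_for(key)) % len(TOKENS)
    return NODE_TOKENS[index][1]


def rows_per_node(row_count: int) -> Counter:
    """Count how many of row_count synthetic keys land on each node."""
    return Counter(owner_of(f"row-{n}") for n in range(row_count))


if __name__ == "__main__":
    for node, count in sorted(rows_per_node(100_000).items()):
        print(node, count)
```

In this model, 100,000 synthetic keys split close to evenly across the four nodes, which is the kind of distribution I expected to find on disk.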
Is all the data truly on this one machine? What step did I miss in the configuration?