
We're trying to verify the state of replication in our Cassandra cluster. My colleague has found that only a small number of sstable files exist on multiple nodes. The others are all unique.

To me, this makes sense. As I understand it, each node should be responsible for a unique set of ranges, and should have sstables that reflect those ranges. But now I'm not sure.

Should we find at least n copies of each sstable with a replication factor of n? Or are the copies of the sstables a result of the bootstrap that haven't yet been compacted?

daxlerod

1 Answer


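SSTable files are created when a memtable is flushed and when existing SSTables are compacted. This happens at different times on each node (other factors, such as short downtime, also play a part), so the files on different replicas are not expected to match even when the data they hold is the same.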
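To illustrate the point, here is a toy model in Python. It is not Cassandra's code: the function names and the fixed `memtable_limit` flush trigger are assumptions made up for this sketch. Two replicas receive the same writes but flush at different points, so their "files" differ while the merged data is identical.

```python
def flush_into_sstables(writes, memtable_limit):
    """Toy flush: emit an immutable 'SSTable' every memtable_limit writes.

    memtable_limit is a hypothetical stand-in for whatever triggers a
    flush on a real node (memory pressure, commit log size, manual flush).
    """
    sstables, memtable = [], {}
    for key, value in writes:
        memtable[key] = value
        if len(memtable) >= memtable_limit:
            sstables.append(tuple(sorted(memtable.items())))
            memtable = {}
    if memtable:
        # Flush whatever is left at the end of the write stream.
        sstables.append(tuple(sorted(memtable.items())))
    return sstables


def merged_view(sstables):
    """Merge all SSTables of one node into the data a read would see."""
    data = {}
    for table in sstables:
        data.update(dict(table))
    return data


# Both replicas receive exactly the same ten writes.
writes = [(f"k{i}", i) for i in range(10)]
node_a = flush_into_sstables(writes, memtable_limit=3)
node_b = flush_into_sstables(writes, memtable_limit=4)

files_differ = set(node_a) != set(node_b)
same_data = merged_view(node_a) == merged_view(node_b)
print(files_differ, same_data)  # prints: True True
```

So comparing SSTable files across nodes says little about replication; what matters is whether the replicas hold the same rows.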

To keep all data correctly replicated, you need to run repairs regularly, either by invoking nodetool repair explicitly or by using a tool such as DataStax OpsCenter (DSE only), Reaper, or something similar.
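A minimal sketch of the explicit route, assuming `nodetool` is on the PATH of the machine running the script. The keyspace name `my_keyspace` and both helper functions are hypothetical; `-pr` restricts the repair to the node's primary token ranges, which is a common choice when the command is run on every node in turn.

```python
import subprocess


def build_repair_command(keyspace, host=None, primary_range_only=True):
    """Build the nodetool argument list for repairing one keyspace."""
    cmd = ["nodetool"]
    if host is not None:
        # -h selects the node to connect to; omitted means the local node.
        cmd += ["-h", host]
    cmd.append("repair")
    if primary_range_only:
        cmd.append("-pr")
    cmd.append(keyspace)
    return cmd


def run_repair(keyspace, host=None):
    """Run the repair and raise CalledProcessError if nodetool fails."""
    subprocess.run(build_repair_command(keyspace, host), check=True)


# Hypothetical keyspace name, shown only to illustrate the resulting command.
example_cmd = build_repair_command("my_keyspace")
print(" ".join(example_cmd))
```

Calling `run_repair("my_keyspace")` on each node from a scheduler is one way to do it by hand; tools like Reaper exist to handle that scheduling and coordination for you.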

Alex Ott