1

Hi, I'm running a 5-node DSE Cassandra cluster. Every node is at about 90% disk usage, so I deleted data from my keyspace (I have only one keyspace), but disk usage is still at 90%. Is there any way to reclaim the disk space used by the deleted data?

Sachin PK
  • You shouldn't go over 50% disk usage, since some operations, including compactions, can potentially rewrite all of the data. Of course, this all depends on your particular setup and compaction strategy. You can try running a compaction on the smaller keyspaces. Check out [this StackOverflow question](http://stackoverflow.com/questions/30743626/cassandra-node-almost-out-of-space-but-nodetool-cleanup-is-increasing-disk-use) – LHWizard May 27 '16 at 15:23

2 Answers

4

Warning: If you have any legitimate snapshots of keyspaces, this will clear those out as well (so you'll want to back those up).

Go on each of the nodes and call in a terminal:

nodetool clearsnapshot

After you drop a keyspace, Cassandra keeps its data on disk as a snapshot until you explicitly tell it to clear it out.

http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsClearSnapShot.html
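Before clearing anything, it can help to see how much space the snapshots actually occupy. The following is a minimal sketch, not an official tool: the function name `snapshot_usage` is made up for illustration, and the default data directory `/var/lib/cassandra/data` is an assumption (check `data_file_directories` in your `cassandra.yaml`).

```shell
# Hypothetical helper: print the size (in KiB) of every "snapshots"
# directory under a Cassandra data directory, smallest first.
# The default path is an assumption; pass your own as the first argument.
snapshot_usage() {
  data_dir="${1:-/var/lib/cassandra/data}"
  # Each table directory may contain a "snapshots" subdirectory.
  find "$data_dir" -type d -name snapshots -exec du -sk {} + | sort -n
}

# Example: snapshot_usage /var/lib/cassandra/data
```

If the totals are small, clearing snapshots will not free much, and the space is more likely held by SSTables that have not been compacted yet.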

patrickceg
  • This is true by default, although you can turn off that feature by editing the Cassandra settings (`cassandra.yaml`) and setting `auto_snapshot: false`. I personally always do that because the replication is what ensures my data is safe. If I delete by accident on a production server, I probably need heavy medical attention... – Alexis Wilke Dec 29 '18 at 01:30
3

I think he said he deleted data from his one and only keyspace, so he now has tombstones rather than freed space. See here, and be careful with a manual/major compaction, since it may need a lot of free headroom on disk. If your disks are too full for compaction to handle this, you could remove one node, wipe it, and bootstrap it again, then repeat node by node around your cluster. Always leave headroom for compaction and for handling node failures, i.e. don't fill your nodes too much (maybe stay below 50-75% disk usage).
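As a rough illustration of the headroom check described above, here is a small sketch. The function names are hypothetical, the default path `/var/lib/cassandra` is an assumption, and the 50% default limit is only the rule of thumb from this thread, not a hard requirement. Note also that tombstones are only purged by compaction once `gc_grace_seconds` has elapsed for the table.

```shell
# Hypothetical helper: print the used percentage of the filesystem
# holding the given directory (default path is an assumption).
disk_used_pct() {
  df -P "${1:-/var/lib/cassandra}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed only if used percent ($1) is below
# the limit ($2, default 50 as a rule of thumb).
has_headroom() {
  [ "$1" -lt "${2:-50}" ]
}

# Example: only trigger a major compaction when there is room for it.
# "my_keyspace" is a placeholder name.
#   if has_headroom "$(disk_used_pct)"; then nodetool compact my_keyspace; fi
```

A failed check is the case where the node-by-node wipe and bootstrap approach becomes worth considering.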

  • 2
    You may want to clearly specify that after the wipe + re-add you need to wait for the node to be 100% up to date... You wouldn't want someone to wipe out their other nodes too soon and lose data! It also supposes that you have a replication factor of 2 or more (also I would hope people never use less than 3 as a replication factor!) – Alexis Wilke Dec 29 '18 at 01:28
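One way to guard against moving on too early is to check `nodetool status` before wiping the next node. The sketch below assumes the usual two-letter state codes at the start of each node line (`U`/`D` for up/down, `N`/`L`/`J`/`M` for normal/leaving/joining/moving); the function name is made up for illustration. A node showing `UN` has finished joining, but you may still want to run a repair before relying on it.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and
# succeed only if at least one node line is present and all are "UN".
all_nodes_up_normal() {
  awk '
    /^[UD][NLJM][ \t]/ { seen++; if ($1 != "UN") bad++ }
    END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
  '
}

# Example: refuse to continue while any node is down or still joining.
#   nodetool status | all_nodes_up_normal || echo "cluster not ready"
```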