
I'm not new to ZFS. I've been using it for a couple of years, but I've only just started making snapshots. I've created a cron job and a script that take a daily snapshot of a few of my datasets, which all reside under the same pool.

I was looking for a quick way to list my snapshots in another cron script so that I could destroy the oldest one. The goal is to have a continuously rolling set of 7 snapshots, so that I can roll back to any day out of the last 7 days.

I have my script running and I currently only have 1 set of snapshots (today's). I found a very cool answer by AaronLS (https://serverfault.com/users/15810/AaronLS) for finding the oldest snapshot and destroying it. I plan to add this to my bash script, so that it takes a snapshot each day and immediately afterwards finds the oldest one and destroys it.

The answer by AaronLS is in this question: How to delete all but last [n] ZFS snapshots?
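The rolling scheme described above can be sketched as follows. This is only a minimal illustration of the selection logic, not the method from the linked answer. It assumes the snapshot names arrive sorted oldest first (for example from `zfs list -H -t snapshot -o name -s creation`), and the function name `snapshots_to_destroy` and the `tank/data@daily-...` names are hypothetical.

```python
from typing import List


def snapshots_to_destroy(names_oldest_first: List[str], keep: int = 7) -> List[str]:
    """Return the snapshots that fall outside the newest `keep` entries."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    excess = len(names_oldest_first) - keep
    # Nothing is selected until more than `keep` snapshots exist.
    return names_oldest_first[:excess] if excess > 0 else []


# Hypothetical snapshot names, listed oldest first (nine days' worth).
snaps = [f"tank/data@daily-2020-07-{day:02d}" for day in range(1, 10)]

for name in snapshots_to_destroy(snaps, keep=7):
    # A real script would pass each name to `zfs destroy`; here we only print.
    print("would destroy:", name)
```

With nine snapshots and `keep=7`, only the two oldest are selected. Selecting "everything beyond the newest 7" rather than "the single oldest" means the set does not shrink while fewer than 7 snapshots exist.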

My question is one of ignorance, really. I thought that the first snapshot you made of any given dataset was a complete image of that dataset, and that any later snapshots were based on that first one, merely recording the changes since it was taken.

So if I delete the very oldest snapshot, does ZFS then have to alter the second-oldest snapshot in order to have a complete "first snapshot" image again?

Have I explained myself correctly? Surely if you delete the original snapshot, then the next oldest becomes the "original" snapshot and would need some data juggling in order to become a full dataset image?

Could someone explain to me in layman's terms why my assumptions are wrong? I feel that I kind of understand snapshots, but I'm lacking confidence. I was going to try running my script tonight on a 10-minute crontab, so that I could simulate a week's worth of snapshots in 70 minutes, but I'm not confident that I understand the maths well enough to go ahead.

I apologize for the bad formatting, too. If I knew how to quote a username and link to another article properly, I would edit this post to make it neater. I've not posted on here for a long while.

Thanks.

bitofagoob

2 Answers


If I'm not mistaken, when you take a snapshot, nothing happens, it's just a timestamp. Then your data starts changing. Before every change, the original data is copied to the snapshot.

Gerard H. Pille
  • Yep. That's kind of what I mean. The very first snapshot is an 'image' of the complete filesystem. The next snapshot you do would be identical to the first if there were no changes in between. Let's say you created /home/goob/myfile662 one day. The only difference in the next snapshot would be the addition of "myfile662" (ignoring OS log files etc). So, you create a series of 7 snapshots and then delete the one from 7 days ago. But where has your "baseline" gone to? You would then have no starting point which you would need to perform a roll back. Do you see where my confusion lies now? – bitofagoob Jul 07 '20 at 20:25
  • @bitofagoob Your "baseline" is the current state of your dataset, not any snapshot. A snapshot only contains the old data that you overwrote after making the snapshot. – Michael Hampton Jul 07 '20 at 20:29
  • AHHH! Yo La Tengo! So any individual snapshot could be deleted, regardless of the order they're deleted in? And obviously, the older the snapshot, the more changes there are likely to have been since, so the older the snapshot, the larger it is? Does this mean that a snapshot you left for a long time would continue to slowly increase in size? – bitofagoob Jul 07 '20 at 20:43
  • Indeed it would. Now, there are systems where snapshots rely on each other, and when you delete one of those (probably you can only delete the oldest one), the more recent one has to undergo some sort of consolidation, which takes quite some time. That is probably not the way ZFS snapshots work. – Gerard H. Pille Jul 08 '20 at 08:35
  • That's a great word to describe it: consolidation. I had been thinking that the 2nd oldest would have to undergo some kind of consolidation process. But I have played around with ZFS snapshots some more and destroyed a bunch I made in a random order (the deletes were in a random order) and I'm pretty sure that any one ZFS snapshot doesn't rely on another in order to rescue a partition. – bitofagoob Jul 09 '20 at 23:43
  • Do not forget to try a rescue operation. The proof of the pudding, eh? – Gerard H. Pille Jul 10 '20 at 08:10
  • Yes. I had to do a rescue a couple of days ago. Well, I thought I needed a rescue, as I thought I'd wiped a dataset (~190GB of data). The rescue worked and it was very fast (SSD array), and I realised afterwards that I'd done something stupid with a "mount --bind" command. I can't remember the exact command I'd typed, but I think I'd mounted a directory over the top of the ZFS mountpoint. I rolled back to a snapshot and rebooted, and my data was there (albeit data from a couple of days ago). The dataset in question was full of old comedy mp3's, so I didn't really lose any changes. :-) – bitofagoob Jul 10 '20 at 22:57
  • Now look what a mess you've gotten us into, eh? – Gerard H. Pille Jul 11 '20 at 02:41

Each snapshot is a complete index of the data required to represent the file system at a point in time. Apart from the shared data space, the snapshots are totally independent of each other.

You are thinking in terms of deltas, but zfs snapshots are not deltas. They don't contain any of the actual data so there's no benefit in doing it that way.
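The idea that each snapshot is a full index over shared data can be illustrated with a toy model. This is a deliberately simplified sketch and not how ZFS is implemented internally: the `ToyDataset` class, its methods, and the file names are all hypothetical, and "blocks" here are just integers.

```python
class ToyDataset:
    """Toy model: the live dataset and each snapshot are maps of path -> block id."""

    def __init__(self):
        self.live = {}        # path -> block id currently visible
        self.snapshots = {}   # snapshot name -> frozen copy of the map
        self._next_block = 0

    def write(self, path):
        # New data always lands in a fresh block; old blocks are never edited.
        self.live[path] = self._next_block
        self._next_block += 1

    def snapshot(self, name):
        # Copies the references only, never the data itself.
        self.snapshots[name] = dict(self.live)

    def destroy(self, name):
        del self.snapshots[name]

    def blocks_in_use(self):
        # A block stays allocated while the live map or any snapshot references it.
        used = set(self.live.values())
        for index in self.snapshots.values():
            used.update(index.values())
        return used

    def unique_to(self, name):
        # Blocks that would be freed if only this snapshot were destroyed.
        others = set(self.live.values())
        for other, index in self.snapshots.items():
            if other != name:
                others.update(index.values())
        return set(self.snapshots[name].values()) - others


ds = ToyDataset()
ds.write("a.mp3")                      # block 0
ds.write("b.mp3")                      # block 1
ds.snapshot("mon")
assert ds.unique_to("mon") == set()    # a fresh snapshot holds no space of its own
ds.write("a.mp3")                      # block 2; block 0 is now held only by "mon"
assert ds.unique_to("mon") == {0}      # the old snapshot "grows" as live data changes
ds.snapshot("tue")
ds.write("c.mp3")                      # block 3
ds.snapshot("wed")

tue_before = dict(ds.snapshots["tue"])
ds.destroy("mon")                      # destroy the oldest snapshot
assert ds.snapshots["tue"] == tue_before   # "tue" is untouched and still complete
print(sorted(ds.blocks_in_use()))      # [1, 2, 3]: only block 0 was released
```

In this model, destroying a snapshot only drops its references. No other snapshot has to be rewritten or consolidated, which is why the order of deletion does not matter.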

  • `So if I delete the very oldest snapshot, does ZFS then have to alter the second-oldest snapshot in order to have a complete "first snapshot" image again?` your answer does not solve this question, it only tells what is already known. – djdomi Jul 21 '21 at 04:26
  • No, it doesn't; I rephrased my answer. – Scott Schlechtleitner Jul 22 '21 at 00:34