As we rely on manual RDS PostgreSQL snapshots for our backup strategy, we ran into the issue of possible downtime of the RDS instance (Single-AZ) during snapshot creation. According to AWS:

Creating this DB snapshot on a Single-AZ DB instance results in a brief I/O suspension that can last from a few seconds to a few minutes, depending on the size and class of your DB instance.

This doesn't make clear how we can tell whether the DB instance's I/O is functioning normally while the snapshot is being taken. If the DB is down for a short period, we'd like to stop the corresponding web server or take it out of the load balancer so that customers don't see interrupted connections.

What we're wondering:

  • Does the DB really have downtime during snapshotting? AWS only mentions "I/O suspension" and "latencies". I read somewhere that the downtime is short (from a few seconds to a minute) and occurs only during snapshot initialization. Can we tell when that downtime has passed and the DB instance is ready to serve again (while its snapshot is still being created)?

  • What is the general best practice for dealing with these I/O suspensions? Since it seems to happen with automated backups too, does that mean the site could have downtime every day while the DB snapshot is being created?

Arcobaleno
  • In the MySQL variant of RDS, I find the "brief" suspension is so brief as to be undetectable. I would expect the impact on Postgres to be similar. Have you actually experienced a disruption? – Michael - sqlbot Jul 05 '18 at 09:27
  • If you do notice a disruption the next sentence on [that page](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateSnapshot.html) is key: "Multi-AZ DB instances are not affected by this I/O suspension since the backup is taken on the standby." – Tim Jul 05 '18 at 09:30

1 Answer

The answer comes from understanding how snapshotting works.

At the start of a snapshot, a message (command) is sent to all applications to come to a consistent state and flush necessary data to disk.

How long this flush takes depends on how much data is in memory, what state the data is in, and how long it takes to write the data to disk.

Once each application that supports snapshotting completes its preparation for freezing, the snapshot process snaps the file systems. From that point on, before any data block is overwritten, a copy of the original block is made for the backup process (COW - Copy on Write). Then the thaw (resume) message / command is sent to each application.
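To make the copy-on-write idea concrete, here is a toy sketch in Python. It is only an illustration of the general technique, not how RDS or EBS implement it; the `Volume` class and its method names are made up for this example.

```python
class Volume:
    """Toy block device with one copy-on-write snapshot (illustration only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data
        self.preserved = None        # originals of blocks changed since the snap

    def snapshot(self):
        # Taking the snapshot copies nothing; it only starts tracking overwrites.
        self.preserved = {}

    def write(self, index, data):
        # Copy on write: save the old block the first time it is overwritten.
        if self.preserved is not None and index not in self.preserved:
            self.preserved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # The backup reads the preserved copy if the block changed,
        # otherwise the (unchanged) live block.
        if self.preserved is None:
            raise RuntimeError("no snapshot taken")
        return self.preserved.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
vol.snapshot()
vol.write(1, "B")
vol.write(1, "BB")
print(vol.blocks)                                 # live view: ['a', 'BB', 'c']
print([vol.read_snapshot(i) for i in range(3)])   # snapshot view: ['a', 'b', 'c']
```

The point is that the snapshot itself is nearly instantaneous; the cost is paid later, one block at a time, as writes arrive.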

For a lightly used database this freeze / thaw process may take only a few hundred milliseconds. For a large database with GBs of memory that need to be flushed to disk, a number of seconds will be required.

During the time that the freeze / thaw cycle is occurring, disk I/O for new user requests is suspended. The database is still running but all requests will pause while the disks / file systems are synchronized. Everything resumes with receipt of the thaw message.
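If you want to see whether the pause is noticeable for your instance, one option is to time a small write in a loop while a snapshot starts. The following is a minimal sketch under assumptions: `find_stalls` and `fake_probe` are hypothetical names, and the real probe (for example an UPDATE on a heartbeat row through whatever driver you use) is left out so the code runs anywhere.

```python
import time


def find_stalls(probe, samples, threshold_s, interval_s=0.0, clock=time.monotonic):
    """Call `probe` repeatedly; return (sample_index, latency) for slow calls.

    `probe` is any zero-argument callable that performs one small write
    against the database (assumption: you supply it).
    """
    stalls = []
    for i in range(samples):
        start = clock()
        probe()
        latency = clock() - start
        if latency >= threshold_s:
            stalls.append((i, latency))
        time.sleep(interval_s)
    return stalls


# Stand-in probe that simulates one paused write.
calls = {"n": 0}


def fake_probe():
    calls["n"] += 1
    if calls["n"] == 3:
        time.sleep(0.2)   # pretend I/O was suspended during this call


stalls = find_stalls(fake_probe, samples=5, threshold_s=0.1)
print([i for i, _ in stalls])   # [2]
```

With a real probe, an empty result across a snapshot would suggest the suspension is too short to matter for your workload; a non-empty one tells you roughly how long requests were held.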

For Master-Slave databases (Multi-AZ deployments in RDS terms), the master is not affected. The snapshot is taken on the slave (the standby). This is one of the nice AWS RDS features.

John Hanley
  • What does an "application that supports snapshotting" refer to specifically for an AWS RDS instance? It's not clear how an application connecting to a database in an RDS instance would flush any data in the database to disk – flush any pending transactions, maybe? Or are these 'applications' internal to AWS RDS? If it's the former, I'd imagine that's implemented per 'database engine' and not really specific to RDS per se. – Kenny Evitt Oct 12 '21 at 21:57
  • @KennyEvitt - create a new question. – John Hanley Oct 12 '21 at 22:12
  • The questions in my previous comment are about your answer – it wouldn't be right to create a separate question to ask about an answer on another question. – Kenny Evitt Oct 12 '21 at 22:45
  • @KennyEvitt - the first part of my answer refers to how snapshots work. The application inside RDS that receives the snapshot notification is the database application (MySQL, PostgreSQL, etc.) and not an application outside of RDS. If you have more questions create a new question. – John Hanley Oct 12 '21 at 23:36