GCP Cloud SQL postgres in bad state after maintenance

Question

Our HA managed Postgres Cloud SQL instance went through what looks like a planned maintenance and has been in a bad state ever since.

Jan 7, 2018, 2:08:21 AM Update An unknown error occurred.

The failover instance did not fail over and now no longer exists. We can't restart the instance or perform any other operation, so this production database is completely unavailable.

2018/01/08 09:41:24 couldn't connect to "ourprojectid:us-central1:instance name": googleapi: Error 409: The instance or operation is not in an appropriate state to handle the request., invalidState

We also tried to contact support directly by email, as suggested in similar posted issues.

https://stackoverflow.com/questions/42719547/cloud-sql-instances-are-not-starting-or-restarting-its-stuck

We are starting to consider creating a new instance and restoring from a backup, but I would expect more resiliency from an HA managed instance during a scheduled maintenance. It has been down for more than a day.

Thanks in advance

We finally recreated and restored from the last backup, happy to provide further details (project-id and instance name) for investigation... — snebel29, Jan 08 '18 at 16:28

score 1 · Answer 1 · answered Aug 10 '20 at 18:41

First, please do not share your GCP project ID or Cloud SQL instance information on a community thread like this. Reach out to GCP Support Engineers directly if you need your Cloud SQL instance reviewed.

As the error suggests, either an operation is stuck or the Cloud SQL instance itself is stuck in an error state. This error can have several causes, including:

Trying to reuse an instance name within a week after the instance was deleted. Similar issue reported here
An operation is indeed stuck. This requires GCP Support Engineers to stop the stuck operation.
The instance may become unhealthy or unavailable for other reasons, including internal or underlying issues. GCP engineers can help in this case as well.
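
To tell a stuck operation apart from an unhealthy instance, it can help to look at the instance's operation list before contacting support. Below is a minimal Python sketch, assuming the gcloud CLI is installed and authenticated. The helper names (`unfinished_operations`, `list_operations`) are made up for illustration, and the field names (`status`, `insertTime`) are assumed to match the JSON that `gcloud sql operations list --format json` prints.

```python
import json
import subprocess

# Statuses that mean an operation has not finished yet. The Cloud SQL Admin API
# is assumed to report PENDING, RUNNING or DONE in the "status" field.
UNFINISHED = {"PENDING", "RUNNING"}


def unfinished_operations(operations):
    """Return the operations that are not DONE, oldest first."""
    stuck = [op for op in operations if op.get("status") in UNFINISHED]
    # insertTime is an ISO 8601 timestamp string, so a plain string sort
    # orders the operations chronologically.
    return sorted(stuck, key=lambda op: op.get("insertTime", ""))


def list_operations(instance, project):
    """Fetch the operations of one instance as parsed JSON via the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "sql", "operations", "list",
         "--instance", instance, "--project", project, "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `unfinished_operations(list_operations("my-instance", "my-project"))` with your own instance and project names (the ones here are placeholders) lists any operation that never reached DONE. An operation that has been RUNNING since the maintenance window is a good detail to include in a support request. This only diagnoses the problem; stopping a stuck operation still needs GCP support.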

Generally, recreating the Cloud SQL instance and restoring from backups, as you rightly did, is a good way to work around the issue.

score 0 · Answer 2 · answered Aug 11 '20 at 19:48

It’s been a long time since this question was asked, so I’ll update this thread with further details.

The project and instance ID were originally replaced with an arbitrary string to avoid exposing the real ones, but thanks for the advice.
The account had only community support, which did not include direct access to support engineers; the only support channels available and recommended by the GCP docs were Stack Overflow and Server Fault.
We finally managed to get an answer from the engineering team through direct messages, which confirmed a known bug in what was, at the time, still a beta service not covered by the standard SLA. They fixed it and no further action was required on our side.

Thanks
