Questions tagged [disaster-recovery]

Disaster recovery and preparedness are an unfortunate aspect of systems administration. This tag should be used for help with planning, implementation, and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can take what was a great day and turn it into a living nightmare when you are responsible for keeping the systems that others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before one can properly prepare for a disaster, it is necessary to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet, etc. at your site. Understanding which aspects of disaster recovery are within one's control is very important when planning; if there is no one on staff who can fix the power, HVAC, etc., make sure that the contact info for someone who can is written down somewhere. Having a large amount of information available before a disaster occurs will help to keep everyone calm, cool and on-task when something actually does happen.

Once the risks are assessed and a plan is created, print out physical copies, email it, and make sure everyone with admin-level access to the systems/datacenter has read it and is familiar with it. The best plan in the world is worthless if it is on a system that is down and cannot be easily restored without following the plan. After everyone is familiar with the plan, practice when possible; in many situations it may not be realistic, but if possible take advantage of planned downtimes or natural outages to go through the recovery plan and refine it.

In summary, when a disaster happens:

Don't panic! Panic turns a debacle into a catastrophe every time.
Plan ahead, understand the risks, and know what is within your control.
Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
Stay calm and organized: use checklists and keep notes.
If you are working in a team or group, communicate and collaborate.
Be vigilant and update your plan as the environment changes.
Check your backups: make sure they happen at regular intervals and that the data they contain is still good.
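
The backup check in the last point can be partly automated. What follows is only a minimal Python sketch, assuming each backup is a single file and that a SHA-256 checksum was recorded when the backup was written; the function names, the 24-hour limit and the `now` parameter are illustrative assumptions, not part of any particular backup tool.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path, expected_sha256, max_age_hours=24, now=None):
    """Return a list of problems found; an empty list means both checks passed.

    expected_sha256 is assumed to have been recorded when the backup was made.
    now (seconds since the epoch) is a hypothetical hook for testing.
    """
    if not os.path.isfile(path):
        return [f"missing: {path}"]
    problems = []
    now = time.time() if now is None else now
    # Freshness: the file's modification time stands in for "when the backup ran".
    age_hours = (now - os.path.getmtime(path)) / 3600
    if age_hours > max_age_hours:
        problems.append(f"stale: {age_hours:.1f}h old (limit {max_age_hours}h)")
    # Integrity: the file must still match the checksum recorded at backup time.
    if sha256_of(path) != expected_sha256:
        problems.append("checksum mismatch")
    return problems
```

A matching checksum only shows that the file has not changed since the checksum was recorded. It does not show that the backup can be restored, so a periodic test restore remains the stronger check.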

358 questions

3 answers

Battery Backed Write Cache

I recently got some U server price quotes and some of them include BBWC: What exactly does it do? Is it just for RAID configurations? If there is a power malfunction, isn't the data loss inevitable? Are there any performance improvements from it…

raid storage disaster-recovery

asked Sep 14 '09 at 11:12

Dani

4 answers

How do I backup my TRAC installations?

We use separate TRAC instances as our ticket system for many projects and need to have them moved off site several times a day for disaster recovery. What is the best way to make this happen? Is there something similar to svnsync for subversion?

backup disaster-recovery trac

asked May 08 '09 at 06:33

Mike Schall

3 answers

How to actually use mysql slave as soon the master is failover or got burnt

I have MySQL master-slave replication that works fine; I googled the whole net and MySQL site to find the standard procedure to make use of the replication but found nothing. It is as if admins are happy to have replication on, but when the time…

mysql replication disaster-recovery

asked Aug 22 '11 at 07:42

Jawad Al Shaikh

12 answers

What's the first thing you check when an untouched unix server starts going berserk?

So you have this neatly setup unix server and it's super fast and works swell and everything is great for months, and suddenly all kinds of weird errors start showing up for a variety of different services and none of them make a lot of sense on…

unix troubleshooting disaster-recovery debugging

asked May 18 '09 at 07:56

kch

5 answers

High server availability for a small business

After having a bit of scare with a server that wouldn't come up one morning, the higher ups have decided that the business needs a high availability / fail over setup. We have 5 main servers (4x Linux, 1x OpenBSD) all of which need to be running for…

high-availability disaster-recovery small-business

asked Aug 25 '09 at 05:22

Matthew

3 answers

Database accidentally deleted with a bash script

Edit: a follow-up question: Restore mongoDB by --repair and WiredTiger. My developer committed a huge mistake and we cannot find our Mongo database anywhere in the server. He logged into the server, and saved the following shell under…

filesystems shell ubuntu-14.04 data-recovery disaster-recovery

asked Mar 24 '19 at 11:36

SoftTimur

3 answers

Backing up VirtualBox VMs

Does anyone have a good complete strategy for backing up a bunch of virtual machines running under VirtualBox? I intend to run a handful of virtual machines on a single hardware platform and back them up nightly to external disks, which will be…

backup virtualbox disaster-recovery virtual-machines snapshot

asked Jul 31 '09 at 16:05

James Green

3 answers

Active Directory disaster recovery with DPM

I have a sort of catch-22 question here. Suppose I'm using Microsoft System Center Data Protection Manager (2010 or 2012, it works the same way) to backup, amongst various other things, my Active Directory environment (as in "the System State of my…

active-directory disaster-recovery restore scdpm system-state

asked Sep 06 '12 at 09:40

Massimo

1 answer

Recover data from SCSI hard disk

We've got an old server with SCSI hard disk. The server crashed last week and it isn't exactly known what hardware component is damaged. Since the server is due to be retired anyway we don't want to repair it but just restore the data from the SCSI…

hard-drive disaster-recovery data-recovery scsi

asked Jul 30 '12 at 14:13

Tom

1 answer

Recovery strategy for Master-Master replication

I have implemented a HA solution for mysql based on master-master replication. There is a mechanism on the front end part which guarantees that only one db will be read/written to at a given time (i.e. we only use replication for HA). I have…

mysql mysql-replication disaster-recovery master-master

asked Apr 09 '11 at 08:43

David Cournapeau

1 answer

Does one failed drive + one single bad sector destroy an entire RAID 5?

During planning my RAID setup on a Synology Disk Station I've done a lot of reading about various RAID types, being this a great reading: RAID levels and the importance of URE (Unrecoverable Read Error). However, one thing remains unclear to…

raid data-recovery disaster-recovery

asked Jun 27 '21 at 13:27

adamsfamily

1 answer

How do I configure a stretch cluster without shared storage between two sites?

I am trying to redesign our IT infrastructure and seeking help in implementing DR solution for our company. I see that as 2 data centers in active-passive mode with the data replication. Currently we have two Windows Servers 2016 at the primary…

storage disaster-recovery windows-server-2016

asked Feb 20 '17 at 16:05

katyn12

2 answers

How to recover data from an Exchange 2013 database after a complete Active Directory loss?

Scenario: a single Exchange 2013 server in a Windows Server 2003 AD domain; one DC malfunctioned months ago and was dismissed (without proper demotion, no less); the other DC died yesterday and there are no available backups. Simply put, that AD is…

active-directory exchange exchange-2013 disaster-recovery

asked May 26 '15 at 20:14

Massimo

2 answers

Hadoop HDFS Backup & DR Strategy

We are preparing to implement our first Hadoop cluster. As such we are starting out small with a four node setup. (1 master node, and 3 worker nodes) Each node will have 6TB of storage. (6 x 1TB disks) We went with a SuperMicro 4-node chassis so…

backup disaster-recovery hadoop hdfs

asked Aug 13 '13 at 23:32

Matt Keller

3 answers

If DNS Failover is not recommended, what is?

As a followup question to this very popular question: Why is DNS failover not recommended?, I think it was agreed that DNS failover is not 100% reliable due to caching. However the highest voted answer did not really discuss what is the better…

domain-name-system failover high-availability disaster-recovery datacenter

asked Sep 06 '12 at 03:42

IMB
