9

Edit: a follow-up question: Restore mongoDB by --repair and WiredTiger.

My developer made a huge mistake and we cannot find our Mongo database anywhere on the server.

He logged into the server and saved the following shell script as ~/crontab/mongod_back.sh:

mongod_back.sh

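#!/bin/sh
DUMP=mongodump
OUT_DIR=/data/backup/mongod/tmp  // temporary directory for backup files
TAR_DIR=/data/backup/mongod      // final directory for backup files
DATE=`date +%Y_%m_%d_%H_%M_%S`   // backup files are named by backup time
DB_USER=Guitang                  // database user
DB_PASS=qq■■■■■■■■■■■■■■■■■■■■■  // database user password
DAYS=14                          // keep the most recent 14 days of backups
TARBAK="mongod_bak_$DATE.tar.gz" // backup file naming format
cd $OUT_DIR                      // create the folder
rm -rf $OUT_DIR/*                // empty the temporary directory
mkdir -p $OUT_DIR/$DATE          // create the folder for this backup
$DUMP -d wecard -u $DB_USER -p $DB_PASS -o $OUT_DIR/$DATE  // run the backup command
tar -zcvf $TAR_DIR/$TAR_BAK $OUT_DIR/$DATE       // pack the backup files into the final directory
find $TAR_DIR/ -mtime +$DAYS -delete             // delete old backups older than 14 days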

He then ran it, and it printed permission denied messages, so he pressed Ctrl+C. The server shut down automatically. He tried to restart it but got a GRUB error:

GRUB

He contacted AliCloud, and their engineer attached the disk to another working server so that he could inspect it. It looks like some folders are gone, including /data/, where the MongoDB data is!

  1. We don't understand how the script could destroy the disk including /data/;
  2. And of course, is it possible to get the /data/ back?

PS: He did not take a snapshot of the disk beforehand.

PS2: Since people mention "backups" a lot: we have lots of important users and data coming in over these 2 days, and the purpose of this action was to back them up (for the first time). Instead, they turned out to be entirely deleted.

SoftTimur
  • 4
    Your script has no error checking. If the line `cd $OUT_DIR` fails, it's going to delete everything in the current path, which may well be `/` . This is why you have backups - use them. – Jenny D Mar 24 '19 at 11:42
  • He ran the script under `~/crontab/`, so how could `rm` or `find -delete` delete folders under `/`? – SoftTimur Mar 24 '19 at 11:50
  • Make a raw backup of the full hard disk before you do anything; this will improve your slim chances of data recovery – Ferrybig Mar 24 '19 at 15:52
  • 8
    Wow - did this script get into your version control system? Did it go through peer review? `rm -rf $OUT_DIR/*` really? And why was the script not tested on a non-production server? Once you have restored from backup you have _many_ critical procedural failings to address here before automating anything else. I hope you're not _too_ hard on your developer over it, as a result (though they also have quite a bit to answer for) – Lightness Races in Orbit Mar 24 '19 at 18:30
  • @SoftTimur your cloud provider may have a backup/snapshot of your server. You should ask them. – topher Mar 24 '19 at 20:38
  • @topher, they said no. – SoftTimur Mar 24 '19 at 20:40
  • This will soon be my most-upvoted question ever in Stack Exchange. – SoftTimur Mar 24 '19 at 21:25
  • 3
    Re: this was to be your backup script, never test a new procedure against the only copy of your data. Prior to your very first backup, create a separate test database, put in some fake data, and restore test that. – John Mahowald Mar 24 '19 at 22:12
  • Yet another disaster that could have been averted by `set -eu`. Any bash script without it makes me nervous, especially if it involves `rm`. – Matteo Italia Mar 24 '19 at 22:40
  • Cannot believe that, regardless of those answers, comments and upvotes, there are still moderators who marked this question as a duplicate of [Monday morning mistake: sudo rm -rf --no-preserve-root /](https://serverfault.com/questions/587102/monday-morning-mistake-sudo-rm-rf-no-preserve-root). Seriously, there is something wrong with Stack Exchange today. – SoftTimur Mar 25 '19 at 01:09
  • 2
    Two suggestions to improve your questions: 1) Write everything in English. The problem with non-English content is that people don't understand it and can't tell whether it is important. Thus, they can't be sure that their answer is correct, and they tend not to answer at all. 2) If you can copy and paste text, do that and never use screenshots. – peterh Mar 25 '19 at 07:31

3 Answers

36

Easy enough. The // sequence isn't a comment in bash (# is).

The statement OUT_DIR=x // text had no effect* except a cryptic error message.

Thus, with OUT_DIR being an empty string, one of the commands eventually executed was rm -rf /*. Some directories directly underneath / weren't removed because the user lacked permission, but it appears that some vital directories were removed. You need to restore from backup.
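As a minimal sketch of that expansion, the snippet below builds the command line as a string and prints it instead of running it, so nothing is deleted. It then shows one possible guard, the `${VAR:?message}` parameter expansion, which stops the shell when the variable is unset or empty. The variable names `unguarded` and `guard_status` are made up for this illustration and are not part of the original script.

```shell
#!/bin/sh
# Demo only: the command is built as a string and printed, never executed.
OUT_DIR=""                       # what the script was effectively left with

# Inside double quotes the * is not globbed, so this is the literal command line.
unguarded="rm -rf $OUT_DIR/*"
echo "unguarded: $unguarded"

# ${VAR:?message} makes a non-interactive shell exit if VAR is unset or empty.
# It runs in a subshell here so that only the subshell exits, not this demo.
( : "${OUT_DIR:?OUT_DIR is empty, refusing to continue}" ) 2>/dev/null
guard_status=$?
echo "guard exit status: $guard_status (non-zero means the guard fired)"
```

In a real script the same guard could sit directly on the dangerous line, for example `rm -rf "${OUT_DIR:?}"/*`, so the script aborts before `rm` ever sees a bare `/*`.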


* The peculiar form of bash statement A=b c d e f is roughly similar to:

export A=b
c d e f
unset A

A common example:

export VISUAL=vi                       # A standard visual editor to use is `vi`
visudo -f dummy_sudoers1               # Starts vi to edit a fake sudo config. Type :q! to exit
VISUAL=nano visudo -f dummy_sudoers2   # Starts nano to edit a fake sudo config
visudo -f dummy_sudoers3               # Starts vi again (!)

And the problematic line of script amounted to this:

export OUT_DIR=/data/backup/mongod/tmp
// temporary directory for backup files   # shell error as `//` isn't an executable file!
unset OUT_DIR
kubanczyk
  • 3
    This is one of the reasons I always use `set -euo pipefail`, would have resulted in an exit instead of blundering forward with unset variables – ThisGuy Mar 25 '19 at 17:45
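A rough sketch of what that comment describes, assuming `bash` is installed: the faulty assignment line is run in a child `bash` with `set -euo pipefail`, and `echo` stands in for `rm` so the demo is harmless. The names `output` and `status` are hypothetical helpers for this illustration.

```shell
#!/bin/sh
# Run a copy of the faulty line in a child bash so the abort cannot affect this shell.
# echo stands in for rm; nothing is deleted.
output=$(bash -c '
  set -euo pipefail
  OUT_DIR=/data/backup/mongod/tmp // not a comment
  echo "would run: rm -rf $OUT_DIR/*"
' 2>/dev/null)
status=$?

echo "child exit status: $status"
echo "child output: [$output]"   # empty: the child stopped before reaching echo
```

Here it is `-e` that fires first, because trying to execute `//` fails. Without `-e`, `-u` would still stop the child at the first reference to the unset `OUT_DIR`.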
8

1) He erroneously assumed that // was a bash comment. It is not, only # is.

The shell interpreted // text as a normal command, and did not find a binary called //, and did nothing.

In bash, when a variable assignment (OUT_DIR=/data/backup/mongod/tmp) directly precedes a command (// text), the variable is set only for the duration of that command. OUT_DIR is therefore never set in the script itself, so when the rm line is reached, OUT_DIR expands to an empty string and rm -rf /* is called, deleting everything you have permission to delete.
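The temporary nature of such an assignment can be checked with a harmless sketch. Here `true` stands in for the failing `//` command, and `after_prefix`, `seen_by_child`, and `after_plain` are hypothetical names used only for this demo.

```shell
#!/bin/sh
unset OUT_DIR

# Assignment placed in front of a command: visible to that command only.
seen_by_child=$(OUT_DIR=/data/backup/mongod/tmp sh -c 'echo "$OUT_DIR"')
OUT_DIR=/data/backup/mongod/tmp true
after_prefix="${OUT_DIR-<unset>}"    # expands to <unset> if OUT_DIR is not set

# Assignment on its own line, followed by a real # comment: it persists.
OUT_DIR=/data/backup/mongod/tmp      # temporary directory for backup files
after_plain="$OUT_DIR"

echo "seen by child command: $seen_by_child"
echo "after prefix form:     $after_prefix"
echo "after plain form:      $after_plain"
```

The child command sees the value, but the script itself only keeps it when the assignment stands alone on the line.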

2) The solution is the same as all rm -rf / cases: restore from backup. There is no other solution because you do not have physical access to the hard drive.

  • why having physical access to the hard drive may help to restore? – SoftTimur Mar 24 '19 at 19:02
  • 1
    Possible forensics, professional hard drive recovery methods. I know this because I know that `rm -rf` is not extremely secure, and doesn't overwrite the hard drive. – not my real name Mar 24 '19 at 19:03
  • 2
    @SoftTimur `rm` usually just "unlinks" files but the data is still physically there until overwritten. This is why professionals can "undelete" sometimes if they have physical access _and_ you haven't done lots of things with the disk after the catastrophe occurred. If you don't have backups, that's the best you can hope for. – Lightness Races in Orbit Mar 24 '19 at 19:52
  • You don't need physical access. With almost no disk activity since the deletion, file restore utilities might be able to find almost everything relatively easily – ThisGuy Mar 25 '19 at 17:47
4

1) Bash comments start with #. Sorry for your loss. 2) Restore from backup is the only way to proceed here, unfortunately.

RMPJ