This is based on this hoax question here. The problem described is a bash script that contains something to the effect of:

rm -rf {pattern1}/{pattern2}

...which, if both patterns include one or more empty elements, will expand to at least one instance of rm -rf /, assuming that the original command was transcribed correctly and the OP was doing brace expansion rather than parameter expansion.
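For the parameter-expansion reading, here is a minimal sketch of how unset variables collapse the path to /. It only ever runs echo, never rm. The helper name expand and the variable names x and y are illustrative assumptions, not taken from the original script.

```python
import subprocess


def expand(script, variables):
    """Let /bin/sh expand `script` with only `variables` set, and return what it echoes."""
    env = {"PATH": "/usr/bin:/bin", **variables}
    result = subprocess.run(
        ["/bin/sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Both variables unset: the only thing left of the path is the slash.
print(expand("echo rm -rf ${x}/${y}", {}))

# Both variables set: the command targets what the author intended.
print(expand("echo rm -rf ${x}/${y}", {"x": "srv", "y": "cache"}))

# Single braces without a $ are not an expansion at all; the shell passes them through.
print(expand("echo rm -rf {x}/{y}", {}))
```

The first call prints rm -rf /, which is the dangerous case the question is about.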

In the OP's explanation of the hoax, he states:

The command [...] is harmless but it seems that almost no one has noticed.

The Ansible tool prevents these errors, [...] but [...] no one seemed to know that, otherwise they would know that what I have described could not happen.

So assuming you have a shell script that emits an rm -rf / command through either brace expansion or parameter expansion, is it true that using Ansible will prevent that command from being executed, and if so, how does it do this?

Is executing rm -rf / with root privileges really "harmless" so long as you're using Ansible to do it?

aroth
  • I debated what to do with this question, but ultimately I decided to upvote and answer it, so as to move towards finally putting this whole sorry ridiculous mess in the past where it belongs. – Michael Hampton Apr 20 '16 at 05:34
  • I think the answer really lies in the `rm` source, which I analyzed below. – Aaron Hall Apr 21 '16 at 16:08

2 Answers

I have virtual machines, let's blow a bunch of them up! For science.

[root@diaf ~]# ansible --version
ansible 2.0.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

First attempt:

[root@diaf ~]# cat killme.yml 
---
- hosts: localhost
  gather_facts: False
  tasks:
    - name: Die in a fire
      command: "rm -rf {x}/{y}"
[root@diaf ~]# ansible-playbook -l localhost -vvv killme.yml
Using /etc/ansible/ansible.cfg as config file
1 plays in killme.yml

PLAY ***************************************************************************

TASK [Die in a fire] ***********************************************************
task path: /root/killme.yml:5
ESTABLISH LOCAL CONNECTION FOR USER: root
localhost EXEC /bin/sh -c '( umask 22 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1461128819.56-86533871334374 `" && echo "` echo $HOME/.ansible/tmp/ansible-tmp-1461128819.56-86533871334374 `" )'
localhost PUT /tmp/tmprogfhZ TO /root/.ansible/tmp/ansible-tmp-1461128819.56-86533871334374/command
localhost EXEC /bin/sh -c 'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1461128819.56-86533871334374/command; rm -rf "/root/.ansible/tmp/ansible-tmp-1461128819.56-86533871334374/" > /dev/null 2>&1'
changed: [localhost] => {"changed": true, "cmd": ["rm", "-rf", "{x}/{y}"], "delta": "0:00:00.001844", "end": "2016-04-20 05:06:59.601868", "invocation": {"module_args": {"_raw_params": "rm -rf {x}/{y}", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 0, "start": "2016-04-20 05:06:59.600024", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": ["Consider using file module with state=absent rather than running rm"]}
 [WARNING]: Consider using file module with state=absent rather than running rm


PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=1    unreachable=0    failed=0

OK, so command just passes the literals along, and nothing happens.
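The "cmd" list in the output above suggests the string is split into an argument vector and handed to rm as-is. As a rough sketch, assuming that shlex.split is a fair stand-in for how the command module tokenises its input (an assumption, not a quote of Ansible's code), the helper argv_without_shell below shows why {x}/{y} survives untouched:

```python
import shlex


def argv_without_shell(command_line):
    """Split a command line into arguments the way a shell-free exec would receive them."""
    return shlex.split(command_line)


# No shell is involved, so nothing interprets the braces.
print(argv_without_shell("rm -rf {x}/{y}"))
```

rm then looks for a file literally named {x}/{y}, finds none, and -f keeps it quiet about that.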

How about our favorite safety bypass, raw?

[root@diaf ~]# cat killme.yml
---
- hosts: localhost
  gather_facts: False
  tasks:
    - name: Die in a fire
      raw: "rm -rf {x}/{y}"
[root@diaf ~]# ansible-playbook -l localhost -vvv killme.yml
Using /etc/ansible/ansible.cfg as config file
1 plays in killme.yml

PLAY ***************************************************************************

TASK [Die in a fire] ***********************************************************
task path: /root/killme.yml:5
ESTABLISH LOCAL CONNECTION FOR USER: root
localhost EXEC rm -rf {x}/{y}
ok: [localhost] => {"changed": false, "invocation": {"module_args": {"_raw_params": "rm -rf {x}/{y}"}, "module_name": "raw"}, "rc": 0, "stderr": "", "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0

No go again! How hard can it possibly be to delete all your files?

Oh, but what if they were undefined variables or something?

[root@diaf ~]# cat killme.yml
---
- hosts: localhost
  gather_facts: False
  tasks:
    - name: Die in a fire
      command: "rm -rf {{x}}/{{y}}"
[root@diaf ~]# ansible-playbook -l localhost -vvv killme.yml
Using /etc/ansible/ansible.cfg as config file
1 plays in killme.yml

PLAY ***************************************************************************

TASK [Die in a fire] ***********************************************************
task path: /root/killme.yml:5
fatal: [localhost]: FAILED! => {"failed": true, "msg": "'x' is undefined"}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @killme.retry

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1

Well, that didn't work.
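To illustrate the difference between the single-brace and double-brace runs, here is a toy renderer. It is a hypothetical stand-in for the templating step, not Ansible's or Jinja2's actual implementation; render and UndefinedVariable are made-up names. It refuses to fill in a variable it has never heard of, and it ignores single braces entirely.

```python
import re


class UndefinedVariable(Exception):
    """Raised when a template refers to a name that was never defined."""


def render(template, variables):
    """Replace each {{name}} with its value, failing loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise UndefinedVariable("'%s' is undefined" % name)
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Single braces are not template syntax, so the string passes through unchanged.
print(render("rm -rf {x}/{y}", {}))

# Double braces with no definition abort before any command could be built.
try:
    render("rm -rf {{x}}/{{y}}", {})
except UndefinedVariable as error:
    print("refused:", error)
```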

But what if the variables are defined, but empty?

[root@diaf ~]# cat killme.yml 
---
- hosts: localhost
  gather_facts: False
  tasks:
    - name: Die in a fire
      command: "rm -rf {{x}}/{{y}}"
  vars:
    x: ""
    y: ""
[root@diaf ~]# ansible-playbook -l localhost -vvv killme.yml
Using /etc/ansible/ansible.cfg as config file
1 plays in killme.yml

PLAY ***************************************************************************

TASK [Die in a fire] ***********************************************************
task path: /root/killme.yml:5
ESTABLISH LOCAL CONNECTION FOR USER: root
localhost EXEC /bin/sh -c '( umask 22 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1461129132.63-211170666238105 `" && echo "` echo $HOME/.ansible/tmp/ansible-tmp-1461129132.63-211170666238105 `" )'
localhost PUT /tmp/tmp78m3WM TO /root/.ansible/tmp/ansible-tmp-1461129132.63-211170666238105/command
localhost EXEC /bin/sh -c 'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1461129132.63-211170666238105/command; rm -rf "/root/.ansible/tmp/ansible-tmp-1461129132.63-211170666238105/" > /dev/null 2>&1'
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["rm", "-rf", "/"], "delta": "0:00:00.001740", "end": "2016-04-20 05:12:12.668616", "failed": true, "invocation": {"module_args": {"_raw_params": "rm -rf /", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 1, "start": "2016-04-20 05:12:12.666876", "stderr": "rm: it is dangerous to operate recursively on ‘/’\nrm: use --no-preserve-root to override this failsafe", "stdout": "", "stdout_lines": [], "warnings": ["Consider using file module with state=absent rather than running rm"]}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @killme.retry

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1

Finally, some progress! But it still complains that I didn't use --no-preserve-root.

Of course, it also warns me that I should try using the file module and state=absent. Let's see if that works.

[root@diaf ~]# cat killme.yml 
---
- hosts: localhost
  gather_facts: False
  tasks:
    - name: Die in a fire
      file: path="{{x}}/{{y}}" state=absent
  vars:
    x: ""
    y: ""
[root@diaf ~]# ansible-playbook -l localhost -vvv killme.yml    
Using /etc/ansible/ansible.cfg as config file
1 plays in killme.yml

PLAY ***************************************************************************

TASK [Die in a fire] ***********************************************************
task path: /root/killme.yml:5
ESTABLISH LOCAL CONNECTION FOR USER: root
localhost EXEC /bin/sh -c '( umask 22 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1461129394.62-191828952911388 `" && echo "` echo $HOME/.ansible/tmp/ansible-tmp-1461129394.62-191828952911388 `" )'
localhost PUT /tmp/tmpUqLzyd TO /root/.ansible/tmp/ansible-tmp-1461129394.62-191828952911388/file
localhost EXEC /bin/sh -c 'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1461129394.62-191828952911388/file; rm -rf "/root/.ansible/tmp/ansible-tmp-1461129394.62-191828952911388/" > /dev/null 2>&1'
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "invocation": {"module_args": {"backup": null, "content": null, "delimiter": null, "diff_peek": null, "directory_mode": null, "follow": false, "force": false, "group": null, "mode": null, "original_basename": null, "owner": null, "path": "/", "recurse": false, "regexp": null, "remote_src": null, "selevel": null, "serole": null, "setype": null, "seuser": null, "src": null, "state": "absent", "validate": null}, "module_name": "file"}, "msg": "rmtree failed: [Errno 16] Device or resource busy: '/boot'"}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @killme.retry

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1

Good news, everyone! It started trying to delete all my files! But unfortunately it ran into an error. I'll leave fixing that and getting the playbook to destroy everything using the file module as an exercise to the reader.


DO NOT run any playbooks you see beyond this point! You'll see why in a moment.

Finally, for the coup de grâce...

[root@diaf ~]# cat killme.yml
---
- hosts: localhost
  gather_facts: False
  tasks:
    - name: Die in a fire
      raw: "rm -rf {{x}}/{{y}}"
  vars:
    x: ""
    y: "*"
[root@diaf ~]# ansible-playbook -l localhost -vvv killme.yml
Using /etc/ansible/ansible.cfg as config file
1 plays in killme.yml

PLAY ***************************************************************************

TASK [Die in a fire] ***********************************************************
task path: /root/killme.yml:5
ESTABLISH LOCAL CONNECTION FOR USER: root
localhost EXEC rm -rf /*
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/executor/process/result.py", line 102, in run
  File "/usr/lib/python2.7/site-packages/ansible/executor/process/result.py", line 76, in _read_worker_result
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 117, in get
ImportError: No module named task_result

This VM is an ex-parrot!

Interestingly, the above failed to do anything with command instead of raw. It just printed the same warning about using file with state=absent.
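A likely explanation is that command never hands the string to a shell (note "_uses_shell": false in the earlier output), so nothing expands the *, and rm receives a literal /* that does not exist. The sketch below shows the same effect safely, using echo inside a scratch directory; the function name glob_demo and the file names are illustrative assumptions.

```python
import os
import subprocess
import tempfile


def glob_demo():
    """Return what echo receives for './*' without a shell and with one."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in ("a", "b"):
            open(os.path.join(scratch, name), "w").close()

        # Argument vector, no shell: the pattern arrives as a literal string.
        no_shell = subprocess.run(
            ["echo", "./*"], cwd=scratch, capture_output=True, text=True, check=True
        ).stdout.split()

        # Same text run through a shell: the pattern is expanded to the files present.
        with_shell = subprocess.run(
            "echo ./*", shell=True, cwd=scratch, capture_output=True, text=True, check=True
        ).stdout.split()

    return no_shell, with_shell


literal, expanded = glob_demo()
print(literal)
print(expanded)
```

With the glob expanded by a shell, rm is given every top-level entry by name and never sees / itself, which is why the --no-preserve-root failsafe does not trigger in the raw case.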

It appears that, as long as you aren't using raw, there is some protection from rm run amok. You should not rely on this, though. I took a quick look through Ansible's code, and while I found the warning, I did not find anything that would actually suppress running the rm command.

Michael Hampton
  • +1 for science. I'd +1 more for the host name, but it would be fraud ;p/ – Journeyman Geek Apr 20 '16 at 05:19
  • Looks like you might have a filesystem mounted at `/boot`. – 84104 Apr 20 '16 at 05:25
  • @84104 Funny, that. By sheer coincidence, `boot` is the first directory entry in `/`. So no files were lost. – Michael Hampton Apr 20 '16 at 05:26
  • So `/1-usr-local-is-for-people-who-know-what-is-what` might have been lost had it existed? – 84104 Apr 20 '16 at 05:28
  • @84104 The low level `readdir()` reads directory entries in on-disk order, which for a newly created filesystem is going to be the order the inodes were created. Since this is a freshly installed VM, it created the `/boot` mountpoint right after creating the filesystem, and thus it was naturally first. As directory entries get deleted, new ones take those places, and a directory is wildly out of order. Try `ls -U` to see the actual on-disk order. Practically, `/boot` is unlikely to ever be deleted, so it will always retain its first position in the directory. – Michael Hampton Apr 20 '16 at 05:30
  • Hooray for science! Though seems like `rm` should get at least some of the credit for preventing disaster with its `--no-preserve-root` check? – aroth Apr 20 '16 at 05:34
  • @aroth Exactly! But, for science, try `rm -rf {{x}}/{{y}}` when `y` is set to `"*"`. The `--no-preserve-root` check is useful for what it is, but it will not get you out of every possible situation; it's easy enough to bypass. Which is why that question wasn't caught out as a hoax immediately: Taking into account the bad English and the apparent syntax errors, _it is plausible_. – Michael Hampton Apr 20 '16 at 05:36
  • Besides [`raw`](https://docs.ansible.com/ansible/raw_module.html), a bad [`cron`](https://docs.ansible.com/ansible/cron_module.html) might be another way to wreck a system. – 84104 Apr 20 '16 at 05:52
  • Now to get all the news outlets reporting the hoax to report on this post instead, people might learn something! – Reaces Apr 21 '16 at 12:31
  • Cool man that is a great experience. – Raul Hugo Apr 21 '16 at 16:20

Will Ansible prevent the execution of rm -rf / in a shell script?

I inspected the coreutils rm source, which has the following:

  if (x.recursive && preserve_root)
    {
      static struct dev_ino dev_ino_buf;
      x.root_dev_ino = get_root_dev_ino (&dev_ino_buf);
      if (x.root_dev_ino == NULL)
        error (EXIT_FAILURE, errno, _("failed to get attributes of %s"),
               quoteaf ("/"));
    }

The only way to wipe from the root is to get past this code block. From this source:

struct dev_ino *
get_root_dev_ino (struct dev_ino *root_d_i)
{
  struct stat statbuf;
  if (lstat ("/", &statbuf))
    return NULL;
  root_d_i->st_ino = statbuf.st_ino;
  root_d_i->st_dev = statbuf.st_dev;
  return root_d_i;
}

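I interpret this to mean that get_root_dev_ino records the device and inode numbers of / (it returns NULL only if the lstat call itself fails). rm can then compare each operand against that pair, and when an operand turns out to be /, it refuses with the "it is dangerous to operate recursively" error, and thus rm fails.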

The only way to bypass the first code block (with recursion) is to pass --no-preserve-root. rm does not use an environment variable to override this, so the option would have to be passed explicitly to rm.

I believe this proves that unless Ansible explicitly passes --no-preserve-root to rm, it will not do this.
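To make the role of the recorded device and inode numbers concrete, here is a rough Python sketch of that kind of check. It is an illustration under my own assumptions, not a port of the coreutils code; root_dev_ino and refuses_recursive_delete are hypothetical names, and it deletes nothing.

```python
import os


def root_dev_ino():
    """Record the (device, inode) pair that identifies / on this system."""
    status = os.lstat("/")
    return (status.st_dev, status.st_ino)


def refuses_recursive_delete(path, preserve_root=True):
    """Return True if a preserve-root style check would refuse to recurse into `path`."""
    if not preserve_root:
        # Corresponds to passing --no-preserve-root: the check is skipped entirely.
        return False
    try:
        status = os.lstat(path)
    except OSError:
        # A path that cannot be examined cannot be identified as /.
        return False
    return (status.st_dev, status.st_ino) == root_dev_ino()


print(refuses_recursive_delete("/"))
print(refuses_recursive_delete("/", preserve_root=False))
```

Note that a check like this only fires when the operand itself is /. An operand list produced by a shell glob such as /* names the children of /, not / itself, so it is not caught.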

Conclusion

I do not believe that Ansible explicitly prevents rm -rf / because rm itself prevents it.

Aaron Hall