
I have Ubuntu 14.04.3 Server running in a VM with kernel version 3.13.0-83-generic. I have tested several PoCs gathered here. Most of them crash the kernel (not every time, but sometimes) and are not reliable, except for the lib-c to root exploit. This one exploits the vulnerability successfully and pops a root shell for a few seconds (about 20-30 seconds). To make it stable I ran echo 0 > /proc/sys/vm/dirty_writeback_centisecs, as mentioned here. Everything is fine until reboot. While rebooting, the kernel crashes :(

I have 2 questions:

First, what do the crashes depend on? Is it the hardware? The kernel version? The exploit?

Second, how can I fix the kernel crash on reboot while I'm using the lib-c to root exploit?

UPDATE 1

This is what I get with kdump:

[  388.077362] kernel BUG at /build/linux-03BQvT/linux-3.13.0/fs/ext4/inode.c:2420!
[  388.077497] invalid opcode: 0000 [#1] SMP 
[  388.077601] Modules linked in: crct10dif_pclmul crc32_pclmul vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw vmw_vmci lp parport psmouse ahci e1000 libahci floppy mptspi mptscsih mptbase
[  388.078190] CPU: 1 PID: 453 Comm: kworker/u256:28 Not tainted 3.13.0-83-generic #127-Ubuntu
[  388.078426] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[  388.078627] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[  388.078755] task: ffff880135e69800 ti: ffff880135e70000 task.ti: ffff880135e70000
[  388.078878] RIP: 0010:[<ffffffff81241298>]  [<ffffffff81241298>] mpage_prepare_extent_to_map+0x2b8/0x2c0
[  388.079027] RSP: 0018:ffff880135e719d8  EFLAGS: 00010246
[  388.079102] RAX: 01ffff000002007d RBX: ffff880135e71a18 RCX: 0000000000000000
[  388.079187] RDX: ffff880135e71a18 RSI: 0000000000000000 RDI: ffff8801377824a0
[  388.079272] RBP: ffff880135e71aa8 R08: 0000000000000000 R09: 0000000000000000
[  388.079357] R10: 0000000000000100 R11: 0000000000000210 R12: 0000000000003400
[  388.079441] R13: 0007ffffffffffff R14: ffffea0002ec8c80 R15: ffff880135e71b50
[  388.079527] FS:  0000000000000000(0000) GS:ffff88013a620000(0000) knlGS:0000000000000000
[  388.079651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  388.079729] CR2: 0000000000410000 CR3: 00000000377b5000 CR4: 00000000001407e0
[  388.079852] Stack:
[  388.079912]  ffff880135e71a18 0000000000000000 ffff880137782498 ffff880135e71a18
[  388.080089]  0000000000000001 0000000000000001 0000000000000000 ffffea0002ec8c80
[  388.080265]  ffff8800bba09000 ffff880135e71a68 ffffffff81288bc3 ffff880100000050
[  388.080441] Call Trace:
[  388.080506]  [<ffffffff81288bc3>] ? jbd2__journal_start+0xf3/0x1e0
[  388.080587]  [<ffffffff81245276>] ? ext4_writepages+0x3c6/0xd20
[  388.080667]  [<ffffffff8126f7f9>] ? __ext4_journal_start_sb+0x69/0xe0
[  388.080749]  [<ffffffff812452a2>] ext4_writepages+0x3f2/0xd20
[  388.080830]  [<ffffffff8115bc2e>] do_writepages+0x1e/0x40
[  388.080907]  [<ffffffff811e7f10>] __writeback_single_inode+0x40/0x220
[  388.080989]  [<ffffffff811e8cd7>] writeback_sb_inodes+0x247/0x3e0
[  388.081069]  [<ffffffff811e8f0f>] __writeback_inodes_wb+0x9f/0xd0
[  388.081149]  [<ffffffff811e9183>] wb_writeback+0x243/0x2c0
[  388.081228]  [<ffffffff810870c6>] ? set_worker_desc+0x76/0x90
[  388.081307]  [<ffffffff811ea9a8>] bdi_writeback_workfn+0x108/0x430
[  388.081388]  [<ffffffff81083d22>] process_one_work+0x182/0x450
[  388.081468]  [<ffffffff81084b11>] worker_thread+0x121/0x410
[  388.081545]  [<ffffffff810849f0>] ? rescuer_thread+0x430/0x430
[  388.081624]  [<ffffffff8108b8f2>] kthread+0xd2/0xf0
[  388.081706]  [<ffffffff8108b820>] ? kthread_create_on_node+0x1c0/0x1c0
[  388.081787]  [<ffffffff817364e8>] ret_from_fork+0x58/0x90
[  388.081861]  [<ffffffff8108b820>] ? kthread_create_on_node+0x1c0/0x1c0
[  388.081940] Code: 00 00 00 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 6e cf f1 ff 8b 85 48 ff ff ff eb ca 48 8d bd 58 ff ff ff e8 5a cf f1 ff eb 80 <0f> 0b 0f 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 
[  388.083376] RIP  [<ffffffff81241298>] mpage_prepare_extent_to_map+0x2b8/0x2c0
[  388.083472]  RSP <ffff880135e719d8>
arashkgpt

1 Answer


TL;DR: a race condition left the writeback bit set on a page immediately after the kernel had waited for that bit to clear, tripping a sanity check that crashed the kernel.

Unfortunately, I cannot answer your two questions, since they require a lot more debugging, but I can shed some light on why this is occurring, which may help. It seems likely that this is triggered when the kernel syncs the filesystems right before reboot. You can test this in two ways: run sync yourself (or do an emergency sync with sysrq-s, e.g. via echo s > /proc/sysrq-trigger) and see whether that triggers the crash, or reboot without syncing first, i.e. with reboot -n, and see whether the crash goes away.

The issue itself

This appears to be coming from a BUG() in fs/ext4/inode.c:mpage_prepare_extent_to_map. The BUG() macro is used to intentionally oops the kernel by generating an illegal instruction exception. It is intended to be used as a runtime sanity check. The BUG() macro unconditionally raises an exception, whereas BUG_ON() takes a single argument and raises the exception if the argument evaluates true.
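To make the difference between the two macros concrete, here is a minimal userspace sketch. SKETCH_BUG(), SKETCH_BUG_ON() and run_check() are hypothetical names invented for illustration, not the kernel's definitions, and abort() merely stands in for the invalid instruction the kernel executes. The check runs in a child process so the caller can observe how it died.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions.
 * abort() plays the role of the kernel's invalid instruction. */
#define SKETCH_BUG() \
    do { \
        fprintf(stderr, "sketch BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)

/* Conditional form: only dies when the condition is true. */
#define SKETCH_BUG_ON(cond) \
    do { \
        if (cond) \
            SKETCH_BUG(); \
    } while (0)

/* Run the check in a child process so the caller survives.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 if fork() or waitpid() failed. */
static int run_check(int cond)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        SKETCH_BUG_ON(cond);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With this sketch, run_check(0) reports a normal exit, while run_check(1) reports that the child was killed by SIGABRT, which mirrors how a true condition takes down the kernel thread that hit the check.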

In the log you posted, BUG_ON(PageWriteback(page)) is triggered, meaning its argument evaluated to true. In other words, the code assumes that PageWriteback(page) always returns false at this point. The check guards against a simultaneous writeout of the same page. The relevant lines are:

wait_on_page_writeback(page);
BUG_ON(PageWriteback(page));

The first line is a function defined in include/linux/pagemap.h:

/*
 * Wait for a page to complete writeback
 */
static inline void wait_on_page_writeback(struct page *page)
{
    if (PageWriteback(page))
        wait_on_page_bit(page, PG_writeback);
}

The second line uses PageWriteback(), which is generated by a macro in include/linux/page-flags.h:

#define TESTPAGEFLAG(uname, lname, policy)                             \
    static __always_inline int Page##uname(struct page *page)          \
    { return test_bit(PG_##lname, &policy(page, 0)->flags); }

The root of the issue

Unfortunately, the exact reason why this is occurring isn't clear to me. Given how the Dirty COW exploit works, a race condition is the most plausible explanation: the page should not regain the writeback bit right after the kernel waited for it to clear unless something explicitly set it again. You would need more thorough debugging to understand the issue completely. Since it is likely a race condition, this oops log alone won't get you far. You'll have to look at the code running on another CPU (which does not fault, and so is silent in the log) that is triggering this.

What I can say is that one of three possible sequences of events led to this outcome:

  • The target page did not have the writeback bit set. As a result, wait_on_page_writeback() acted as a no-op. Somehow, the bit got set shortly after the function returned.
  • The target page had the writeback bit set. The wait_on_page_writeback() waited until the bit was removed before returning. Somehow, after the bit was removed, it was set again.
  • The target page had the writeback bit set, and wait_on_page_writeback() was supposed to return only when the bit was unset. Somehow, it returned even though the bit was still set.

Regardless of which of these is the case, the wait_on_page_writeback() function returned, yet the page still had or quickly regained the writeback bit. This was caught by a BUG_ON() sanity check because it should never happen.
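The first two scenarios can be modeled in a few lines of single-threaded C. This is only a sketch under heavy assumptions: struct sketch_page, check_would_fire() and the callback standing in for "another CPU" are all hypothetical, and "waiting" is reduced to clearing a flag. It does not model the third scenario, where the wait itself returns too early.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page: one flag standing in for PG_writeback. */
struct sketch_page {
    bool writeback;
};

/* Hook standing in for whatever another CPU does in the gap. */
typedef void (*interloper_fn)(struct sketch_page *page);

/* One possible interloper: start a new writeback on the page. */
static void set_writeback(struct sketch_page *page)
{
    page->writeback = true;
}

/* Models wait_on_page_writeback(): in this single-threaded sketch,
 * "waiting" just clears the flag, as if the I/O had finished. */
static void sketch_wait_on_writeback(struct sketch_page *page)
{
    if (page->writeback)
        page->writeback = false;
}

/* Returns true if a BUG_ON(PageWriteback(page))-style check would fire. */
static bool check_would_fire(struct sketch_page *page, interloper_fn between)
{
    sketch_wait_on_writeback(page);
    if (between != NULL)
        between(page); /* the window between the wait and the check */
    return page->writeback;
}
```

Passing NULL as the interloper never fires the check, whether or not the flag started out set. Passing set_writeback fires it in both cases, which corresponds to the first and second bullets above: nothing in the wait-then-check pair itself stops another party from setting the bit in between.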

How the oops log reveals this

Rather than just saying "this is what it is", I'll explain how I gathered this from the log you posted.

[  388.077362] kernel BUG at /build/linux-03BQvT/linux-3.13.0/fs/ext4/inode.c:2420!
[  388.077497] invalid opcode: 0000 [#1] SMP 

This tells me that the crash came from a BUG()-style sanity check. The first line says so explicitly and even points to the exact source file and line of interest. The second line implicitly reinforces this, because on x86 BUG() and the related macros execute the ud2 instruction, which raises an invalid opcode exception. The bytes marked <0f> 0b in the Code: line of your log are that ud2 instruction.
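The same conclusion can be checked mechanically from the Code: line, where the byte at the faulting instruction pointer is wrapped in angle brackets. The helper below is a hypothetical sketch (faulting_insn_is_ud2() is not an existing tool) that looks for that marker and tests whether the marked byte and the one after it are 0f 0b, the x86 encoding of ud2.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if the byte marked <xx> in an oops "Code:" line, together
 * with the byte after it, is 0f 0b (the x86 ud2 opcode). */
static bool faulting_insn_is_ud2(const char *code_line)
{
    const char *mark = strchr(code_line, '<');
    char *end;
    unsigned long first, second;

    if (mark == NULL)
        return false;

    /* Parse the hex byte between '<' and '>'. */
    first = strtoul(mark + 1, &end, 16);
    if (end == mark + 1 || *end != '>')
        return false;

    /* Parse the next hex byte; strtoul() skips the separating space. */
    second = strtoul(end + 1, &end, 16);
    return first == 0x0f && second == 0x0b;
}
```

Fed the tail of the Code: line from the question ("eb 80 <0f> 0b 0f 0b 0f 1f 40 00"), this returns true; a line whose marked byte starts some other instruction returns false.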

[  388.078878] RIP: 0010:[<ffffffff81241298>]  [<ffffffff81241298>] mpage_prepare_extent_to_map+0x2b8/0x2c0

RIP is the instruction pointer (called EIP and IP on 32-bit and 16-bit processors, respectively). It points to the currently executing instruction, which lies within the current function. Since I have a different kernel version and didn't want to look up your exact version, I went to the function mpage_prepare_extent_to_map() in fs/ext4/inode.c and looked for anything like BUG() or BUG_ON(), as well as any lines just above it. Finding where the two functions wait_on_page_writeback() and PageWriteback() are defined revealed their purpose. Specifically, the assumptions encoded in these two lines of source code proved incorrect:

  • After wait_on_page_writeback(page) returns, page will not have the writeback bit set.
  • PageWriteback(page) will return false, as page will not have the writeback bit set.
  • Therefore, BUG_ON(PageWriteback(page)) will not be triggered.

Because the BUG_ON() was triggered, the bug (ahem) becomes obvious.
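As a side note on reading the RIP line, the suffix mpage_prepare_extent_to_map+0x2b8/0x2c0 has the form symbol+offset/size: the faulting instruction is 0x2b8 bytes into a function that is 0x2c0 bytes long. A small sketch of splitting that string follows; struct oops_location and parse_oops_location() are hypothetical names, and the 63-character symbol limit is an arbitrary assumption.

```c
#include <stdio.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE" location. */
struct oops_location {
    char symbol[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total size of the function in bytes */
};

/* Splits "name+0xOFF/0xSIZE" into its parts.
 * Returns 0 on success, -1 if the text does not match that shape. */
static int parse_oops_location(const char *text, struct oops_location *loc)
{
    int n = sscanf(text, "%63[^+]+%lx/%lx",
                   loc->symbol, &loc->offset, &loc->size);

    return n == 3 ? 0 : -1;
}
```

For the location in the question this gives an offset of 0x2b8 and a size of 0x2c0, so the trap sits 8 bytes before the end of the function.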

forest