Are memcpy() based race conditions exploitable for causing remote code execution?

Question

Let’s say I have the following pseudocode in the trusted part of a sandbox which prevent untrusted code calling mprotect() and mmap() and ptrace() directly (mutext isn’t accessible from sandboxed memory)…

//src and dest are user controlled but must be valid.
TrustedPart_for_safe_jit(void * mutext, uint8_t *src,uint8_t *dest, uint32_t size) // in the current case, *dest targets a PROT_NONE memory region
{
    MutexLock(mutext);
    ValidateOpcodesOrCrash(src,size); // uses calls to mmap on size internally. Contains many different loops and use several 10k thousands lines of codes in the trusted part of the sandbox : this is the longest part. Please also note that src is write protected while being in this function.
    unwriteprotect(dest,size); // calls many sandbox’s internal functions

    SafeMemcpy(src,dest,size); // THIS IS the function which contains the race condition

    asm("mfence");
    unEXECprotect(dest,size); // involve write protecting as well as allowing reading
    MutexUnlock(mutext);
}

SafeMemcpy(uint8_t *src,uint8_t *dest, uint32_t size) // the data to be copied cannot exceed 128Mb
{
    if(!CheckUserTargetPointToValidMemroyRange(dest,size) {
        uint8_t *src_ptr=src;
        uint8_t *dest_ptr=dest;
        uint8_t *end_ptr=des+size;
        while (dest_ptr < end_ptr) { // that loop should execute very fast
            *(uint32_t *) dest_ptr = *(uint32_t *) src_ptr;
            dest_ptr += 4;
            src_ptr += 4;
        }
    }
}

That part is responsible for allowing untrusted code to use ᴊɪᴛ compilation.
The point is untrusted thread aren’t suspended.

As you know, when 2 threads use memcpy() with the same destination, they generate random data. In that case, such data could potentially contains instructions like int 0x80, thus allowing to escape the sandbox.

Things I thought to so far :

Write data to a file and read it with the read system call through the sandbox. If the memory is still write protected the program doesn’t crash. This would involve looping and even if the data to be copied can be 128Mb large I’m not sure it would works because of syscall overhead.
An Alternative would be to create code several times and try reading with several timing, but I have no idea on how to select the initial timing window.
Use futex… But I couldn’t found if futex can be used to check the value of non allocated memory. I’m also unsure if the thread could wake up before memory become write protected.

So, is it possible to plan the timing window for memcpy race conditions ?

Are you trying to write to the memory from a different thread before it gets write protected? — grc, Feb 20 '17 at 06:09
@grc : Yes, I want to write after and before it is write protected. The aim is to start memory writing during the`while (dest_ptr < end_ptr)`loop. As modern ᴄᴘᴜ typically copies data inside such loops at 19Gb/s, the timing window is very short. — user2284570, Feb 20 '17 at 12:49
If you're copying 128Mb, I think the timing window would be relatively large compared with syscalls. — grc, Feb 20 '17 at 13:56
@grc : yes, but I can create 128Mb at max *(the maximum size allowed for the dynamic code region)*. If I want to create dynamic code several time, I have to use smaller size *(I recognize It’s strange but they documented that dynamic codes region can’t be deleted securely because of race condition)*. Also what timing to choose ? I can’t access to the real fileystem, so I have no way to know about ᴄᴘᴜ speed. — user2284570, Feb 20 '17 at 15:04
How do you get the interefering thread running on this system? I assume it must previously be copied using the given mechanism, and then allowed to execute in the sandbox. Before being copied it must pass ValidateOpcodes(). Does ValidateOpcodes() remove or reject data for copy? If so what does it not allow? — this.josh, Mar 01 '17 at 07:34
@this.josh : this is a user mode only sandbox, it allows to run safely a restricted list of ᴄᴘᴜ instructions *(the static executables are disassembled in order to verify)*. So you can compile C code with a special version of gcc. In order to create threads, I have access to a special version of the glibc’s pthreads *(the ᴀᴘɪ is the same but the implementation assure compliance with the sandbox’s trampoline instead of the`clone()`system call)*. `ValidateOpcodes()` rejects the sandboxed executable file if it contains invalid instructions *(such as jumping to a non 16 bytes aligned address)*. — user2284570, Mar 01 '17 at 16:09
@Demi : I think not. but the definite yes or no answer can’t be given publicly. — user2284570, Mar 24 '17 at 17:49

score 2 · Answer 1 · answered Mar 21 '17 at 02:56

2

Based on what I see here, you can modify *src after ValidateOpcodesOrCrash finished checking that part of the memory but before SafeMemcpy starts.

I don't know how ValidateOpcodesOrCrash is implemented, but presuming that it simply loops through [src,src+size] and look for illegal instructions, then you can call with a fairly TrustedPart_for_safe_jit large size, busy wait for a few hundreds of CPU cycles, and then start overwriting *src that ValidateOpcodesOrCrash likely had finished checking. If ValidateOpcodesOrCrash does something more complicated, you can figure out what sequence of instructions will be the fastest and slowest for ValidateOpcodesOrCrash to check, put the fastest at the front and, a whole lot of the slowest instructions all up to the end. You probably won't need to wait for ValidateOpcodesOrCrash to complete before you start overwriting src.

answered Mar 21 '17 at 02:56

Lie Ryan

31,089
6
68
93

`wait for a few hundreds of CPU cycles`How do you define that when the hardware could be the the best lastest 64 bits Xeon with ddr4 or an old amd opteron with ddr ? I also can’t see why I should not modify src while in`SafeMemcpy`. The only thing done by`CheckUserTarget`is to check the address of dest is in allowed memory range. – user2284570 Mar 21 '17 at 13:08
`what sequence of instructions will be the fastest and slowest for ValidateOpcodesOrCrash`The problem on x86 is opcodes that are more complicate to decode also takes more space. Simple ones like`nops`are decoded very fast, but you can use up to 15nopwhere a single and more complicate opcode would take the same space. **Also I can’t modify`src`before`ValidateOpcodesOrCrash`finish due to write protection being applied while processing**. – user2284570 Mar 21 '17 at 13:24
@user2284570: You can tweak the amount of busy loop manually, all we'll be trying to do here is to increase the attack window to increase your odds. An exploit don't need to be 100% reliable. A defender can only fail once, an attacker have nearly infinite retries. If the team developing this software don't understand this concept, they shouldn't be writing any security software. – Lie Ryan Mar 21 '17 at 14:17
@user2284570: another thing you might want to try is to do lots of memory access and writes in your other threads you control, to increase memory bandwidth contention with the supervisor thread. The goal here is to try to force the supervisor thread to exhaust its time slice in the middle of SafeMemcpy and get preempted/context switch, which should give you a fairly large time slice to exploit the race condition, and hopefully be able to implant your exploit code. – Lie Ryan Mar 21 '17 at 14:24
Thanks, but this still doesn’t tell if 10,000,000 cpu cycles would be largely too much or too few depending on the context *(remember if the timing window is incorrect the program crash)*. Because of crash if invalid I can’t know if the timing was too short or too long. Also, the sandbox requires executables to be downloaded from an http server *(and there’s no caching)*. They can only be triggered by the user from local source if he/she had fully accepted to install the to be sandboxed program. This makes retrying costly. I feel that it’s like guessing an address in the case of aslr. – user2284570 Mar 21 '17 at 14:41
@user2284570: what's the expected number of users? If every time the program 95% of the time it behaves normally and 5% of the time it attempts an attack, most users might not report it as an issue. Even if the exploit only guessed correctly 10% of the time, if you have just 200 downloads per day, there are going to be one successful exploit per day. – Lie Ryan Mar 21 '17 at 14:58
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/55759/discussion-between-user2284570-and-lie-ryan). – user2284570 Mar 21 '17 at 14:59
**So you’re simply explaining the solution is to pick up an arbitrary static value at compile time and send it to all users whatever their hardware is ?** *(for example 21,000 millisecond because of the targeting is old windows xp boxes with fragmented hard disks)* – user2284570 Mar 23 '17 at 21:50
@user2284570: it doesn't need to be arbitrary, you'd want to pick a value that's likely to work given the types of machines you are targeting. You can also have the server serve binaries with different precompiled values, keeping track of which ones works or doesn't work. Yes this will be more limited than having the client fully automatically choose values, but an exploit doesn't need to be 100% reliable to be dangerous enough. – Lie Ryan Mar 24 '17 at 06:13
`a value that's likely to work given the types of machines you are targeting`. Without an access to the **real** filesystem, how am I supposed to have an idea of what kind the machine is *(the only thing I can do is to measure the time a function of mine takes to complete)* ? If the timing is wrong, the program just crash without returning anything, so I can’t know if a wait value was too short or too long. And again what kind of time to choose for starting ? 900ns, or 15 seconds ? – user2284570 Mar 24 '17 at 17:43
@user2284570: I've already described how to choose the delay in a previous comment. You should modify the sandbox software to not crash during exploit development, and to log extra details that'll help you figure out whether it's too long or too short. On real attacks you can't modify the sandbox, so you'll just have to fire blind, stretch the timing window, and let [law of large number](https://en.m.wikipedia.org/wiki/Law_of_truly_large_numbers) work its magic. – Lie Ryan Mar 25 '17 at 00:25
`You should modify the sandbox software to not crash during exploit development`Yes, but this will make the exploit working only on similar hardware as mine.The point of the question is question is to get it working on all x86 *(once I corrupted data, I can read into memory for a suitable hex sequence)*.Also isn’t there a second law of large numbers due the speed difference between recent and very old hardware that will overcome the one of huge time in data copying *(as explained by expecting completion finishing between several seconds and µs)* ? Definitely firing blind isn’t the good option. – user2284570 Mar 25 '17 at 17:00
@user2284570: why do you need to insist for this to work with very old and modern machines? Exploits don't need to work reliably on all machines to be dangerous, just target the typical machine (for untargeted attack) and research on your target's hardware (for highly specific target). Unlike ASLR, which distributes addresses uniformly, "typical machine speed" isn't uniformly distributed. Also, as I mentioned above, you should use cycle count rather than time, as they tend to grow together for the typical machine. – Lie Ryan Mar 26 '17 at 00:50
@user2284570: if you want to have more work on this, you need to disclose more of the sandbox implementation. With the attitude you've described of the development team, I'm pretty sure this isn't the only exploitable place that can be combined together with this to improve your odds. – Lie Ryan Mar 26 '17 at 00:56
The sandbox is well known; robust and eavily single threaded fuzzed *(I mean not against race conditions)*. It is definitely the only exploitabpe issue. **While I agree to disclose more, I fail to see what parts of the source code could provide the missing information for the current case.** It is because I have no ideas about what missing information can become useful. The source code is definitely very large. – user2284570 Mar 26 '17 at 05:46
I mean do you have an idea about what information here is currently missing and I could disclose. – user2284570 Mar 28 '17 at 19:25
@user2284570: anything that could cause large amount of memory accesses (to increase the chance of forced context switch by the OS), anything that creates lots of heavy threads (increasing the chance that when ValidateOpcodesOrCrash are forced to context switch, the processor goes to another thread for some time). Also, some internals about ValidateOpcodesOrCrash, what does it do, and anything that could slow it down (e.g. if it creates a deep parsing tree) – Lie Ryan Mar 29 '17 at 01:09
no thread are created for the creating jit code. However, the sandbox allows sandboxed process to create up to 100 threads for running any allowed sanboxed code *(you can put empty loops in each of them)*. The only large data copying done for creating jit code is here. The disassembler is written in ragel and compiled as a large switch linked to a static table, so the only way to slow it down is to give more data to decode. The number of instructions and their sizes have near no impact on disassmbly speed. – user2284570 Mar 29 '17 at 01:23

Demi · Answer 2 · 2017-03-19T21:36:06.430

0

(Disclaimer: I am not a security professional – just a student with some knowledge of C)

I think that this is likely exploitable, at least to cause a crash. To fix the bug, you need to either:

suspend all untrusted threads while dest is writeable.
or have TrustedPart_for_safe_jit return freshly allocated executable memory (I assume the untrusted code is not allowed to cough up arbitrary machine addresses as pointers).

That means that instead of using a mutex, you need to stop-the-world.

edited Mar 19 '17 at 21:36

answered Mar 19 '17 at 19:13

Demi

769
1
4
11

The whole design of the sandbox is to crash the process in the case of trying to execute non allowed code. This is a sandbox : the aim is to allow executing any code programmer want like with javascript. In fact if I want to perform crashing I can just access non allocated memory. I have full access to pointers, provided they are shrunk to 32 bits values *(so accessing important data require 64 bits pointers, the lower 32 bits only contains validated code and data)*. – user2284570 Mar 19 '17 at 19:18
`suspend all untrusted threads while dest is writeable.`The point of my question is I have no ideas on how to determine when it is writable… If I could, I could just start memcpy writing. – user2284570 Mar 19 '17 at 19:19
@user2284570 The problem is that `MutexLock()` needs to suspend all write access to `src` or `dst` from *any* thread. That is why you need to suspend all untrusted threads. – Demi Mar 19 '17 at 19:23
`The problem is that MutexLock() needs to suspend all write access to src or dst` **It doesn’t. It just prevent`mutext`to be locked.** I mean that `mutext` is global data except that it can’t be accessed from untrusted code. The only thing`MutexLock()`prevent is to get 2 threads calling`TrustedPart_for_safe_jit`at the same time, which would allow to have the correct timing. The point is`dst`is required to be read_only before`TrustedPart_for_safe_jit`. `src`remains always readable and writable, but how to determine the correct timing for modifying`src`after`ValidateOpcodesOrCrash()`finished ? – user2284570 Mar 19 '17 at 19:48
`or have TrustedPart_for_safe_jit return freshly allocated executable memory` Though I didn’t wrote it, you could have guessed it : that since it’s in memory accessible from untrusted code to allocate read only data *(`TrustedPart_for_safe_jit` exit if the data is writable since the begging)*. The only way to create executable memory is to use`TrustedPart_for_safe_jit`. Once data as been allocated, it’s status can’t be changed except if by freeing memory. The exception is jit code which can never freed without restarting the process *(that’s why `TrustedPart_for_safe_jit`has no real use)*. – user2284570 Mar 19 '17 at 19:53
@user2284570 What I am trying to say is that **you need to get rid of the race condition**. – Demi Mar 19 '17 at 21:37
I know how to get rid of it. The problem is I’m not part of the maintainers team. And as long I don’t even explain how such race condition could be exploitable, there are unwilling to fix it *(they say that`MutexLock()`prevent calling the function twice before an instance complete so that it’s impossible for an attacker to time correctly and so their no reason to modify the code)*. – user2284570 Mar 19 '17 at 21:41
Can't help there, sorry — I have no experience in offense. – Demi Mar 19 '17 at 23:28
you should paid attention to the [tag:exploit-development] tag. Please delete this answer so the question is marked as unanswered which should help getting an on topic answer *(unlike yours)*. – user2284570 Mar 19 '17 at 23:47

Are memcpy() based race conditions exploitable for causing remote code execution?

2 Answers2