In testing suspicious files for malware, the typical method is to search for known malicious code. This is why zero-day malware attacks can be so dangerous: their malicious code is previously unknown to antivirus software. The ideal solution is to test every file and program in a sandbox before it is used, and this has been tried, but as far as I know, sandboxes typically show unusual processes that smart enough malware can detect. The malware detects that it is being sandboxed and holds back its malicious behavior to keep from being discovered.

The question is, can it be possible, and more importantly practical, to build a sandbox that's identical to an actual computer, and then implement in today's operating systems (Windows, OS X, Linux, iOS, Android) software which uses it to test everything before it is run, or even better, uses it as the running environment for that file?

Yes, it can be done as (theoretically) every "computing device" is computationally equivalent to every other computing device. Look up the Church-Turing thesis if you are interested.

However, your question is grounded in practice, and in this case the answer is "yes, but it would cost too much". Virtualisation effort today aims at making the virtual environment as fast as possible rather than as faithful as possible, to the point where it is quite trivial to detect that you are running in a virtual machine. This means that any effort to replicate a system that looks 100% "native" is limited by the number of people who share your interest.

In other words, there is very little commercial interest in doing what you are aiming for, and the ROI would be limited to the few hobbyists and companies that have a vested interest in the matter. How much would you pay for such a system? How much time can you devote? How many people do you know that are willing to spend years on this project?

While there are efforts to research this, I haven't yet seen a fully working system that's not terribly slow. For example, Skype employs anti-debugging techniques that detect slowdowns (see slide 30 of this presentation). I suspect any malware could use similar tricks to measure execution against a fixed time server and detect when it's running in a (necessarily slow) emulator.
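To make the timing trick concrete, here is a minimal sketch of how a program could notice that it runs much slower than expected. It is not Skype's actual technique; the function name, the `tolerance` factor and the injectable `clock` parameter are assumptions made for illustration. The `clock` argument stands in for whatever time source is trusted, for example a remote time server.

```python
import time


def looks_slowed_down(workload, expected_seconds, tolerance=5.0,
                      clock=time.perf_counter):
    """Return True if `workload` ran more than `tolerance` times slower
    than `expected_seconds`.

    Hypothetical helper: `clock` is any zero-argument callable returning
    seconds. A real check would use a time source the emulator cannot fake.
    """
    start = clock()
    workload()
    elapsed = clock() - start
    return elapsed > expected_seconds * tolerance
```

A caller would calibrate `expected_seconds` on real hardware beforehand, which is also the weak point of the trick: a fast enough emulator stays under the threshold.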

I think that a different approach would be more economically sound: run the malware sample on an actual machine, and observe what happens. Then "ghost" its disk and memory and look at the differences with an identical, "clean" sample. It might take less time to do, overall, than developing a simulator like the one you have in mind.
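The disk half of that comparison can be sketched roughly as follows, assuming both disk images are mounted read-only as directory trees on a separate, trusted machine. The function names are made up for this example, and memory comparison is not covered.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to `root`) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                # Reads the whole file at once; fine for a sketch,
                # chunked reads would be better for large files.
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def diff_snapshots(clean, infected):
    """Return sorted lists of (added, removed, modified) relative paths."""
    added = sorted(set(infected) - set(clean))
    removed = sorted(set(clean) - set(infected))
    modified = sorted(p for p in set(clean) & set(infected)
                      if clean[p] != infected[p])
    return added, removed, modified
```

Usage would be along the lines of `diff_snapshots(snapshot(clean_mount), snapshot(infected_mount))`, where both mount points are hypothetical paths. Expect noise from logs and caches that change on any machine.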

Actually, lorenzo's answer does not quite cut it. The Church-Turing thesis only provides us with a model of computation; it can't tell us anything about virtualization because it is not concerned with the other aspects of a machine.

But there is a theoretical analysis of when a machine can be virtualized, by Popek and Goldberg: http://cs.nyu.edu/courses/fall14/CSCI-GA.3033-010/popek-goldberg.pdf

That said, current architectures, and most importantly x86-64, do NOT fulfill those requirements. So the conclusion would be that it is unfortunately impossible for the CPU architectures currently in use. But one could always think about new CPU architectures...
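As a rough illustration of the Popek and Goldberg condition: an architecture is virtualizable in their trap-and-emulate sense if every sensitive instruction is also privileged, meaning it traps when executed in user mode. The sketch below checks that subset relation. The instruction lists are a small illustrative subset often cited for classic x86 without hardware virtualization extensions, not a complete or authoritative classification, and the function names are my own.

```python
def untrapped_sensitive(sensitive, privileged):
    """Return the sensitive instructions that do NOT trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


def satisfies_popek_goldberg(sensitive, privileged):
    """True if the sensitive instructions are a subset of the privileged ones."""
    return not untrapped_sensitive(sensitive, privileged)


# Illustrative subset only (assumption for this example).
PRIVILEGED = {"LGDT", "LIDT", "HLT"}
SENSITIVE = {"LGDT", "LIDT", "SGDT", "SIDT", "POPF"}

print(satisfies_popek_goldberg(SENSITIVE, PRIVILEGED))  # False
print(untrapped_sensitive(SENSITIVE, PRIVILEGED))       # ['POPF', 'SGDT', 'SIDT']
```

Instructions in the second list are the kind a guest can use to notice, or be confused by, a hypervisor that relies on traps alone.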

Put an actual computer in a physical sandbox environment. The computer itself isn't a sandbox, and nothing on it is virtualized.

Need active directory? Put active directory in the sandbox environment.

Do your tests, verify what has changed, review computer and network logs.

This is more practical than building a sandboxed OS which limits normal hardware functions.

...practical, to build a sandbox that's identical to an actual computer... test every file and program in a sandbox before it is used...

I think this is the wrong question to ask. The real challenge is not to build a sandbox which behaves like a real computer, but one which behaves like a real computer used by the targeted user.

Malware actually uses techniques to detect the presence of a user, sometimes even a specific user (targeted attacks). Some possible techniques for web-based drive-by downloads are:

  • Check if specific positions on a page get clicked, or wait for some input by displaying some kind of captcha.
  • Check if some resources are cached and others are not, and thus check for the presence of a browser actually used by a human for some time.
  • Check if the user is logged in to Facebook, has access to some specific internal website, resource etc.

Let's take another look at this question: detecting malware purely by behaviour is very hard. Not only can malware try to detect whether it is running in a virtualised environment, it can also (for example) wait some time before activating its malicious behaviour. A loosely related example: some time ago a Chrome extension that waited 7 days before starting its malicious activity passed Google's screening process for Chrome extensions.

Most antivirus products already do some behavioural malware detection. Let's assume that a malware program waits 5 minutes before trying to wipe / partition. An antivirus that does behaviour analysis can either:

  • Run this program in a sandbox for a couple of seconds and then decide it is OK.
  • Run this program in a super-fast sandbox in which time passes faster than real time, but this sandbox would be easy to detect by contacting a time server.
  • Run this (and every other!) program in a sandbox for 5 minutes before running it on the real PC, which renders the computer unusable.
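The trade-off between these three options can be put into a toy model. Everything here is an assumption made for illustration: the function names, the 10% tolerance, and the idea that the malware compares its local clock with an external time server it trusts.

```python
def sandbox_observes_payload(trigger_delay_s, wall_budget_s, speedup=1.0):
    """True if a payload delayed by `trigger_delay_s` (as seen by the
    sandboxed program) fires within the sandbox's wall-clock budget."""
    return wall_budget_s * speedup >= trigger_delay_s


def clock_looks_accelerated(local_elapsed_s, reference_elapsed_s, tolerance=0.1):
    """True if the local clock advanced noticeably more than a reference
    clock (e.g. a remote time server) over the same interval."""
    if reference_elapsed_s <= 0:
        raise ValueError("reference interval must be positive")
    return local_elapsed_s / reference_elapsed_s > 1 + tolerance


# Option 1: a 5-second run misses a 5-minute delay.
print(sandbox_observes_payload(300, 5))               # False
# Option 2: a 100x clock catches it, but the speed-up is visible.
print(sandbox_observes_payload(300, 5, speedup=100))  # True
print(clock_looks_accelerated(500, 5))                # True
# Option 3: a full 5-minute run catches it, at 5 minutes per program.
print(sandbox_observes_payload(300, 300))             # True
```

Cutting the sandbox off from the network would hide the time server, but a missing network is itself something malware can check for.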
