How do AV scanners in VirusTotal check if a file is malicious or not and how trustworthy is its report?

Question

I have the following three questions regarding the VirusTotal website:

When I submit a file, sometimes the upload finishes extremely fast, even for large files. How does it do that?
Does it only send the hash of the file to AV scanners/engines to get the result? How trustworthy is its report?
Is there any better alternative that actually performs some static and dynamic analysis on the file to check if it's malicious or not?

Damon · Accepted Answer · 2019-07-12T13:30:58.670

ad 1: It does upload your file, but only if the hash is not known. As the very first thing, a piece of JavaScript calculates a cryptographic hash (SHA-256 if I recall correctly, but I might be wrong) and sends that. The engine then, rather than scanning, looks up the hash in an already-did-it database. Only if the hash is not present, or if you insist, will it upload the actual file.
How secure is this? Pretty much 100%, or as close as you can get to it. The odds that your file has the same hash as an already scanned file but isn't the same file are minuscule. Not quite zero, admittedly, but as good as.
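
To make the hash-first idea concrete, here is a minimal sketch in Python. It is not VirusTotal's actual code: `sha256_of_stream`, `submit` and the `known_reports` dictionary are hypothetical names invented for illustration, and SHA-256 is assumed as the hash (as guessed above).

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a file-like object in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def submit(stream, known_reports):
    """Hypothetical client logic: send only the hash first, upload on a miss."""
    file_hash = sha256_of_stream(stream)
    if file_hash in known_reports:
        # Hash already known: the cached report is returned, nothing is uploaded.
        return ("cached", known_reports[file_hash])
    # Unknown hash: the whole file would have to be uploaded and scanned.
    return ("upload", None)


# Stand-in for the service's "already-did-it" database (an assumption).
original = b"MZ" + b"\x00" * 1024
known_reports = {hashlib.sha256(original).hexdigest(): "0/70 detections"}

# Same bytes: cache hit. One changed byte: a different hash, so a full upload.
modified = original[:-1] + b"\x01"
print(submit(io.BytesIO(original), known_reports))
print(submit(io.BytesIO(modified), known_reports))
```

This also shows why a large file can appear to "upload" instantly: only a few dozen bytes of hash travel over the wire when the file has been seen before.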

ad 2: It doesn't make much of a statement at all by itself, other than "no problems were found" if there were zero hits reported. What it does is run around 60-70 different scanners, some of which are well-known and some of which I've never heard of, and display their output. That output sometimes contains false positives, and may very well contain false negatives. The actual usefulness of virus scanners is disputed, but alas it's as good as you can get. At least, virus scanners detect well-known threats relatively well.
Plus, there's the community thingie where users can give ratings, but it all comes down to trusting some unknown guy on the internet, so... bleh.

ad 3: Hardly. The best alternative is to never execute programs of unknown or even dubious origin. Anything else is reading tea leaves. Sure, some read the tea leaves better than others, but they are still only reading tea leaves. Dynamic analysis exists, and pretty much every desktop AV software has it nowadays (running most system calls through a proxy lib), but whether it actually does something useful other than burn massive CPU is questionable. VirusTotal may be somewhat better insofar as it runs a large number of scanners. Whether that actually increases security is uncertain. If you think about it: assuming the scanners that are run are actually worth their salt, then a single one of them should do. On the other hand, if they're not worth their salt, what value is added by running more of them? Quoting an old wisdom: If you add nothing to nothing, the sum remains rather small. Or, as stated in Fidelio: "Nothing, if you add to nothing air".

If you got a file from a presumably trustworthy site, and you have zero hits on VT, then it's probably, usually, rather safe. If you keep the file around and re-scan it two weeks later, even better (assuming a new threat may become known in the meantime). That's what I'm doing, and I did that for years before VirusTotal even existed -- so far it works very well (or, it seems to work; I might have malware that I don't know about). In the end, you never know for sure.

About dynamic analysis
That's not happening. You can deduce what kind of analysis VirusTotal is doing from its display. It is almost certainly not just a forwarding of hashes to some other service (as suggested in a comment) because of the time it takes to scan, and because definition files are not always in sync (indeed most of the time they're not) with the manufacturers' tools. You can also unpack and re-pack a ZIP file (which almost certainly results in a non-identical file), and it will be scanned just fine. How would that work if only hashes are being handed around? It wouldn't... but it does. That, and the fact that sometimes one or another scanner fails to open an archive, indicates that actual scanning happens.
You can deduce that they do static signature matching and heuristic scanning because signature matching is what every AV does and has been doing by default for 30+ years, and heuristic analysis is what most (if not all) scanners have been doing by default for at least 20 years, too.
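
The ZIP argument above is easy to check yourself. The following sketch (the helper names `make_zip` and `contents` are made up for illustration) builds two archives with identical contents that differ only in the timestamp stored in the ZIP headers, which is roughly what happens when you unpack and re-pack an archive. The extracted files are byte-identical, yet the archive hashes differ, so a pure hash lookup would know nothing about the re-packed file.

```python
import hashlib
import io
import zipfile


def make_zip(files, date_time):
    """Build an in-memory ZIP; date_time lands in each entry's header."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            info = zipfile.ZipInfo(name, date_time=date_time)
            zf.writestr(info, data)
    return buf.getvalue()


def contents(zip_bytes):
    """Return {name: bytes} for every entry in an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


files = {"setup.exe": b"MZ" + b"\x90" * 256, "readme.txt": b"hello"}

# Same payload, packed at two different times (timestamps are arbitrary).
original_zip = make_zip(files, (2019, 7, 1, 12, 0, 0))
repacked_zip = make_zip(files, (2019, 7, 12, 12, 0, 0))

print(contents(original_zip) == contents(repacked_zip))
print(hashlib.sha256(original_zip).hexdigest())
print(hashlib.sha256(repacked_zip).hexdigest())
```

A scanner that still flags the re-packed archive must therefore have opened it and looked at the contents.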

On the other hand, you can be pretty sure that no dynamic analysis is done because that is prohibitive and impractical for a web service. In order to do dynamic analysis, you must run the binary in a secured environment (emulator, virtual machine or similar) which provides a complete operating system for the binary to run in, and which is able to accommodate the memory and CPU requirements of an arbitrary number of a-priori unknown programs. That's hardly realistic. Plus, there is a very non-negligible risk in running unknown software which could do basically anything, including running malware services inside the VM or breaking out of the VM. You know, for example a web browser or a mail client needs internet access to function properly. How do you provide that without also granting internet access to any malware that you run? Google (the owner of VirusTotal) most certainly wouldn't like being accused of running malware services.
Lastly, you must have 100% coverage to be sure (else the analysis is pretty pointless, since malware does not necessarily do its work upon program launch), so you need either a human or the most advanced fuzzing robot in the world to provide adequate input to the program so that it takes all possible paths. Do that for a web service which maybe ten million or so people use every day, where nobody likes to wait longer than a minute or two for the results: good luck.

`but it all comes down to trusting some unknown guy on the internet, so... bleh` - reminds me of some other sites... like Stack Exchange. — Esa Jokinen, Jul 12 '19 at 06:32
So does it only send the hash of the file to the "AV scanners"? If so, doesn't this mean I can easily bypass it by changing the file a bit? — OneAndOnly, Jul 12 '19 at 06:40
@OneAndOnly If you change the file by one bit, it'll upload the file in whole because the hash wouldn't match. The AVs would then be able to detect it as malicious. — forest, Jul 12 '19 at 06:43
@forest So if the hash doesn't match, AV scanners will do static analysis on it, correct? Are you sure VirusTotal won't just compute the hash and send it to AV APIs? Do you have any source that says this? (Even if this is the case, packed binaries can easily bypass detection, though.) — OneAndOnly, Jul 12 '19 at 07:20
@OneAndOnly The hash alone is useless for doing full analysis. It's only useful if the hash proves that the file was already scanned so the cached result can be used instead of uploading the whole thing. — forest, Jul 12 '19 at 07:23
@forest Yes, I know. I just want to know how these AV scanners on VirusTotal check whether the file is malicious if its hash is not there. Will they do static analysis on the file? If so, doesn't this mean packed binaries can bypass it? And do you have any source on this? — OneAndOnly, Jul 12 '19 at 07:25
They will do static (signature) and heuristic analysis on the file. — forest, Jul 12 '19 at 07:25
@forest So basically, if it's a packed/encrypted binary, VirusTotal is useless, since there is no way to know the payload without executing it in a sandbox (dynamic analysis), correct? — OneAndOnly, Jul 12 '19 at 07:26
I just wanna add that it's a bit more complex than that. I have run some experiments with it, one of which was uploading a piece of malware that wasn't detected at all. But exactly 8 days later it was detected by most antivirus engines. So if you upload something, they might not detect it right away, but they may after some time. — yeah_well, Jul 12 '19 at 12:01
_"so basically if its a packed/encrypted binary, virustotal is useless"_ if the binary is encrypted, then **every** AV is useless. If it's a self-decrypting binary, you might get a report of `TR/Crypt.ZPACK` or similar, depending on the packer, but that's pretty much it. There's no straightforward way of knowing what's in the binary. Executing in a sandbox doesn't necessarily help, either. It depends on what the malware is doing. You might or might not be able to detect something. The only safe thing, when in doubt, is to _not run_ an unknown binary. Better yet, do not even download it... — Damon, Jul 12 '19 at 13:09
@forest Now I'm imagining an 'attack' where you create a completely harmless executable which is a collision for your evil malware... I wonder what hash algorithm they use, and if it's vulnerable to collisions? — Nic, Jul 12 '19 at 15:18
Many virus scanners work the same way as VirusTotal: they hash the file, then upload it if the hash is unknown. Windows does this in the background and will upload to Microsoft any executables it finds that have an unknown hash. — user10216038, Jul 12 '19 at 15:39
@NicHartley That's a realistic attack vector, and a major reason why MD5 is a bad choice for antivirus. A malware author could create a collision between a malicious and benign executable, then send in the benign executable to the AV company's false positive report contact, causing them to whitelist both the benign and malicious software. — forest, Jul 14 '19 at 04:41

score 1 · Answer 2 · answered Jul 12 '19 at 16:42

Just to add to the previous answer, which covers pretty much everything -- VirusTotal actually does sometimes do dynamic analysis, especially if the file is commonly checked or initially looks suspicious. They started doing this in 2017, and you can get more information on it directly from them here. You can check to see if dynamic analysis has been run on the file you checked by going to the "Behavior" tab of your VirusTotal scan.
