26

I have a dataset from a malware detection project that others want to use. Part of that dataset is system binaries that I had retrieved from my PC by searching for *.exe files (to serve as a benign dataset). Is it safe to share these files or can they contain sensitive information about accounts/identity etc.?

Niket Bhodia
  • 369
  • 3
  • 4
  • Are any system (or other important) binaries .NET assemblies? Those are compiled to native code at runtime. Could this compilation process (NGEN I think) incorporate additional information which then gets cached and unintentionally shared? – StayOnTarget Mar 03 '19 at 13:31
  • 2
    Another thing to watch out for is the legality of what you intend to do. Copyright law could get in the way of this. Sharing a virus-infected executable could potentially violate the copyright of both the author of the virus and the author of the original software. On the other hand, it could be argued that this falls under the definition of fair use. A question about that could be on-topic for [law.se]. – kasperd Mar 03 '19 at 15:49
  • 3
    @kasperd Never heard of an "author of the virus" claiming or attempting to enforce copyright over the code in their virus. – fpmurphy Mar 04 '19 at 00:27
  • 1
    @fpmurphy Agree that there are copyright issues. The OP wants to share original, uninfected files: "to serve as a benign dataset". – user71659 Mar 04 '19 at 04:06
  • 1
    @fpmurphy I mentioned both **the author of the virus** as well as **the author of the original software**. There is little doubt somebody owns copyright on the virus code. But there are multiple reasons why it's unlikely the owner of the virus is going to claim their ownership in court. The author of the original software is much more likely to make a claim in court, and if they are going to sue over distribution of the original uninfected files, they may have a good case. – kasperd Mar 04 '19 at 11:41
  • @kasperd. "There is little doubt somebody owns copyright of the virus code." Well, that depends on the jurisdiction of the author of the virus code. Not all jurisdictions allow implicit or automatic copyright. – fpmurphy Mar 04 '19 at 11:56

4 Answers

32

Everybody's reflex answer to such a question (mine included) will normally be: Huh huh huh (falls off chair). No! How would that even work? Executables are signed nowadays, so any modification would invalidate the signature!

However, if you consider "exe" files in general, not just those from a fresh, bare Windows install, the answer must be: Careful!

Some executables (an increasing number) are specially crafted for you. And yes, they're signed; that doesn't make a difference.

This includes at least some, but more likely most, of the executables you downloaded from one of those modern software-as-a-service or online-shop things, or whatever you call them. Adobe, Steam, Office 365, you name it.
I don't know the technical details of each and every one of these; they're just examples that came to mind as possible candidates. It is certain, however, that among other methods, custom-signed executables exist (and not just on PC: the Nintendo shop, for example, definitely works that way).

So, if your Windows system is not just a Windows system, but one that includes custom-signed (or what would one call it? custom-branded?) executables, then you may give out sensitive information.
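
One rough way to act on this before sharing, sketched below under stated assumptions: a file that is byte-for-byte identical to a copy obtained independently (say, from a second machine or a fresh install) cannot carry data crafted for you. The helper names and the `known_hashes` set are hypothetical; you would have to build that set yourself from a source you trust.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_by_known_hashes(paths, known_hashes):
    """Split files into (shared, unique).

    shared: digest appears in known_hashes (hypothetical set of digests
            collected from an independent copy of the same software).
    unique: digest not seen elsewhere, so the file deserves a closer look.
    """
    shared, unique = [], []
    for path in paths:
        if sha256_of(path) in known_hashes:
            shared.append(path)
        else:
            unique.append(path)
    return shared, unique
```

A file landing in `unique` is not necessarily personalized (it may simply be a different version), but it is the subset worth inspecting or leaving out of the dataset.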

Also, not all executables are the same version, and not all executables are present on every computer. Unless one also considers file creation dates, it is probably a bit far-fetched to say that they provide a unique fingerprint, but this information could certainly be used to more or less identify your system, with a small margin of error.

While in theory there are probably enough combinations of features and versions to identify every atom in every computer, in practice most installs will have mostly the same features and mostly the same versions, which amounts to maybe a few dozen million real permutations. But still, if it's a problem that someone might tell that this-and-that combination points to your specific computer with, say, 85-90% likelihood, then... be aware.
Mind you, it's not so different with genetic analysis, although of course the numbers are much bigger in that case. Folklore tells us that siblings are 50% genetically identical, but in reality even complete strangers are roughly 99.9% genetically identical. That's because, well, they need these genes in that particular composition to even exist (you would be surprised how much you have in common with, say, a rat or a bunny!). But even if people are mostly identical in almost everything, there's enough in the small remaining bit to tell quite a lot about someone.

Damon
  • 5,001
  • 1
  • 19
  • 26
19

Windows system executables do not contain any sensitive information. They may reveal the version of the operating system you are using, but personal information is not stored in executables. Instead, it is stored in configuration files or databases kept throughout the system. While it would be theoretically possible to store sensitive information in executables, I can't think of any reason it would be done.

forest
  • 64,616
  • 20
  • 206
  • 257
9

They can contain file paths from the system they were compiled on, which may be sensitive if these are programs you compiled on your own system.
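
A minimal sketch of how one might look for such paths, assuming they appear as plain ASCII or UTF-16LE strings starting with a drive letter (as debug/PDB paths commonly do). The function name and the character set accepted in a path are my own simplifications, not a complete parser:

```python
import re

# Characters accepted inside a path; deliberately conservative (an assumption).
_ASCII_PATH = re.compile(rb"[A-Za-z]:\\[A-Za-z0-9 _.()\\-]{3,260}")
# Same pattern, with every character followed by a NUL byte (UTF-16LE).
_UTF16_PATH = re.compile(
    rb"[A-Za-z]\x00:\x00\\\x00(?:[A-Za-z0-9 _.()\\-]\x00){3,260}"
)


def find_embedded_paths(data: bytes) -> list[str]:
    """Return drive-letter paths found in raw binary data.

    ASCII matches come first, then UTF-16LE matches, each in file order.
    """
    found = [m.group().decode("ascii") for m in _ASCII_PATH.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in _UTF16_PATH.finditer(data)]
    return found


def scan_file(path: str) -> list[str]:
    """Read a file fully into memory and scan it for embedded paths."""
    with open(path, "rb") as fh:
        return find_embedded_paths(fh.read())
```

Hits containing something like `\Users\<name>\` are the ones to worry about; paths from a vendor's build server are usually harmless to you, though they do say something about whoever built the file.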

user541686
  • 2,502
  • 2
  • 21
  • 28
  • 4
    I'm assuming OP hasn't compiled his own copy of Windows locally. – forest Mar 03 '19 at 04:50
  • 4
    @forest: Very true, but I'm worried searching for `*.exe` files might result in more files than true OS binaries. You'll find other binaries that way too, especially if you planted any inside `System32` yourself for whatever reason. – user541686 Mar 03 '19 at 04:54
  • 4
    That's a good point. And of course, some people create self-unpacking archives with 7zip that are exe files and which may contain sensitive data. – forest Mar 03 '19 at 05:02
  • @Mehrdad, thanks for the reply. Yes, .exe files of other installed software are also in this set. But it's unlikely they would contain any sensitive information, correct? And I have not compiled any binaries myself. Just checking out of caution. – Niket Bhodia Mar 03 '19 at 09:55
  • 2
    @NiketBhodia: If you haven't built the EXEs yourself, then they indeed should not contain any sensitive information *from your machine*. But for example, if an EXE comes from your work's IT department, and they built it themselves, then maybe the company name is somewhere there. You need to think through who may have built each EXE and whether they might have information that is indirectly or directly associated with you. – user541686 Mar 03 '19 at 09:58
  • @NiketBhodia: Also, note that the set of software you have installed is in some sense a fingerprint of your machine. So that itself is potentially indirect identifying information about you, although it doesn't directly lead to you on its own. Whether this is sensitive or not really depends whom or what you're trying to protect against. – user541686 Mar 03 '19 at 10:00
1

Internal or company-specific applications may well contain sensitive algorithms (e.g. pricing/discounting rules, fraud detection). They might also be analysed by hostile parties for security flaws.

Revealing which versions of commercial or third-party applications are actively used (especially if they are not fully up to date with security patches) may also allow hostile parties to target your company using known vulnerabilities in those versions.

Gary
  • 884
  • 7
  • 12