
I'm reading through the extensive description of which data is acquired by Microsoft's telemetry, including the following paragraph:

User generated files -- files that are indicated as a potential cause for a crash or hang. For example, .doc, .ppt, .csv files

I was wondering whether Microsoft actually gathers data from a Word document in case Word crashes (I hope I'm wrong on this one).

Is Microsoft getting the 'whole' file, only a paragraph, or am I misreading that part of the documentation?

usr-local-ΕΨΗΕΛΩΝ
VoodooCode
  • I don't think .doc files are very common these days. Isn't it a 1990s thing? (.docx today?) – Peter Mortensen Mar 01 '19 at 15:12
  • Note that this document is specific to what may be gathered for full-level diagnostic data. If you've set your diagnostic data level to basic, this data is not subject to being gathered by telemetry. https://docs.microsoft.com/en-us/windows/privacy/basic-level-windows-diagnostic-events-and-fields-1809 – Xander Mar 01 '19 at 15:50
  • Do not forget about malware scanners; they normally explicitly ask to transfer suspicious content if cloud scanning/intelligence is activated. – eckes Mar 02 '19 at 00:59

2 Answers


Here is what they spy on, finally officially admitted after being proved again and again by different independent sources. That should give a pretty good idea of what is actually transmitted.

To actually see what's being reported, you can give yourself permissions on the %ProgramData%\Microsoft\Diagnosis directory and look at what's in there, but the files are encrypted, which is a very suspicious thing.
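
If you want to poke at that directory yourself, here is a minimal Python sketch. The path is the one mentioned above; the printable-text check is just a rough heuristic of my own, not anything Microsoft documents. It assumes you have already granted yourself read access, for example with `takeown` and `icacls` from an elevated prompt:

```python
# Minimal sketch: walk the Diagnosis store and report, for each file, its size
# and how much of a leading sample is printable ASCII. A low printable ratio is
# a rough hint that the content is encrypted or compressed rather than plain
# text; it proves nothing about *what* is inside.
import os
import string

DIAG_DIR = r"C:\ProgramData\Microsoft\Diagnosis"  # path from the answer above
PRINTABLE = set(string.printable.encode("ascii"))

for root, _dirs, files in os.walk(DIAG_DIR):
    for name in files:
        path = os.path.join(root, name)
        try:
            with open(path, "rb") as f:
                sample = f.read(4096)
        except OSError as err:  # e.g. access still denied on some files
            print(f"{path}: {err}")
            continue
        if not sample:
            print(f"{path}: empty")
            continue
        ratio = sum(b in PRINTABLE for b in sample) / len(sample)
        print(f"{path}: {os.path.getsize(path)} bytes, {ratio:.0%} printable")
```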

What you can look at in newer versions is the Diagnostic Data Viewer, but that does NOT guarantee or prove in any way that your documents stay private.

At this point my guess is that they will transmit parts of the files that generated crashes, or whole files if they consider it proper to do so; they definitely can transmit any type of document via the encrypted content in \Diagnosis, with HTTPS as the transmission channel.

Their EULA states:

Finally, we will access, disclose and preserve personal data, including your content (such as the content of your emails, other private communications or files in private folders), when we have a good faith belief that doing so is necessary to:

1. comply with applicable law or respond to valid legal process, including from law enforcement or other government agencies;
2. protect our customers, for example to prevent spam or attempts to defraud users of the services, or to help prevent the loss of life or serious injury of anyone;
3. operate and maintain the security of our services, including to prevent or stop an attack on our computer systems or networks; or
4. protect the rights or property of Microsoft, including enforcing the terms governing the use of the services - however, if we receive information indicating that someone is using our services to traffic in stolen intellectual or physical property of Microsoft, we will not inspect a customer's private content ourselves, but we may refer the matter to law enforcement.

Conclusion: they can and will do it at will.

Esa Jokinen
Overmind
  • While the answer is actually "yes, they could", the EULA snippet you cited has nothing to do with that. To investigate a crash has NOTHING to do with 1, 4. Also note that crash data is opt-in, while for the points mentioned in the EULA you basically give them the rights to do what they want, but only in those very specific circumstances (that _"...at will"_ is incredibly misleading, IMHO). – Adriano Repetti Mar 01 '19 at 13:08
  • Since they can make transfers and everything is encrypted, how do you know whether they will be nice guys and only do it when legally allowed to? – Overmind Mar 01 '19 at 13:17
  • Is that a serious question? Because it'd be a HUGE law infringement, and - unlike _cloud_ services - they distribute the evidence (virtually anyone can inspect the decompiled source code). Given that MS is not an anonymous developer hidden somewhere in the world... there is a MUCH bigger chance that some online service is misusing your data (oh well, they actually tell you that they do, then...) or just some obscure desktop (or mobile...) app... – Adriano Repetti Mar 01 '19 at 13:44
  • Law infringement must be proven, and you can't prove it since they encrypt the content. Online services are a different story and their usage is less than that of W10, but yes, you are right about them. Look at what Facebook previously did and what... they got a fine and added some text to their EULA. – Overmind Mar 01 '19 at 13:52
  • @AdrianoRepetti Well, Microsoft's [website](https://docs.microsoft.com/en-us/windows/privacy/windows-diagnostic-data) says that the linked "article describes all types of diagnostic data collected by Windows at the Full level". Under section "Product and Service Performance data", subsection "Data Description for Product and Service Performance data type", the following is listed: "User generated files -- files that are indicated as a potential cause for a crash or hang. For example, .doc, .ppt, .csv files" – VoodooCode Mar 01 '19 at 13:54
  • And since pretty much any file type can cause crashes... – Overmind Mar 01 '19 at 13:55
  • @Overmind Where is the data encrypted? Client side? Where is the code? Client side? Then you have everything you need to determine WHAT is collected (if you really wish so); you do not need to decrypt anything. Also note that any memory dump (even for open-source software, think about Apport) potentially collects extremely sensitive data. BTW no, they collect data files for their own apps (if Photoshop crashes then MS won't receive the pictures of your birthday party). – Adriano Repetti Mar 01 '19 at 14:00
  • @VoodooCode Yes, they can. My first comment was to disagree about the citation (EULA), which has nothing to do with telemetry data (in fact it gives them MUCH more power, but only in well-determined circumstances). – Adriano Repetti Mar 01 '19 at 14:02
  • `the file[s] are encrypted which is a very suspicious thing.` Why? They're copies of documents you already own and control, and the OS can already read them and extract diagnostic (and personal) data if they so choose. It makes perfect sense to encrypt private data before sending it over the internet. The _fact that they're sending it_ is suspicious, but not the encryption. – brichins Mar 01 '19 at 15:52
  • @brichins The fact that they store it encrypted on the very machine on which it was generated is suspicious. Sure, they can and should encrypt it in transit, but encrypting it at rest on the user's machine primarily just prevents the user from seeing what is in it. – David Schwartz Mar 01 '19 at 19:33
  • @DavidSchwartz Encrypting it at rest is useful if the user deletes the original file, in which case they wouldn't expect to have a readable copy of it sitting on their disk still. – Chris Hayes Mar 01 '19 at 20:11
  • @ChrisHayes That only makes sense if the file has sensitive content in it. Which makes it even more important that the user know what's in it. So if that argument applies, the arguments in the other direction apply more strongly. – David Schwartz Mar 01 '19 at 20:34
  • @DavidSchwartz Encryption at rest is nearly always a good thing, especially (as in this case) if the contents are a) unknown but potentially sensitive and b) not intended for user consumption or immediate use. As Chris pointed out, the user has reason to expect that if they delete something, it's gone - not duplicated out of sight. Also, diagnostic info should be kept around even (perhaps _especially_) if the source data has been removed. – brichins Mar 01 '19 at 20:39
  • This encryption is in no way suspicious; rather, it is evidence that whoever designed this process built a proper threat model, analyzed it appropriately, and correctly implemented good mitigations against likely vulnerabilities. – Eric Lippert Mar 02 '19 at 01:38
  • I've downvoted for the unnecessary conspiracy theory overtones. – isanae Mar 02 '19 at 02:37
  • @EricLippert: One could say that, but one could also argue that MS is hiding what "evil and illegitimate conspirative thing" they are doing from the legitimate user's eyes. If the encryption were about mitigating a threat (which is fine), there should be a way for the computer's owner -- presumably after elevating to Admin -- to access that data, and more importantly control what they access, shouldn't there? Making data unreadable for others and making it unreadable for the legitimate owner are different things. Nothing prevents them from stealing trade secrets as it is, and there's no evidence. – Damon Mar 03 '19 at 12:12
  • @DavidSchwartz Honestly, I would find it significantly more suspicious if some of the files were encrypted and others weren't, since that would indicate that the OS analyses the files' contents before sending them, and does so well enough to determine which ones either need added security, or are private information that the user _doesn't_ want sent (and thus need to be hidden from the user). Blanket encryption could indicate that it tries to hide everything, or that it doesn't analyse and just assumes every file is the Colonel's secret recipe. – Justin Time - Reinstate Monica Mar 03 '19 at 19:09

Memory dumps often have document contents

It's worth noting that if you're sending a memory dump of a crashed application at the moment of its crash (which is a reasonable way of analyzing crashes), then that memory dump is very likely to include the contents of whatever document(s) were open in that app at the time. So if you're "just" sending app crash debug information, then that by necessity means that sometimes you're also sending confidential user documents in it.
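
As a rough way to convince yourself of this (this is not how Microsoft's pipeline works, just an illustration), you can scan a crash dump for a phrase you know was in the open document. Here is a minimal Python sketch; the dump path and phrase are whatever you supply, and the UTF-16 check is an assumption based on Windows applications commonly holding text in memory as UTF-16:

```python
# Rough illustration: does a phrase from the document that was open at crash
# time appear in a memory dump of the crashed process?
# Usage: python scan_dump.py crash.dmp "some confidential phrase"
import sys

def dump_contains(dump_path: str, phrase: str) -> bool:
    # Check both UTF-8 and UTF-16-LE, since Windows apps often hold text as UTF-16.
    needles = [phrase.encode("utf-8"), phrase.encode("utf-16-le")]
    with open(dump_path, "rb") as f:
        data = f.read()  # for brevity; a real tool would scan in chunks
    return any(needle in data for needle in needles)

if __name__ == "__main__":
    dump_path, phrase = sys.argv[1], sys.argv[2]
    print("phrase found in dump:", dump_contains(dump_path, phrase))
```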

Peteris