16

Opening a PDF link in the browser (e.g. Google Chrome with the out-of-the-box PDF viewer plugin) appears to show that when the PDF is hosted on a domain fronted by Cloudflare, additional data is present in the embed code.

Inspecting the page source of a displayed PDF file with Chrome DevTools shows a 'reporting' URL when the PDF is behind Cloudflare, e.g. https://a.nel.cloudflare.com/report/v3?s=%2BW057P981N7Esg... (see the second code block).

PDF embed of a file NOT served via Cloudflare:

<embed id="plugin" type="application/x-google-chrome-pdf" src="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" stream-url="chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/f02f891e-7fd9-4857-8a34-f4e05abb87f8" headers="accept-ranges: bytes
cache-control: max-age=21600
content-length: 13264
content-type: application/pdf; qs=0.001
date: Sun, 05 Sep 2021 08:17:57 GMT
etag: &quot;33d0-438b181451e00&quot;
expires: Sun, 05 Sep 2021 14:17:57 GMT
last-modified: Mon, 27 Aug 2007 17:15:36 GMT
strict-transport-security: max-age=15552000; includeSubdomains; preload
x-backend: ssl-mirrors
" background-color="4283586137" javascript="allow" full-frame="" pdf-viewer-update-enabled="">

PDF embed of a file that IS served via Cloudflare:

<embed id="plugin" type="application/x-google-chrome-pdf" src="https://www.cloudflare.com/static/839a7f8c9ba01f8cfe9d0a41c53df20c/cloudflare-cdn-whitepaper-19Q4.pdf" stream-url="chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/fab5433b-5189-4469-91bb-fe144b761c7f" headers="accept-ranges: bytes
age: 105287
alt-svc: h3-27=&quot;:443&quot;; ma=86400, h3-28=&quot;:443&quot;; ma=86400, h3-29=&quot;:443&quot;; ma=86400, h3=&quot;:443&quot;; ma=86400
cache-control: max-age=8640000
cf-cache-status: HIT
cf-ray: 689e1d381a951501-MAD
content-length: 921473
content-type: application/pdf
date: Sun, 05 Sep 2021 08:33:41 GMT
etag: static/839a7f8c9ba01f8cfe9d0a41c53df20c/cloudflare-cdn-whitepaper-19Q4.797a721498.pdf
expect-ct: max-age=604800, report-uri=&quot;https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct&quot;
nel: {&quot;success_fraction&quot;:0,&quot;report_to&quot;:&quot;cf-nel&quot;,&quot;max_age&quot;:604800}
report-to: {&quot;endpoints&quot;:[{&quot;url&quot;:&quot;https:\/\/a.nel.cloudflare.com\/report\/v3?s=Bi6bZw6jf1FJoimuy2arirenUDiwyZX%2B%2B1Ty506xD9qMJ5UggIvZAy2h8gKogsJORkPlWdnZ12udf6CN%2BadaEF0FRKFAyZQabI6xkui0%2FrV%2BaCFsp7BmbEHnoLk0HPmJ6pMeMQ%3D%3D&quot;}],&quot;group&quot;:&quot;cf-nel&quot;,&quot;max_age&quot;:604800}
server: cloudflare
strict-transport-security: max-age=31536000
vary: Accept-Encoding
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
" background-color="4283586137" javascript="allow" full-frame="" pdf-viewer-update-enabled="">

Question

Does this imply that Cloudflare is rewriting the HTML source for PDF embeds and tracking PDF files opened through browser PDF plugins? What are the security/privacy implications of this? Would disabling the browser PDF embed plugin reduce the amount of data collected by Cloudflare?

What is particularly confusing is that the <embed/> code is supposedly generated by the browser's PDF plugin and NOT taken from the incoming response, so how can this rewriting be happening specifically for Cloudflare?

ccpizza

2 Answers

34

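Cloudflare is not rewriting any HTML here. The <embed> markup is generated by Google Chrome itself. What Cloudflare controls is the HTTP response headers, as it should, since it is the HTTP server responding to the request, and Chrome copies those headers into the embed element.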

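The Report-To header is part of the browser Reporting API, which security features such as Content Security Policy and Network Error Logging (the nel header shown above) use to deliver reports.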

Denis
20

No, this does not look like a security or privacy issue.

It seems your PDF viewer is generating an <embed> element and adding a non-standard headers attribute to it. This attribute appears to contain the HTTP response headers, i.e. whatever the server hosting the PDF file sends back. For example, it contains an ETag for caching and various security-related headers.
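To make it concrete that the headers attribute is just a copy of the response headers, here is a minimal Python sketch that pulls the attribute out of an <embed> element and splits it into name/value pairs. The class and function names (EmbedHeaderParser, parse_embed_headers) and the shortened SAMPLE markup are illustrative assumptions, not anything Chrome or Cloudflare provides; the sample reuses a few header lines from the question.

```python
from html.parser import HTMLParser


class EmbedHeaderParser(HTMLParser):
    """Collects the non-standard `headers` attribute of any <embed> tag."""

    def __init__(self):
        super().__init__()
        self.headers = {}

    def handle_starttag(self, tag, attrs):
        if tag != "embed":
            return
        # HTMLParser already unescapes entities such as &quot; in attribute values.
        raw = dict(attrs).get("headers") or ""
        for line in raw.splitlines():
            name, sep, value = line.partition(":")
            if sep:  # skip blank or malformed lines
                self.headers[name.strip().lower()] = value.strip()


def parse_embed_headers(markup):
    """Hypothetical helper: return the embedded response headers as a dict."""
    parser = EmbedHeaderParser()
    parser.feed(markup)
    parser.close()
    return parser.headers


# Shortened sample based on the Cloudflare-served embed in the question.
SAMPLE = '''<embed id="plugin" type="application/x-google-chrome-pdf" headers="content-type: application/pdf
server: cloudflare
nel: {&quot;success_fraction&quot;:0,&quot;report_to&quot;:&quot;cf-nel&quot;,&quot;max_age&quot;:604800}
">'''

if __name__ == "__main__":
    for name, value in parse_embed_headers(SAMPLE).items():
        print(f"{name}: {value}")
```

Everything the sketch extracts is data the server would send with any response for that URL, whether or not a PDF viewer is involved.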

Cloudflare provides various features for its customers that involve HTML and HTTP rewriting. For example, it can absolutely inject links if configured that way (e.g. through a Cloudflare Worker). Cloudflare sits in a man-in-the-middle (MITM) position by design, so it can inject arbitrary code and can already track every request. This is an essential aspect of its services.

But the report-to header is not used for tracking purposes. It merely provides an optional way for the browser to report problems with the website to the website operator. This can include information about deprecated browser features, Content Security Policy (CSP) violations, or networking problems. See their article Understanding Network Error Logging for an example use case. Since most websites do not run a server that can collect and analyze CSP reports, Cloudflare inserts a reporting URL by default. Cloudflare can also use reports about networking and DNS problems to improve stability of their services, thus benefiting their customers.
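As a rough illustration of how the two headers fit together, the Python sketch below decodes a nel value and a report-to value and looks up the endpoint group that the NEL policy points at. The function name reporting_endpoints is hypothetical, the query string in REPORT_TO is replaced by a PLACEHOLDER rather than the real token, and wrapping the report-to value in brackets before parsing is an assumption based on my reading of the Reporting API draft, which allows several comma-separated groups.

```python
import json


def reporting_endpoints(nel_value, report_to_value):
    """Return the endpoint URLs of the report-to group named by the NEL policy."""
    nel = json.loads(nel_value)
    # Assumption: the header may hold several comma-separated JSON objects,
    # so wrap it in brackets to parse it as a list of groups.
    groups = json.loads("[" + report_to_value + "]")
    for group in groups:
        if group.get("group") == nel.get("report_to"):
            return [endpoint["url"] for endpoint in group.get("endpoints", [])]
    return []


# nel value as captured in the question.
NEL = '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}'
# Same shape as the captured report-to value, with the token replaced by a placeholder.
REPORT_TO = r'{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=PLACEHOLDER"}],"group":"cf-nel","max_age":604800}'

if __name__ == "__main__":
    print(reporting_endpoints(NEL, REPORT_TO))
```

Note that success_fraction is 0 in the captured policy, which, as far as I understand the Network Error Logging draft, means successful requests are not sampled for reporting at all; only failures would be sent to the endpoint.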

IMSoP
amon