5

Consider a user accessing a site that is using HTTPS for all its traffic.

A hacker is trying to use a man-in-the-middle attack to snoop on the user. What information can he glean?

Obviously the content is encrypted, and we'll assume he can't decrypt it, but what can he learn without having to do that?

The kind of things I'm thinking of are:

  • The fact that the user is going to the site at all. I'm guessing that there would likely have been a DNS request for the domain name, and that request wouldn't have been encrypted, so the hacker knows at the very least that the user is accessing this specific site.

  • URLs - Are the actual URLs of the request encrypted as well as the content? If not, some URLs may contain useful information for the attacker (i.e. which pages have been requested, ID numbers for requested data, etc.)

  • The size of the transmitted data: If the hacker knows what the site does and what is expected to be downloaded or posted to it, I would guess he'd be able to work out roughly what the user is doing just by the data size of each https request/response. For instance, if the site's purpose is to allow users to download protected documents, the hacker could deduce which of the documents on the site the user has downloaded.

  • Request/response timings: Similar to the above, if the hacker has knowledge of the site, and knows that a particular page has a slow response time, he would be able to deduce when the user went to that page.

Most of the above relies on the hacker having some existing knowledge of the site, so this isn't a casual hacker we're talking about; this is specific targeting of the site and/or the individual.

How much of the above is actually feasible? Would I be right to worry about them if I'm developing a sensitive site? Are there any other angles I haven't thought of?

SilverlightFox
Simba

3 Answers

6

In an SSL connection, the GET or POST request line is encrypted. For example, when a visitor requests https://www.yoursite.com/shoppingcart.aspx:

www.yoursite.com          visible 
GET /shoppingcart.aspx    encrypted
HTTP/1.0                  encrypted

What you're thinking about is called an inference attack, where with little bits of information, an attacker can put together pieces of a puzzle. So you ask:

If the hacker knows what the site does and what is expected to be downloaded or posted to it, I would guess he'd be able to work out roughly what the user is doing just by the data size of each https request/response.

This is not necessarily the case. Again, since both GET and POST requests are encrypted, it would be a worthless guessing game to an extent. Consider the following:

Example A

Site contains 1mb file called topsecret.docx
Visitor uploads a 1mb video of cats

What you will see:

Site <--> 1mb session <--> Visitor

Because the GET and POST are both encrypted, there is no way for you to determine if the user downloaded or uploaded. All you are seeing is a 1mb exchange. Any attacker would be wasting a lot of time and resources playing a guessing game. However, consider a third site in the mix.

Example B

Site A contains 1mb file called topsecret.docx
Visitor does something
Visitor visits Site B
Site B now contains topsecret.docx

What you will see (of course via network sniffing):

   Visitor <--> Site A (1mb session)
   Visitor <--> Site B (1mb session)

If you could find the same document on both sites, you could infer that the visitor went to Site A, downloaded a file, then visited Site B, where the file appeared. You could MD5 it as further proof; however, you'd need to find the initial file. Otherwise, the visitor could have gone to both sites and uploaded or downloaded cat videos.

It would be easier for someone with the resources to pull off this kind of attack to just outright MiTM the sites with stolen certs, sslstrip, or some other trickery, rather than play a guessing game.

MODIFIED TO ANSWER Lekensteyn's comment

Lekensteyn, I added a fictitious example to illustrate a point. It still holds true, so I will give you another example:

Site A contains 2 files: File1 (1mb) and File2 (200mb)

Visitor <--> download File2 <--> SiteA 

In the above, the visitor starts downloading the 200mb file and stops the session at, say, 1mb. Because you cannot see what he did, traffic analysis shows only that the session was a 1mb connection. What you will see:

Visitor <--> 1mb session <--> Site A

Would you be willing to bet your last dollar that the visitor downloaded File1 based on the size? In my example, I used a crude mechanism to illustrate my answer. There was a lot I could have gone into detail on, but I chose not to for brevity.

munkeyoto
  • Thanks for the answer. It is helpful. I'm puzzled that the MitM wouldn't be able to differentiate between the request and response; I was expecting that to be possible at least. But if it isn't, then you're right; it certainly makes things harder. – Simba Nov 25 '14 at 15:07
  • 2
    While an inference attack is possible, the following details are wrong. You can clearly see the direction of a transfer and guess that a large file was uploaded or downloaded. Usually you cannot tell which file exactly was transferred, except when you have additional information such as the size distribution of files on a server, the media as included in a page combined with timing information. What do you mean by the example of siteA and siteB? When did the file get transferred? – Lekensteyn Nov 25 '14 at 15:09
  • 1
    Your edit does not take away the confusion. How did the document topsecret.docx end up at site B? Is "visitor does something" equal to "visitor uploads topsecret.docx to site B"? There is a reasonable possibility to detect that the visitor aborted a download (assuming that the HTTP/1.1 server did not abort early and HTTP Keep-Alive). TLS does not have a special record type for acknowledgements, and "aborting a request" is often just tearing down the connection. – Lekensteyn Nov 25 '14 at 22:37
  • In the example, visitor makes connection to Site A at 1mb, then visits Site B for a 1mb session where topsecret.docx appear{s,ed} – munkeyoto Nov 26 '14 at 00:40
2

With a MitM against an HTTPS website, the greatest threat comes from the attacker's ability to replace the site's SSL cert with a rogue one of his own. Yes, the user will receive a warning that the cert does not match the site, but users are likely to just click continue. Once the cert has been replaced, the attacker can decrypt all communication, as he holds the private key for his own cert.

If we assume that he does not replace the cert then you are right in saying that there is not much he can glean from eavesdropping on the connection. I believe you have covered most of the points.

limbenjamin
  • Hi. Thanks for the reply. The one I'm most worried about from the perspective of gleaning useful information is the URL. Very often a URL will contain ID numbers or other data that could be useful. Is the URL encrypted with the rest of the request or not? – Simba Nov 25 '14 at 12:28
  • 1
    URLs are encrypted. The remainder of your attacks are classical traffic analysis. – Bruno Rohée Nov 25 '14 at 12:33
  • The TLS SNI extension would leak the host name the user is trying to access as far as I understand. – Darsstar Nov 25 '14 at 13:00
2

The fact that the user is going to the site at all. I'm guessing that there would likely have been a DNS request for the domain name, and that request wouldn't have been encrypted, so the hacker knows at the very least that the user is accessing this specific site.

Yes. The DNS request will reveal the host name if it can be observed or MITM'd, and the destination IP address and port of the HTTPS connection are also visible in cleartext, since they are needed to route packets to the server. If SNI is enabled on the client, the domain name is also transmitted in cleartext. If not, the SubjectAltNames returned in the certificate will indicate either one domain name or a small list of possible domain names, which might be easy to narrow down depending on knowledge the attacker has from other sources.

URLs - Are the actual URLs of the request encrypted as well as the content? If not, some URLs may contain useful information for the attacker (i.e. which pages have been requested, ID numbers for requested data, etc.)

URLs are private during an HTTPS session. So if the user goes to https://example.com:444/buyThing/thing.php?id=123 it would only be possible for the MITM to determine that the destination was example.com on port 444 over TLS.
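As a rough illustration of that split, the sketch below takes the example URL and separates the parts an on-path observer could learn from the parts carried inside the TLS session. The function name `split_visibility` is made up for this example, and the sketch assumes the host name leaks through DNS, SNI, or the certificate as described above.

```python
from urllib.parse import urlsplit


def split_visibility(url):
    """Split an HTTPS URL into parts an observer can and cannot see.

    Hypothetical helper for illustration only; it does no network I/O.
    """
    parts = urlsplit(url)
    visible = {
        # Assumed to leak via DNS, SNI, or the server certificate.
        "host": parts.hostname,
        # The TCP destination port is always in cleartext.
        "port": parts.port or 443,
    }
    encrypted = {
        # Path and query travel inside the encrypted HTTP request.
        "path": parts.path,
        "query": parts.query,
    }
    return visible, encrypted


visible, encrypted = split_visibility(
    "https://example.com:444/buyThing/thing.php?id=123"
)
print("visible:  ", visible)
print("encrypted:", encrypted)
```

Only the `visible` dictionary corresponds to what the MITM can determine; the `id=123` value stays inside the encrypted part.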

The size of the transmitted data: If the hacker knows what the site does and what is expected to be downloaded or posted to it, I would guess he'd be able to work out roughly what the user is doing just by the data size of each https request/response. For instance, if the site's purpose is to allow users to download protected documents, the hacker could deduce which of the documents on the site the user has downloaded.

Yes, it is possible for the amount of data to be used in a side channel attack.
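A minimal sketch of how such a size comparison might work, assuming the attacker already knows the sizes of the documents on the site. The file names, sizes, and the 2% overhead allowance are all invented for the example, and the sketch assumes responses are not compressed.

```python
def candidate_documents(observed_bytes, known_sizes, overhead=0.02):
    """Return documents whose size is consistent with an observed transfer.

    overhead is an assumed fractional allowance for HTTP headers and
    TLS record framing added on top of the file itself.
    """
    matches = []
    for name, size in known_sizes.items():
        # Assuming no compression, the encrypted transfer is at least
        # as large as the file and only slightly larger.
        if size <= observed_bytes <= size * (1 + overhead):
            matches.append(name)
    return matches


# Hypothetical catalogue of document sizes in bytes.
known_sizes = {
    "report.pdf": 1_000_000,
    "minutes.docx": 250_000,
    "budget.xlsx": 1_005_000,
}
print(candidate_documents(1_010_000, known_sizes))
```

With these made-up numbers two documents match the same observed transfer, which shows the limit of the technique: the attacker can narrow the set of candidates, but documents of similar size remain indistinguishable.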

Request/response timings: Similar to the above, if the hacker has knowledge of the site, and knows that a particular page has a slow response time, he would be able to deduce when the user went to that page.

Yes, this is true and is like the timing/side channel vulnerability described here.
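To make the timing idea concrete, here is a small sketch that flags exchanges whose response latency suggests the known slow page. The timestamps and the 2 second threshold are assumptions chosen for the example; real traffic would be noisier.

```python
def slow_page_visits(exchanges, threshold_s=2.0):
    """Return request timestamps whose response latency meets the threshold.

    exchanges is a list of (request_time, response_time) pairs in seconds,
    as an observer might record them from encrypted traffic.
    """
    return [req for req, resp in exchanges if resp - req >= threshold_s]


# Hypothetical observations: only the second exchange is slow.
exchanges = [(0.0, 0.2), (5.0, 7.5), (9.0, 9.3)]
print(slow_page_visits(exchanges))
```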

How much of the above is actually feasible? Would I be right to worry about them if I'm developing a sensitive site? Are there any other angles I haven't thought of?

Well, they are all feasible. The domain name being revealed is more of a privacy issue for the individual than a security one. If they themselves are bothered about this, they would have to use a service such as Tor.

See this answer for some of my other insights as to what a MITM may be able to see such as URLs in the referer header if this data is handled badly by the site and/or browser.

Other side channel attacks of this kind are also possible. For example, with an autocomplete feature served over HTTPS, the typed characters can sometimes be determined because the encrypted responses are so small that their sizes give them away.
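The autocomplete case can be sketched as follows, assuming the attacker has built a table of response sizes by querying the suggestion endpoint himself. The prefixes and byte counts are invented, and exact size matching is an idealisation.

```python
def narrow_prefixes(observed_sizes, size_table):
    """Return typed strings consistent with a sequence of response sizes.

    size_table maps each prefix to the (hypothetical) size in bytes of
    the suggestion response the site returns for that prefix.
    """
    candidates = [""]
    for observed in observed_sizes:
        # Keep prefixes that match this size and extend a previous
        # candidate by exactly one character.
        candidates = [
            prefix
            for prefix in size_table
            if size_table[prefix] == observed and prefix[:-1] in candidates
        ]
    return candidates


# Hypothetical table built by probing the site in advance.
size_table = {"c": 512, "d": 498, "ca": 301, "co": 344, "da": 301}
print(narrow_prefixes([512, 301], size_table))
```

Note that "ca" and "da" share a response size here; it is the first keystroke's size that tells them apart, which is why observing the whole sequence matters.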

SilverlightFox