77

For example, say the following HTTPS URLs on two websites are requested by one IP over 5 minutes: "A.com/1", "A.com/2", "A.com/3", "B.com/1", "B.com/2".

Would monitoring of packets reveal:

  • nothing,
  • reveal only the IP had visited "A.com" and "B.com" (meaning the DNS only),
  • reveal only the IP had visited "A.com/1" and "B.com/1" (the first HTTPS request for each site),
  • reveal a complete list of all HTTPS URLs visited,
  • only reveal the IPs of "A.com" and "B.com",
  • or something else?

Related Question: can my company see what HTTPS sites I went to?

While that question does have additional information, as far as I can tell it does not specifically address the scenario of "reveal only the IP had visited "A.com/1" and "B.com/1" (the first HTTPS request for each site)". There is a good chance I am wrong about this, though, and I am happy to delete this question if it is a duplicate.


NOTE: This is a follow-up question to an answer that was posted to: Why is HTTPS not the default protocol?

blunders

3 Answers

92

TLS reveals to an eavesdropper the following information:

  • the site that you are contacting
  • the (possibly approximate) length of the rest of the URL
  • the (possibly approximate) length of the HTML of the page you visited (assuming it is not cached)
  • the (possibly approximate) number of other resources (e.g., images, iframes, CSS stylesheets, etc.) on the page that you visited (assuming they are not cached)
  • the time at which each packet is sent and each connection is initiated. (@nealmcb points out that the eavesdropper learns a lot about timing: the exact time each connection was initiated, the duration of the connection, the time each packet was sent and the time the response was sent, the time for the server to respond to each packet, etc.)

If you interact with a web site by clicking links in series, the eavesdropper can see each of these for each click on the web page. This information can be combined to try to infer what pages you are visiting.

Therefore, in your example, TLS reveals only A.com vs B.com, because in your example, the rest of the URL is the same length in all cases. However, your example was poorly chosen: it is not representative of typical practice on the web. Usually, URL lengths on a particular site vary, and thus reveal information about the URL that you are accessing. Moreover, page lengths and number of resources also vary, which reveals still more information.
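To make the length argument concrete, here is a minimal Python sketch of how an eavesdropper might narrow down candidate pages from an observed URL length. The site map and the function name `candidates_by_length` are hypothetical, and real traffic analysis also has to deal with headers, padding and compression, so treat this as an illustration of the idea only.

```python
def candidates_by_length(known_paths, observed_length, tolerance=0):
    """Return the known paths whose length is within `tolerance`
    of the length the eavesdropper observed on the wire."""
    return [p for p in known_paths
            if abs(len(p) - observed_length) <= tolerance]

# Hypothetical site map that an eavesdropper could have crawled in advance.
site_paths = ["/", "/about", "/login", "/account/settings", "/products/1234"]

# Varying lengths narrow the candidates considerably.
print(candidates_by_length(site_paths, 6))           # ['/about', '/login']

# In the question's example every path has the same length,
# so the length reveals nothing beyond the site itself.
print(candidates_by_length(["/1", "/2", "/3"], 2))   # ['/1', '/2', '/3']
```

Combining this with page size, resource count and timing would shrink the candidate set further, which is the point the research on traffic analysis makes.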

There has been research suggesting that these leakages can reveal substantial information to eavesdroppers about what pages you are visiting. Therefore, you should not assume that TLS conceals which pages you are visiting from an eavesdropper. (I realize this is counterintuitive.)


Added: Here are citations to some research in the literature on traffic analysis of HTTPS:

D.W.
  • 1
    +1 @D.W.: Selected your post as the answer. To me, the n-gram block attack on AJAX-based HTTPS transactions is not surprising if you generalize the threats presented, though I agree that it clearly highlights how serious the issue might be in some cases. Thanks for posting the links, really improved the quality of your answer in my opinion. Cheers! – blunders Jun 08 '11 at 03:22
  • Awesome answer! I think you should include the fact that the attacker knows a LOT about timing: the exact time of the connections, duration of the connection, time of each packet back and forth, time for the server to respond to each packet, etc. It's obvious in one respect, but also can be mined for information in many ways, as I assume the references explain in detail. – nealmcb Jun 08 '11 at 05:00
  • @D.W. After 4 years, does HTTPS still leak these kinds of length info? – user2203703 Nov 07 '15 at 18:35
  • 1
    @user2203703, yes. – D.W. Nov 08 '15 at 03:17
  • So to confirm: a query string sent to an SSL site can't be viewed by an eavesdropper? – niico Feb 01 '17 at 10:44
  • 1
    @niico, correct, the query string is encrypted and can't be viewed directly (but other things can be, and if an eavesdropper can infer something about the query string based on those other things, that would be just as bad). – D.W. Feb 01 '17 at 17:49
  • Does the arrival of HTTP/2 change spying capabilities? – stwissel May 30 '17 at 05:12
23

The second choice. Mostly.

When a browser visits an HTTPS web site, it establishes a TLS tunnel, which involves an asymmetric key exchange (client and server agree on a shared secret). That key exchange mechanism uses the server's public key, which the server shows as part of its certificate. The server certificate contains the server name (e.g. A.com), and the client verifies that the name matches the one it expects (i.e. the server name in the URL). The server certificate is inevitably sent before the key exchange, hence in plain view.

The rest of the URL is sent as part of the HTTP request which occurs within the encrypted tunnel, hence invisible to third parties. A given tunnel may be reused for several other HTTP requests, but (by construction) they are all for the same server (the same domain name).
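As a rough illustration of this split, the following Python sketch separates a URL into the part an observer can read (the server name) and the part that travels inside the tunnel (path and query). The function name `eavesdropper_view` is made up for this example, and the model is deliberately simplified: it ignores the length and timing side channels discussed in the other answer.

```python
from urllib.parse import urlsplit

def eavesdropper_view(url):
    """Split a URL into what is visible on the wire and what is
    carried inside the encrypted tunnel (simplified model)."""
    parts = urlsplit(url)
    return {
        # The server name shows up in the handshake; the port is in the TCP header.
        "visible": {"host": parts.hostname, "port": parts.port or 443},
        # Path and query string are part of the HTTP request, sent after encryption starts.
        "encrypted": {"path": parts.path, "query": parts.query},
    }

urls = ["https://A.com/1", "https://A.com/2", "https://A.com/3",
        "https://B.com/1", "https://B.com/2"]

# Note: urlsplit lowercases the host name.
visible_hosts = sorted({eavesdropper_view(u)["visible"]["host"] for u in urls})
print(visible_hosts)   # ['a.com', 'b.com']
```

For the five URLs in the question, the observer's view collapses to two host names, which matches the second choice.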

Thomas Pornin
  • 1
    The cert must contain a subject name (in CN or SAN as applicable) that either is the hostname from the URL _or_ is a wildcard that matches the hostname, but only the leftmost/lowest DNS label can be wild so if not the actual hostname this is at least hierarchically near it. – dave_thompson_085 Jan 14 '18 at 02:48
-3

This is actually a vague question, and here is why. When you access an HTTPS server, HTTP (HTTPS being viewed here as just HTTP over TLS) is a higher layer than TLS, which runs beneath it. The first thing that happens is the handshake, which negotiates the TLS settings such as the cipher suite and the keys. This is done on the HTTPS port, but there is no HTTP data yet. Then the client and server each switch to encrypted mode, where everything is encrypted.

After this negotiation is finished, the connection carries application data, which is just the plain old HTTP protocol as the payload.

But this data is encrypted, so no URLs are shown. However, as is commonly known, the IP addresses of the server and client are NOT encrypted, because they are used not in the TLS layer but in the IP layer, which is below TLS and thus at a lower level. TLS is carried as the payload of TCP segments inside IP packets, whose headers contain the IP addresses, the port numbers, the IP protocol (such as TCP), etc. Therefore, because only the TLS payload is encrypted, these items are not encrypted.
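A small Python sketch can show why these header fields stay readable: they sit in front of the TLS payload and are parsed without any key. The packet below is synthetic (documentation addresses, zeroed checksums) and `build_ipv4_tcp` and `cleartext_headers` are hypothetical helper names; the sketch assumes IPv4 and TCP only.

```python
import socket
import struct

def build_ipv4_tcp(src, dst, sport, dport, payload):
    """Build a synthetic IPv4+TCP packet (checksums left at zero)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + 20 + len(payload),   # version/IHL, TOS, total length
        0, 0, 64, 6, 0,                    # id, flags/frag, TTL, protocol=TCP, checksum
        socket.inet_aton(src), socket.inet_aton(dst))
    tcp_header = struct.pack(
        "!HHIIBBHHH",
        sport, dport, 0, 0,                # ports, sequence and ack numbers
        5 << 4, 0x18, 65535, 0, 0)         # data offset, flags, window, checksum, urgent
    return ip_header + tcp_header + payload

def cleartext_headers(packet):
    """Read the fields any on-path observer can see without a key."""
    ihl = (packet[0] & 0x0F) * 4           # IP header length in bytes
    sport, dport = struct.unpack_from("!HH", packet, ihl)
    return {
        "src_ip": socket.inet_ntoa(packet[12:16]),
        "dst_ip": socket.inet_ntoa(packet[16:20]),
        "protocol": packet[9],
        "src_port": sport,
        "dst_port": dport,
    }

# The payload stands in for an encrypted TLS record; its content is opaque.
pkt = build_ipv4_tcp("192.0.2.10", "198.51.100.7", 50000, 443, b"\x17\x03\x03\x00\x04abcd")
print(cleartext_headers(pkt))
```

The addresses, protocol number and ports come straight out of the headers, while the bytes after them are only meaningful to the two TLS endpoints.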

Additionally, eavesdropping on the encrypted content is not an issue provided the server's and/or client's certificate is valid, i.e. it can be "linked" to a root authority.

Finally, I'd like to say that TLS, and thus HTTPS, is actually a framework and algorithm for negotiating the highest security level, based on client and server preferences and the minimum each supports. Basically, TLS doesn't define the actual encryption used. That is the job of the cipher suites, which are typically specified in RFC-type documents separate from the TLS protocol itself. Thus, based on HTTPS alone, it is not possible to state the quality of the security it provides. Only some minimum security and difficulty can be assumed. The actual quality of the cipher suites is a complex question, and specific to each kind, because there are many, relying on completely different mechanisms.

Edit: It has also been brought to my attention that the Server Name Indication (SNI) extension of TLS, used when multiple servers share an IP address, readily tells anyone the domain name of the server accessed. In the first message sent to the server, a field contains the ASCII text of the domain name, like "google.com". So this information is easily seen by anyone monitoring the first packet. SNI is commonly used by web site hosts these days. But no URLs should be exposed.
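To show how little effort reading that field takes, here is a Python sketch that builds a minimal synthetic ClientHello and then pulls the host name back out of it. Both `build_client_hello` and `extract_sni` are hypothetical helpers written for this illustration; the parser assumes a well-formed ClientHello that fits in a single TLS record and is not a general-purpose TLS parser.

```python
import struct

def build_client_hello(hostname):
    """Build a minimal, synthetic ClientHello carrying only an SNI extension."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name    # name_type 0 = host_name
    sni_data = struct.pack("!H", len(entry)) + entry         # server_name_list
    ext = struct.pack("!HH", 0, len(sni_data)) + sni_data    # extension type 0 = server_name
    body = (
        b"\x03\x03"                            # client_version
        + bytes(32)                            # random (zeros, illustration only)
        + b"\x00"                              # empty session_id
        + struct.pack("!H", 2) + b"\x00\x2f"   # one cipher suite
        + b"\x01\x00"                          # one compression method: null
        + struct.pack("!H", len(ext)) + ext    # extensions block
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake

def extract_sni(record):
    """Return the host name from a ClientHello's SNI extension, or None."""
    try:
        # 0x16 = handshake record, 0x01 = ClientHello message
        if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                                           # record + handshake headers
        pos += 2 + 32                                         # client_version + random
        pos += 1 + record[pos]                                # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]   # cipher_suites
        pos += 1 + record[pos]                                # compression_methods
        ext_end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= ext_end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:
                # Skip list length (2 bytes) and name_type (1 byte).
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error):
        pass                                                  # truncated or malformed input
    return None

print(extract_sni(build_client_hello("google.com")))   # google.com
```

No key material is involved at any point: the name is recovered purely by walking length-prefixed fields, which is why anyone monitoring the first packet can do the same.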

Finally, it really depends on the cipher suites. They are not all equal: the default cipher suite, for example, provides no encryption. In that case anyone would see plain HTTP, if your browser and the website are configured to support such suites. So, depending on your web browser as well as the server, an eavesdropper may see anywhere from only the IP address, to the IP address plus the domain name (via the TLS SNI extension; sometimes it can also be worked out in other, less easy ways), to everything in some other cases, depending on the strength of the cipher suites.