1

I have a classic forking proxy written in C on my LAN; now that I know "how to socket", I'd like to add some privacy to it.

Take, for instance, the browser extension HTTPS Everywhere. From its Wikipedia page:

It automatically makes websites use the more secure HTTPS connection instead of HTTP, if they support it.

If I used OpenSSL in my proxy, I could make encrypted connections: when my proxy receives a GET request (a plain HTTP request over an unprotected connection), I would use OpenSSL to protect that connection by adding a TLS layer to it.

But as the quote above says, HTTPS Everywhere makes a website use an HTTPS connection only if the site supports it.

So, how can I check (with C code) if a given website supports HTTPS?

...please don't tell me it just needs to attempt a connection to remote server's port 443.

Anders
  • I would guess it just attempts a connection to remote server's port 443 – John Dvorak Dec 16 '16 at 08:19
  • ... and yet, connecting to the remote site on port 443 is pretty much all you need to do. – Stephane Dec 16 '16 at 08:19
  • To expand on that: if you have a web site running on the default port at a given host name, then it will use port 443 for HTTPS. If the web site does NOT use the default port, there is no way to know what port will be used for HTTPS (if any) except attempting a handshake on every possible port: that is far too slow for proxying and might get you locked out by an IDS – Stephane Dec 16 '16 at 08:21
  • You might also want to search for "how does TLS work". The answer describes the handshake. Finally, this isn't the right place to ask for sample code. – Stephane Dec 16 '16 at 08:22
  • Of course, I had no intention of asking for code: I posted my question here because I needed the theory before proceeding. Thanks for the precise answer @Stephane, you might want to turn your comments into an answer, so that I may accept it – Andrea Mazzocchi Dec 16 '16 at 08:25
  • The answer really is that it tries the HTTPS connection on port 443. The HTTPS Everywhere site says that they use regex to rewrite HTTP requests. – schroeder Dec 16 '16 at 08:27
  • Https Everywhere has a set of rules for sites. Sometimes hard-coding a whitelist is the easiest way to avoid false positives. – Xiong Chiamiov Dec 16 '16 at 08:47
  • I read https://www.eff.org/https-everywhere/rulesets: a lot of things indeed, but for my novice code this will be enough. – Andrea Mazzocchi Dec 16 '16 at 08:55

2 Answers

2

Connecting to the remote site on port 443 is pretty much all you need to do.

If you have a web site running on the default port at a given host name, then it will use port 443 for HTTPS. If the web site does NOT use the default port, there is no way to know what port will be used for HTTPS (if any) except attempting a handshake on every possible port: that is far too slow for proxying and might get you locked out by an IDS.

You might also want to check the answer to "how does TLS work", which describes the handshake in detail.
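As a rough illustration of the first step only, here is a minimal sketch of a TCP reachability check using plain POSIX sockets. The helper name `tcp_port_open` and the timeout parameter are made up for this example. An open port 443 does not by itself prove the site speaks HTTPS; a real check would continue with a TLS handshake on the connected socket (with OpenSSL, that would be around `SSL_connect`).

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): returns 1 if a TCP
 * connection to host:port can be established, 0 otherwise. */
int tcp_port_open(const char *host, const char *port, int timeout_sec)
{
    struct addrinfo hints = {0};
    struct addrinfo *res, *ai;
    int ok = 0;

    hints.ai_family = AF_UNSPEC;      /* try IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    /* Resolution failure (unknown host or service) counts as "not open". */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL && !ok; ai = ai->ai_next) {
        struct timeval tv;
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Assumption: on Linux a send timeout also bounds connect();
         * other systems may need a non-blocking connect instead. */
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            ok = 1;
        close(fd);
    }

    freeaddrinfo(res);
    return ok;
}
```

A proxy could call `tcp_port_open(hostname, "443", 2)` before deciding whether to attempt the handshake, and should probably cache the result per host so that the probe is not repeated for every request.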

Stephane
0

EFF's HTTPS Everywhere doesn't have any magic to determine if a site supports HTTPS.

What it does have is a really big list of URL rewriting rules. When you feed an HTTP URL in on one end, what comes out at the other end is either the same HTTP URL that you started with, or a corresponding HTTPS URL. Note that the corresponding HTTPS URL may be different from the HTTP URL.

Consequently, if you are simply trying to replicate the functionality of HTTPS Everywhere, consider simply using the HTTPS Everywhere ruleset directly. It's fairly well documented how to write these rules, so interpreting them in your own code should be relatively easy. (C might not be the easiest language for working with regular expressions, but I suspect libpcre and friends may come in handy.)
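To give an idea of what applying such a rule could look like in C, here is a minimal sketch. It uses POSIX `regex.h` instead of libpcre to stay dependency-free, and it is deliberately simplified: the `rewrite_rule` struct and `rewrite_url` function are invented for this example, the replacement is a literal string (no backreferences), and the rule in the usage note is illustrative, not taken from the real ruleset.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified rule: a POSIX extended regex to match and a
 * literal replacement for the matched part (no backreference support). */
struct rewrite_rule {
    const char *from;
    const char *to;
};

/* Applies the first matching rule to url and writes the result to out.
 * Returns 1 if the URL was rewritten, 0 if it was copied unchanged,
 * -1 on error (bad pattern or output buffer too small). */
int rewrite_url(const struct rewrite_rule *rules, size_t nrules,
                const char *url, char *out, size_t outlen)
{
    size_t i;
    int n;

    for (i = 0; i < nrules; i++) {
        regex_t re;
        regmatch_t m;
        int rc;

        if (regcomp(&re, rules[i].from, REG_EXTENDED) != 0)
            return -1;
        rc = regexec(&re, url, 1, &m, 0);
        regfree(&re);

        if (rc == 0) {
            /* text before the match + replacement + text after the match */
            n = snprintf(out, outlen, "%.*s%s%s",
                         (int)m.rm_so, url, rules[i].to, url + m.rm_eo);
            return (n >= 0 && (size_t)n < outlen) ? 1 : -1;
        }
    }

    /* No rule matched: hand back the original URL. */
    n = snprintf(out, outlen, "%s", url);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With a made-up rule such as `{ "^http://(www\\.)?example\\.com/", "https://www.example.com/" }`, the URL `http://example.com/index.html` would come out as `https://www.example.com/index.html`, while a URL that matches no rule is returned as-is. Compiling each pattern on every call keeps the sketch short; a real proxy would compile the rules once at startup.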

This works far better than naively replacing http:// with https:// at the beginning because:

  • Not all sites serve the same content over HTTP and HTTPS
  • Not all sites are properly configured for HTTPS with a CA-signed certificate

If you combine the HTTPS Everywhere list with the HSTS Preload list, you should be covering a significant percentage of popular, HTTPS-capable web sites with a minimum of work and a minimal risk of problems.

user