That's a definitive use case for a proxy. A normal (forward) proxy, not a reverse proxy (a.k.a. a load balancer).
The most well-known free and open-source one is squid. Luckily, it's one of the few good pieces of open-source software that can easily be installed with a single apt-get install squid3 and configured with a single file, /etc/squid3/squid.conf.
We'll go over the good practices and the lessons to know about.
Here is the official configuration file, slightly modified (the 5000 useless commented lines were removed).
# WELCOME TO SQUID 3.4.8
# ----------------------------
#
# This is the documentation for the Squid configuration file.
# This documentation can also be found online at:
# http://www.squid-cache.org/Doc/config/
#
# You may wish to look at the Squid home page and wiki for the
# FAQ and other documentation:
# http://www.squid-cache.org/
# http://wiki.squid-cache.org/SquidFaq
# http://wiki.squid-cache.org/ConfigExamples
#
###########################################################
# ACL
###########################################################
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 1025-65535 # unregistered ports
acl CONNECT method CONNECT
#####################################################
# Recommended minimum Access Permission configuration
#####################################################
# Deny requests to certain unsafe ports
http_access deny !Safe_ports
# Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports
# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
#####################################################
# ACL
#####################################################
# access is limited to our subnets
acl mycompany_net src 10.0.0.0/8
# access is limited to whitelisted domains
# ".example.com" includes all subdomains of example.com
acl repo_domain dstdomain .keyserver.ubuntu.com
acl repo_domain dstdomain .debian.org
acl repo_domain dstdomain .python.org
# clients come from a known subnet AND go to a known domain
http_access allow repo_domain mycompany_net
# And finally deny all other access to this proxy
http_access deny all
#####################################################
# Other
#####################################################
# default proxy port is 3128
http_port 0.0.0.0:3128
# don't forward internal private IP addresses
forwarded_for off
# disable ALL caching
# bandwidth is cheap. debugging cache related bugs is expensive.
cache deny all
# logs
# Note: not sure if squid configures logrotate or not
access_log daemon:/var/log/squid3/access.log squid
access_log syslog:squid.INFO squid
# leave coredumps in the first cache dir
coredump_dir /var/spool/squid3
# force immediate expiry of items in the cache.
# caching is disabled. This setting is set as an additional precaution.
refresh_pattern . 0 0% 0
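To make the whitelist rule concrete, here is a minimal Python sketch of what the line "http_access allow repo_domain mycompany_net" followed by "http_access deny all" amounts to. It only approximates squid's matching (the helper names are made up for this example, and the port rules earlier in the file are ignored): a request passes only if the client is in the subnet AND the destination is a whitelisted domain.

```python
import ipaddress

# Values copied from the squid.conf above.
MYCOMPANY_NET = ipaddress.ip_network("10.0.0.0/8")
REPO_DOMAINS = (".keyserver.ubuntu.com", ".debian.org", ".python.org")


def matches_dstdomain(host, pattern):
    """Approximate a leading-dot dstdomain entry.

    ".example.com" covers example.com itself and all of its subdomains.
    """
    host = host.lower().rstrip(".")
    bare = pattern.lstrip(".")
    return host == bare or host.endswith("." + bare)


def is_allowed(client_ip, dest_host):
    """Both ACLs on the http_access line must match; otherwise 'deny all' applies."""
    from_company = ipaddress.ip_address(client_ip) in MYCOMPANY_NET
    to_repo = any(matches_dstdomain(dest_host, p) for p in REPO_DOMAINS)
    return from_company and to_repo


print(is_allowed("10.1.2.3", "ftp.debian.org"))     # inside subnet, whitelisted domain
print(is_allowed("192.168.1.5", "ftp.debian.org"))  # outside subnet
print(is_allowed("10.1.2.3", "github.com"))         # domain not whitelisted
```

Note that the match is on whole domain labels: "notdebian.org" is not covered by ".debian.org".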
Client Configuration - Environment Variables
Configure these two environment variables on all systems.
http_proxy=squid.internal.mycompany.com:3128
https_proxy=squid.internal.mycompany.com:3128
Most HTTP client libraries (libcurl, httpclient, ...) configure themselves from these environment variables. Most applications use one of the common libraries and thus support proxying out of the box (without the dev necessarily knowing that they do).
Note that the syntax is strict:
- The variable name http_proxy MUST be lowercase on most Linux.
- The variable value MUST NOT begin with http(s):// (the proxying protocol is NOT http(s)).
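As an illustration of the "self configuring" behaviour, the sketch below asks Python's standard urllib which proxies it would pick up from the environment. The helper name detected_proxies is made up for this example, and the environment is patched temporarily so the snippet has no side effects. Other libraries have their own lookup rules, so treat this as one data point rather than a guarantee.

```python
import os
import urllib.request
from unittest import mock

# Proxy address taken from the variables above.
PROXY = "squid.internal.mycompany.com:3128"


def detected_proxies(environ):
    """Return the proxies urllib would read from the given environment."""
    # Temporarily replace the process environment with the given mapping.
    with mock.patch.dict(os.environ, environ, clear=True):
        return urllib.request.getproxies_environment()


proxies = detected_proxies({"http_proxy": PROXY, "https_proxy": PROXY})
print(proxies)
```

The returned dictionary is keyed by scheme ("http", "https"), which is how urllib decides which proxy to use for which kind of URL.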
Client Configuration - Specific
Some applications ignore environment variables and/or run as a service before the variables can be set (e.g. Debian apt).
These applications require their own configuration (e.g. /etc/apt/apt.conf).
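For apt, the relevant settings are Acquire::http::Proxy and Acquire::https::Proxy, which take a full proxy URL. The small Python sketch below renders those two lines for a given proxy. The function name is made up for this example, and where you store the output (for instance a drop-in file under /etc/apt/apt.conf.d/) is up to you.

```python
def apt_proxy_conf(proxy_host, proxy_port):
    """Render apt settings that send HTTP and HTTPS downloads through the proxy."""
    url = f"http://{proxy_host}:{proxy_port}/"
    return (
        f'Acquire::http::Proxy "{url}";\n'
        f'Acquire::https::Proxy "{url}";\n'
    )


# Print the lines to paste into the apt configuration.
print(apt_proxy_conf("squid.internal.mycompany.com", 3128), end="")
```

Generating the file from a function like this makes it easy to push the same setting to every machine with whatever configuration management is already in place.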
HTTPS Proxying - Connect
HTTPS proxying is fully supported by design. It uses a special "CONNECT" method, which asks the proxy to open a tunnel to the destination server; the encrypted traffic then passes through the proxy untouched.
Dunno much about that thing but I've never had issues with it in years. It just works.
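For the curious, the handshake is tiny. The sketch below builds the request a client sends to the proxy before starting TLS (the function name is made up for this example). Once the proxy answers with a 2xx status, it just relays bytes in both directions, and the TLS handshake happens between the client and the real server. In real code you would not write this by hand: Python's http.client.HTTPSConnection.set_tunnel, for example, does it for you.

```python
def build_connect_request(host, port=443):
    """Build the CONNECT request sent to the proxy ahead of the TLS handshake."""
    target = f"{host}:{port}"
    lines = [
        f"CONNECT {target} HTTP/1.1",
        f"Host: {target}",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_connect_request("pypi.python.org").decode("ascii"))
```

This also shows why the squid.conf rule "http_access deny CONNECT !SSL_ports" works: the proxy sees the target host and port in clear text, even though it never sees the encrypted content.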
HTTPS Special Case - Transparent Proxy
A note on transparent proxies (i.e. the proxy is hidden and intercepts client requests, à la man-in-the-middle).
Transparent proxies break HTTPS. The client doesn't know that there is a proxy and has no reason to use the special CONNECT method.
The client tries a direct HTTPS connection... which is intercepted. The interception is detected and errors are thrown all over the place (HTTPS is meant to detect man-in-the-middle attacks).
Domain and CDN whitelisting
Domain and subdomain whitelisting is fully supported by squid. Nonetheless, it's bound to fail in unexpected ways from time to time.
Modern websites can have all sorts of domain redirections and CDNs. That will break the ACLs whenever people didn't go the extra mile to put everything neatly under a single domain.
Sometimes there will be an installer or a package that wants to call the mothership or retrieve external dependencies before running. It will fail every single time and there is nothing you can do about it.
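When something does break, the access log usually tells you which domain is missing from the whitelist. Below is a rough Python sketch that counts the hosts of denied requests. It assumes the default "squid" log format configured in the file above (result code in the 4th field, method in the 6th, URL in the 7th); the sample lines are made up for illustration and the function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def denied_hosts(log_lines):
    """Count destination hosts of requests that squid denied."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        # Skip short lines and anything that was not denied.
        if len(fields) < 7 or not fields[3].startswith("TCP_DENIED"):
            continue
        method, url = fields[5], fields[6]
        if method == "CONNECT":
            host = url.rsplit(":", 1)[0]  # CONNECT entries are logged as host:port
        else:
            host = urlsplit(url).hostname or url
        counts[host] += 1
    return counts


# Illustrative lines written for this example, not real log output.
sample = [
    "1500000000.123      5 10.0.0.12 TCP_DENIED/403 3900 CONNECT cdn.example.net:443 - HIER_NONE/- text/html",
    "1500000001.456     80 10.0.0.12 TCP_MISS/200 1200 GET http://ftp.debian.org/debian/dists/ - HIER_DIRECT/1.2.3.4 text/html",
    "1500000002.789      3 10.0.0.15 TCP_DENIED/403 3900 GET http://mirror.example.org/pkg.tar.gz - HIER_NONE/- text/html",
]
print(denied_hosts(sample).most_common())
```

Run it against /var/log/squid3/access.log after a failed install and the CDN or mirror that needs whitelisting tends to stand out.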
Caching
The provided configuration file disables all forms of caching. Better safe than sorry.
Personally, I'm running things in the cloud at the moment, all instances have at least 100 Mbps connectivity and the provider runs its own repos for popular stuff (e.g. Debian) which are discovered automatically. That makes bandwidth a commodity I couldn't care less about.
I'd rather disable caching entirely than experience a single caching bug that will melt my brain in troubleshooting. NOBODY on the internet gets their caching headers right.
Not all environments have the same requirements though. You may go the extra mile and configure caching.
NEVER EVER require authentication on the proxy
There is an option to require password authentication from clients, typically with their LDAP accounts. It will break every browser and every command line tool in the universe.
If you want to do authentication on the proxy, don't.
If management wants authentication, explain that it's not possible.
If you're a dev and you just joined a company that is blocking direct internet AND forcing proxy authentication, RUN AWAY WHILE YOU CAN.
Conclusion
We went through the common configuration, common mistakes and things one must know about proxying.
Lessons learnt:
- There is a good open-source software for proxying (squid)
- It's simple and easy to configure (a single short file)
- All (optional) security measures have tradeoffs
- Most advanced options will break stuff and come back to haunt you
- Transparent proxies break HTTPS
- Proxy authentication is evil
As usual in programming and system design, it's critical to manage requirements and expectations.
I'd recommend sticking to the basics when setting up a proxy. Generally speaking, a plain proxy without any particular filtering will work well and not give any trouble. Just gotta remember to (auto) configure the clients.