Could logless VPNs be traced?

Question

Logless VPNs, such as ExpressVPN, claim that they can't tell authorities your real IP even if asked. They claim:

No connection logs. Never logs connection timestamps, session duration, your source IP address, or the ExpressVPN IP address that your computer assumes when connected to the VPN.

So in this scenario they can't share your real IP even if they want to? Is it 100% safe to be using VPNs like this?

Those are two different questions: if they are set up like that, then they can't. It's only "safe" if they are telling you the truth. We have multiple questions asking the same thing here. — schroeder, Dec 09 '17 at 23:57
@schroeder See my answer below. Even if the VPN is telling the truth, it is not "safe". — forest, Dec 10 '17 at 04:49
So who is their service provider? Oh, let me guess... a set of AWS instances? Who would get to log incoming and outgoing connection data, if not ExpressVPN? Who would be able to administer servers? Do the math. :) — Mark Buffalo, Dec 10 '17 at 05:15
Even if it's not a bunch of AWS instances, it's gonna be in a huge DC. :P — forest, Dec 10 '17 at 05:23

forest · Answer 1 · 2022-08-06T00:19:40.567

It is absolutely possible, and there are even companies who buy traffic analysis information from ISPs in bulk and resell it, with Team Cymru explicitly advertising the ability to trace VPN connections*.

Traffic analysis attacks

Even if a VPN is honest about its claim that it does not log, its upstream ISP certainly does. It would be unheard of for an ISP not to. For a VPN (as opposed to Tor), the ingress and egress go through the same ISP, allowing trivial traffic analysis attacks. I explained a bit of this in this answer. Take the following series of events, with the ISP being the upstream ISP of the VPN (or a proxy):

ISP sees 203.0.113.42 send 253 bytes of data at t+0.
ISP sees the proxy server send a 253 byte request to example.com/foo.html at t+1.
ISP sees example.com send a 90146 byte reply at t+2.
ISP sees 203.0.113.42 receive a 90146 byte reply at t+3.

From this, it becomes trivial to conclude that 203.0.113.42 connected to example.com/foo.html. This is a type of traffic analysis attack, specifically a traffic correlation attack. Virtually all ISPs keep this sort of information via NetFlow and similar ubiquitous systems.
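The correlation in the example above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any real ISP or vendor does it: FlowRecord is a hypothetical, stripped-down stand-in for a NetFlow-style record, the VPN and server addresses are taken from the documentation ranges, and the thresholds (max_lag, size_tolerance) are arbitrary. The size tolerance is there because tunnel encryption adds some overhead, so the byte counts on either side of the VPN will not match exactly in practice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical, simplified NetFlow-style record (real NetFlow records have many more fields).
    t: float      # time the flow was observed, in seconds
    src: str
    dst: str
    nbytes: int

def correlate(records, vpn_ip, max_lag=2.0, size_tolerance=0.05):
    """Return the {endpoint, endpoint} pairs the VPN appears to be relaying between."""
    inbound = [r for r in records if r.dst == vpn_ip]    # traffic arriving at the VPN
    outbound = [r for r in records if r.src == vpn_ip]   # traffic leaving the VPN
    pairs = set()
    for i in inbound:
        for o in outbound:
            lag = o.t - i.t
            similar = abs(o.nbytes - i.nbytes) <= size_tolerance * i.nbytes
            if 0 < lag <= max_lag and similar and o.dst != i.src:
                # Whatever came in from i.src seems to have been forwarded on to o.dst.
                pairs.add(frozenset((i.src, o.dst)))
    return pairs

VPN = "198.51.100.7"      # hypothetical VPN server address
SERVER = "192.0.2.80"     # hypothetical address for example.com
records = [
    FlowRecord(0.0, "203.0.113.42", VPN, 253),     # client request enters the tunnel
    FlowRecord(1.0, VPN, SERVER, 253),             # VPN forwards the request
    FlowRecord(2.0, SERVER, VPN, 90146),           # server replies
    FlowRecord(3.0, VPN, "203.0.113.42", 90146),   # VPN forwards the reply
    FlowRecord(0.5, "203.0.113.99", VPN, 1200),    # another customer's unrelated traffic
    FlowRecord(1.2, VPN, "192.0.2.200", 1210),
]
print(correlate(records, VPN))
```

Real correlation attacks work statistically over many packets and flows rather than matching single records, but the principle is the same: the VPN's ISP sees both sides and only has to line them up.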

Network stack issues

There is another problem with a VPN. You have to realize that the term Virtual Private Network is now more of a marketing term. VPNs were never designed with anonymity in mind. The "private" in VPN refers to IANA-reserved private addresses specified in RFC 1918. It does not mean "right to privacy" or anything similar. All a VPN is designed to do is connect two systems and expose them to each other as virtual network interfaces with local (private) IP addresses. This has several issues:

Your networking stack is "exposed", so a vulnerability in your kernel could be exploitable.
For this same reason, TCP/IP fingerprinting can uniquely identify you, even behind a VPN.
You are forced onto the same NAT as a large number of untrusted users, allowing them to attack you indirectly, sometimes even allowing them to discover things like your hostname.
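To illustrate the fingerprinting point, here is a toy passive fingerprinter in the spirit of tools like p0f. It assumes the observer sees your TCP SYN packets after they leave the VPN: the VPN rewrites the addresses, but header fields your own OS chose, such as the TCP window size, pass through, and the TTL still reveals its starting value. The common default TTLs are real, but the signature table is entirely hypothetical and the labels are placeholders; real tools match many more fields (TCP options and their order, MSS, window scaling).

```python
# Common default initial TTLs: 64 (e.g. Linux, macOS), 128 (Windows), 255 (some routers/Unix).
COMMON_INITIAL_TTLS = (64, 128, 255)

def guess_initial_ttl(observed_ttl: int) -> int:
    """Each router hop decrements the TTL by one, so round up to the nearest common default."""
    for ttl in COMMON_INITIAL_TTLS:
        if observed_ttl <= ttl:
            return ttl
    raise ValueError("an IPv4 TTL cannot exceed 255")

# HYPOTHETICAL signature table for illustration only; the values and labels are placeholders.
SIGNATURES = {
    (64, 29200): "stack A",
    (64, 65535): "stack B",
    (128, 64240): "stack C",
}

def passive_fingerprint(observed_ttl: int, window_size: int) -> str:
    """Classify a SYN packet using header fields chosen by the sender's own network stack."""
    key = (guess_initial_ttl(observed_ttl), window_size)
    return SIGNATURES.get(key, "unknown")

# A SYN seen at a website through the VPN, 12 hops from the sender:
print(passive_fingerprint(52, 29200))   # stack A
```

The VPN hides your address but not these stack-specific details, which is why fingerprinting still works through it.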

Visualizing the issue

It's useful to see how this all works, visually, in the form of a diagram. The single line represents traffic under your home IP, and the double line represents traffic with a different IP. A traffic correlation attack involves correlating the activity (timing and sizes) of both types of traffic.

How a plain connection works:

Client ----[Client ISP]----+
                           |
Server <---[Server ISP]----+

How a VPN works:

Client ----[Client ISP]---[       ]----> VPN
                          [VPN ISP]       |
Server <===[Server ISP]===[       ]=======+

How Tor works:

Client ----[Client ISP]---[         ]--> Node1
                          [Node1 ISP]      |
                   +======[         ]======+
                   |
                   +======[         ]==> Node2
                          [Node2 ISP]      |
                   +======[         ]======+
                   |
                   +======[         ]==> Node3
                          [Node3 ISP]      |
Server <===[Server ISP]===[         ]======+

You can see in this diagram how the VPN's ISP is in a position to trivially correlate the two connections, whereas with an anonymity network like Tor, the ISPs of the first and last nodes must collude to have a chance at deanonymizing someone. This is not impossible, and an adversary who can see a significant portion of the internet at any given time may be able to pull it off some percentage of the time. It is very difficult to do, however, and the Tor protocol includes a number of features (both deployed and in active development) to make it even harder than it already is.
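A back-of-the-envelope comparison makes the difference concrete. This sketch assumes a hypothetical adversary that observes a fixed fraction of the traffic entering the Tor network and a fixed fraction of the traffic leaving it, and that the entry and exit of a circuit are picked independently. Real Tor weights relay selection by bandwidth, so these fractions should be read as bandwidth-weighted.

```python
def p_correlatable_tor(guard_fraction: float, exit_fraction: float) -> float:
    """Chance one circuit has both ends watched, assuming guard and exit are picked independently."""
    return guard_fraction * exit_fraction

def p_correlatable_vpn() -> float:
    """The VPN's upstream ISP carries both sides of every connection, so it always sees both."""
    return 1.0

# Hypothetical adversary that observes 10% of entry traffic and 10% of exit traffic.
print(f"Tor: {p_correlatable_tor(0.10, 0.10):.2%}")   # Tor: 1.00%
print(f"VPN: {p_correlatable_vpn():.2%}")             # VPN: 100.00%
```

The product term is the "certain percentage of the time" mentioned above; for the VPN there is no such factor, because one ISP sees everything.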

Another important thing to remember is that Tor periodically switches the nodes it uses. Although the first node (the guard) stays the same in order to avoid so-called Sybil attacks, the other two change roughly every 10 minutes, or whenever a different domain is visited. This limits how much of your traffic any single exit node sees over time. VPNs, on the other hand, are naturally static targets.
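The arithmetic behind keeping the first node fixed can be sketched as follows. Assume, hypothetically, that an adversary runs a fraction f of the entry capacity and that each pick of an entry node is independent and uniform. Picking a fresh entry for every circuit makes it nearly certain that some circuit eventually starts at a malicious node, while a fixed guard leaves only the one initial pick. This simplifies what real Tor does (bandwidth weighting, guard rotation over long periods), so treat it as an illustration rather than a model of the protocol.

```python
def p_ever_compromised(malicious_fraction: float, choices: int) -> float:
    """Probability that at least one of `choices` independent random picks is malicious."""
    return 1 - (1 - malicious_fraction) ** choices

f = 0.05  # hypothetical share of entry capacity run by an adversary
# Picking a fresh entry node for each of 100 circuits:
print(f"{p_ever_compromised(f, 100):.3f}")   # 0.994
# Keeping one guard for all 100 circuits means only one pick:
print(f"{p_ever_compromised(f, 1):.3f}")     # 0.050
```

The trade-off is that an unlucky user with a bad guard stays exposed for as long as that guard is kept, but most users are never exposed at the entry at all.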

What this all means

Using a VPN (or proxy) does not protect you from the VPN's ISP revealing its logs, even if the VPN service is completely honest about their no-logging policy.
Your networking stack is exposed and visible to any 3rd party server you connect to, allowing potential exploitation and TCP/IP fingerprinting.
Anonymity networks like Tor provide some level of protection against traffic correlation and hide your networking stack, though, like any system, they are not perfect.

If you need anonymity, use Tor without a VPN. A VPN is superfluous unless you need one to get past a firewall that Tor cannot bypass on its own.

* Disturbingly, the CEO and founder of Team Cymru, Rabbi Rob Thomas, is also on the Tor Project board of directors.

Could the traffic analysis be made more difficult if the VPN introduced random time lags and random padding at the IP or TCP layer, so that outgoing and incoming traffic are harder to correlate, assuming there are a lot of simultaneous customers connected to the VPN? — entrop-x, Dec 10 '17 at 09:01
Unfortunately not very much. Various studies (don't have them on me right now) have shown that the necessary delays would have to be several _hours_ to be effective. While this isn't a problem for email mixnets where sitting on a single node for hours was acceptable, it would completely break the internet as we know it when used with time-sensitive TCP/IP connections. — forest, Dec 10 '17 at 09:11
If you use TOR as another layer of security, is it still not safe? — Amit Nar, Dec 10 '17 at 22:25
Using only Tor, or a VPN and then Tor, are both fine, though a VPN is superfluous in such cases. Just use Tor and follow standard good OPSEC, and you should be fine, excepting issues with any anonymity network like browser exploits. "Safe" is a very vague word and it depends on too many factors to answer with yes or no. — forest, Dec 10 '17 at 22:30
In fact, to decide whether to use a VPN or not, you have to ask yourself whether you "trust" your own ISP more than the VPN and the VPN's ISP, or rather, whether the incentive to "go after you" is higher for your own ISP or for the VPN and its ISP. Using VPN + Tor hides the fact that you are using Tor from your own ISP, but gives that fact away to your VPN's ISP. — entrop-x, Dec 11 '17 at 09:40
@entrop-x Actually, it does not hide the fact that you are using Tor. Tor uses 514-byte "cells", whereas VPNs do not use such specific padding. Simply noticing that the connections come in 514-byte bursts is enough to conclude that someone is using Tor despite the encrypted VPN traffic. — forest, Dec 11 '17 at 09:48
Curious, why does TOR not pad or mangle cells to avoid this footprinting? I suppose that's a question for the TOR architects... — Monica Apologists Get Out, Dec 14 '17 at 18:15
@Adonalsium It's the padding that _provides_ this fingerprint. The usage of 514 byte cells is itself padding. — forest, Dec 15 '17 at 00:44
I suppose the answer to my next question "Why" is "That's how they designed the protocol, for reasons.", then. Seems like they could implement varied length padding in order to make that signature less obvious, but I'm sure they would have done that if it was that simple. — Monica Apologists Get Out, Dec 15 '17 at 14:57
Varied padding would still have a specific range, which itself would be a fingerprint. The only "solution" would be to use no padding at all, but then certain forms of analysis attacks would be easier to mount. — forest, Dec 16 '17 at 02:04
@forest Very well done; however, one nitpick: I would change `Node1-3 ISP` in the Tor diagram to `Node n+1-3 ISP`, because while one packet might go through those specific three nodes, the next packet might go through three different ones. This is what makes Tor SO much better than simply chaining proxies via SSH. — CaffeineAddiction, Dec 20 '18 at 08:49
@CaffeineAddiction Actually, a single stream will only use one set of 3 relays. It's only when you go to another domain (or stop sending packets for more than 10 minutes) that the circuit path changes. But you're right, I should probably clarify that the relays are not fixed. I'll do that, thanks! — forest, Dec 20 '18 at 08:50
