Can DeepCorr's correlation technique de-anonymize all Tor users?
No, it cannot de-anonymize ALL Tor users.
(however, it drastically narrows down the set of candidates, which makes a successful de-anonymization much easier)
Why, and what is a flow correlation attack?
A flow correlation attack is an attack where an adversary intercepts network flows at several locations in the network and "correlates" them using statistical or machine learning methods (e.g. neural networks).
DeepCorr's setting consists of a network "with M ingress flows and N egress flows": DeepCorr listens to the ingress flows, close to a group of users at one end, and tries to spot the same traffic when it leaves the circuit at the other end. A match means "gotcha"!
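To make the "M ingress flows and N egress flows" setting concrete, here is a minimal Python sketch. It is not DeepCorr itself: DeepCorr uses a neural network, while this toy swaps in plain Pearson correlation over inter-packet delays. The function names (`pearson`, `match_flows`, `simulate`) and the synthetic traffic parameters are assumptions made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def match_flows(ingress, egress):
    """For every ingress flow, return (index of best-scoring egress flow, score)."""
    matches = []
    for flow_in in ingress:
        scores = [pearson(flow_in, flow_out) for flow_out in egress]
        best = max(range(len(egress)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


def simulate(num_flows=5, num_packets=300, jitter=0.002, seed=1):
    """Synthetic traffic (an assumption, not real Tor data).

    Each ingress flow is a list of random inter-packet delays in seconds.
    Each egress flow is a copy of one ingress flow with small timing noise,
    and the egress flows are shuffled so their order reveals nothing.
    """
    rng = random.Random(seed)
    ingress = [[rng.expovariate(20.0) for _ in range(num_packets)]
               for _ in range(num_flows)]
    order = list(range(num_flows))
    rng.shuffle(order)  # egress[j] carries the traffic of ingress[order[j]]
    egress = [[max(0.0, delay + rng.gauss(0.0, jitter)) for delay in ingress[i]]
              for i in order]
    return ingress, egress, order


if __name__ == "__main__":
    ingress, egress, order = simulate()
    for i, (j, score) in enumerate(match_flows(ingress, egress)):
        print(f"ingress {i} -> egress {j} (true source: {order[j]}), score {score:.3f}")
```

The point of the sketch is only that matching works on the timing pattern of each flow, so the adversary never needs to know which website is behind it. Real Tor traffic is far noisier than this synthetic data, which is why the paper resorts to a learned correlation function.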
Website != flow
more and more people browse websites(many of those are simple websites) with similar sizes around the same time
how can they tell the source of the traffic using size and timing alone when other flows have similar features?
DeepCorr does not do website fingerprinting (which is another class of attacks, as mentioned in the article); it just correlates a "flow A" with a "flow B" observed at two different points of the network.
Website similarities don't matter for successful correlation: DeepCorr operates on features of short packet sequences, such as sizes, timings, and flow direction (in/out).
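As a rough illustration of what "features of packet sequences" can look like, the sketch below turns a captured flow into per-direction lists of inter-packet delays and packet sizes. The `Packet` type, the `flow_features` function, and the dictionary layout are hypothetical and are not the paper's exact representation; the default `max_packets=300` simply mirrors the flow length the paper reports.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    timestamp: float  # seconds since the start of the capture
    size: int         # bytes on the wire
    outgoing: bool    # True = sent by the observed side, False = received


def flow_features(packets, max_packets=300):
    """Split a flow into per-direction inter-packet delays and packet sizes.

    Only the first `max_packets` packets (by timestamp) are used.
    """
    packets = sorted(packets, key=lambda p: p.timestamp)[:max_packets]
    features = {"out_delays": [], "in_delays": [], "out_sizes": [], "in_sizes": []}
    last_seen = {True: None, False: None}  # previous timestamp per direction
    for packet in packets:
        prefix = "out" if packet.outgoing else "in"
        features[f"{prefix}_sizes"].append(packet.size)
        previous = last_seen[packet.outgoing]
        if previous is not None:
            features[f"{prefix}_delays"].append(packet.timestamp - previous)
        last_seen[packet.outgoing] = packet.timestamp
    return features


if __name__ == "__main__":
    capture = [
        Packet(0.0, 100, True),
        Packet(0.25, 1500, False),
        Packet(0.5, 200, True),
        Packet(1.0, 1500, False),
    ]
    print(flow_features(capture))
```

Nothing in these features names a website: two flows to very different sites, or to very similar ones, are described by their own packet-level timing and size pattern.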
Still...
Correlation != de-anonymization
From the article:
To be able to perform flow correlation, an adversary needs to observe (i.e., intercept) some fraction of flows entering and exiting the Tor network. The adversary can then de-anonymize a specific Tor connection...
I would say "but may not de-anonymize"... I mean that a successful flow correlation attack doesn't automatically mean a successful de-anonymization. Correlation means "this group of users visited that group of sites" (but it drastically narrows down the set of users and increases the probability of de-anonymization).
Does Firefox generate additional patterns?
Could it be the reason for why it worked so good for them? may be firefox generated some extra unique traffic that Tor browser wouldn't generate because of things like Ads and cookies?.
In my opinion there is not much difference between the traffic flows of Tor Browser and Firefox.
Example: google.com
Firefox:
25 requests
1.31 MB / 677.67 KB transferred
Tor:
19 requests
1.39 MB / 498.30 KB transferred
Intuitively, I would say that both browsers generate some unique flow patterns, and don't forget that website != flow.
It also seems that DeepCorr doesn't need much traffic to work with:
"the correlated flows are 300 packets long for all the systems"...
Tor's hidden services
Can this attack work against hidden services(version 3) as well?
I would say "why not": DeepCorr operates on traffic flows and doesn't care whether a flow is "hidden"; a hidden service is just another traffic flow. DeepCorr will correlate ingress and egress, because that is what it does.
P.S.: a few words about a possible countermeasure.
Countermeasure
As the authors stated:
"Our results suggest that (public) Tor relays should deploy a traffic obfuscation mechanism like obfs4 with IAT=1 to resist advanced flow correlation techniques like DeepCorr."
(IAT=0 doesn't help)
"However, this is not a trivial solution due to the increased cost, increased overhead (bandwidth and CPU), and reduced QoS imposed by such obfuscation mechanisms... designing an obfuscation mechanism tailored to Tor that makes the right balance between performance, cost, and anonymity remains a challenging problem for future work."