Is it possible to identify the source of message in whatsapp?

Question

Is it possible to identify the real source of message in whatsapp?

Scenario :

I send a WhatsApp message to a group. One of the group members forwards it to someone else, and so on, until one day it reaches me again.

Questions :

As an end user, this benefits my privacy, since I believe WhatsApp doesn't track the source of a message.

Now imagine a government agency or independent researchers who wish to track down the source of the message. Is it really possible to track down the source/origin of the message? Are there any research publications related to this?

Technically

The same thing repeats: one user forwards the message to another, so the origin of the message goes unidentified. If a need arises to track down the original sender or origin of the message, is it possible to do so? If not, why?

Jedi · Answer 1 · 2016-07-01T19:55:49.763

Warning: This answer is pure speculation. It is also hard to get references for covert programs, but I've linked to Wikipedia pages which have some good references. I've also stayed away from examining WhatsApp's implementation, as there are several other Security SE posts discussing various aspects of it.

I hope I have matched your scenario correctly in this hypothetical:

Consider a government agency in (hypothetically) India called the Massive Surveillance Bureau (MSB). The MSB has detected that certain WhatsApp messages containing false rumours were forwarded in a sensitive area, which eventually led to a riot. What would it take for the MSB to track down the sender of the first message?

Firstly, let's look at what the MSB minimally requires to be able to perform this tracking. The MSB would need three pieces of data for all messages:

1. A hash of the content
2. Sender identifier
3. Timestamp

Even if it does not have access to plaintext but only has a hash of the content, it can still identify the first sender of a particular message. (Note that changing even a single character of the message breaks this scheme, because the hash changes.)
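As a rough illustration of the three-field idea above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` type, the `first_sender` helper, the choice of SHA-256, and the sample log are hypothetical and say nothing about what WhatsApp or any agency actually stores.

```python
import hashlib
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical log entry: the three fields listed above."""
    content_hash: str
    sender: str
    timestamp: float  # seconds since the epoch


def content_hash(text: str) -> str:
    # SHA-256 is an assumption; any collision-resistant hash would do.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def first_sender(records: List[Record], text: str) -> Optional[Record]:
    """Return the earliest record whose hash matches `text`, or None."""
    target = content_hash(text)
    matches = [r for r in records if r.content_hash == target]
    return min(matches, key=lambda r: r.timestamp, default=None)


rumour = "I will smash all apples"
log = [
    Record(content_hash(rumour), "user2", 1466600300.0),
    Record(content_hash(rumour), "user1", 1466600100.0),
    # One extra character gives a different hash, so this earlier
    # message is not matched at all.
    Record(content_hash(rumour + "!"), "user0", 1466600000.0),
]

origin = first_sender(log, rumour)
print(origin.sender if origin else "no match")
```

The third record shows the weakness noted above: a forwarder who edits a single character starts a new, unlinked chain as far as exact hashes are concerned.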

Next, let's consider all of the different places that the MSB can place itself.

Scenario 1: The MSB can read all carrier traffic and even ISP traffic, but only in encrypted form. In this case, it cannot establish item 1 -- the hash. It could however use metadata like packet size, sender IP and timestamp to roughly estimate when a message was sent to one of WhatsApp's servers. Given enough groundwork (e.g. find the likely first sender, confiscate and forensically investigate the device, interrogate the owner), a reasonable assumption could be made.

Scenario 2: The MSB can read all carrier traffic and even ISP traffic in India. This could either be gathered manually by making requests to all carriers (like PRISM) or automatically by some sort of monitoring system (like the CMS or Tempora). Since WhatsApp relies on the underlying carrier signal, a man-in-the-middle attack is possible, whereby the MSB clones and spoofs devices by receiving the first SMS authentication message (e.g. here). Now it has real plaintext data; it can find all messages similar to the target and look up the first one.

Scenario 3: The MSB has a backdoor with read access to WhatsApp's stored messages in a database. This is very similar to scenario 1: since the keys are not available to it, it can only make estimates based on available metadata about sender, timestamp and message size. If it had a program like PRISM, WhatsApp could intentionally set up an insecure version of the app on certain devices, making it easier to perform other attacks.

Scenario 4: The MSB has infected all mobile devices in India with malware (e.g. IRRITANT HORN). This is similar to scenario 2. The MSB now has access to all decrypted messages on-device.

Anyways, that's all speculation. Most of these attacks will still be difficult now that WhatsApp has E2EE (described here, analyzed here). You may find this real traffic analysis paper from Georgia Tech useful.

EDIT: Yay!

score 1 · Accepted Answer · answered Jun 22 '16 at 19:46

This would be no different in design from Tor, where your data is forwarded through nodes. There needs to be a separation of things here. First I will speak on "inference", which is what the government would aim for. Imagine there are a dozen people in the group you speak of. The keyword is "apples".

User1 --> "I will smash all apples" --> group
Random Group member --> "I will smash all apples" --> Group (User2 receives it)
User2 --> "I will smash all apples" --> group
Random Group member --> "I will smash all apples" --> Group (User3 receives it)

It's more expensive for the government to try to determine who sent it first. Their best bet would be to confiscate all phones and see who saw it first. But scratch that; this becomes a privacy and civil-liberties issue. It is still cheaper to get records (ISP, search records, etc.) to illustrate (infer) who may have had a bigger interest in apples first. This is off topic, but where government is concerned, don't think for a second they will attack the technology head-on; it's cheaper and more effective to attack the endpoints individually.

TECHNICAL TALK NOW - What you have diagrammed differs little from Tor, except that you're illustrating it using WhatsApp. The problem one would run into is that tracking the user depends on how the message was forwarded. It would be far easier to create an arbitrary API that interacts with WhatsApp through, say, Google Talk or SMS, and then send a message from there. Where there is a will there is a way, considering WhatsApp stores timestamps. The most I can think of would be someone accessing WhatsApp on all phones involved in the group and checking who has the earliest timestamp.
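The "check every phone for the earliest timestamp" idea can be sketched roughly as follows. This is only an illustration: the per-device log layout, the `earliest_holder` helper, and the sample data are hypothetical, and it assumes device clocks are trustworthy and that nobody has deleted the message.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-device log: (timestamp, direction, message text),
# where direction is "sent" or "received".
DeviceLog = List[Tuple[float, str, str]]


def earliest_holder(
    devices: Dict[str, DeviceLog], text: str
) -> Optional[Tuple[str, float, str]]:
    """Return (owner, timestamp, direction) of the earliest copy of `text`."""
    best: Optional[Tuple[str, float, str]] = None
    for owner, log in devices.items():
        for ts, direction, body in log:
            if body == text and (best is None or ts < best[1]):
                best = (owner, ts, direction)
    return best


rumour = "I will smash all apples"
phones = {
    "user1": [(100.0, "sent", rumour)],
    "user2": [(160.0, "received", rumour), (400.0, "sent", rumour)],
    "user3": [(520.0, "received", rumour)],
}

result = earliest_holder(phones, rumour)
print(result)
```

If the earliest copy found is marked "received" rather than "sent", the true origin is probably a device that was not examined, which is one reason this approach needs every phone in the chain.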
