If an attacker obtains a file that has been encrypted using an OpenPGP public key, what information can the attacker deduce?
For example, to what degree of certainty can the attacker deduce the identity of the intended recipient?
The key ID of the recipient is included in plain text in the encrypted file. Other possibly interesting information "hidden in plain sight" is the size of the file and the name of the encrypted file (if someone sends it without renaming it, of course).
What you might not realise is that the recipient Key ID is effectively an optional field. Section 5.1 of the RFC for the OpenPGP message format goes on to say:
An implementation MAY accept or use a Key ID of zero as a "wild card" or "speculative" Key ID. In this case, the receiving implementation would try all available private keys, checking for a valid decrypted session key. This format helps reduce traffic analysis of messages.
You can encrypt using the -R (or --hidden-recipient) flag with gpg to avoid revealing the recipient's key ID in an encrypted message.
$ gpg -e -R torvalds@linux-foundation.org message.txt
$ gpg --verbose --verbose --decrypt message.txt.gpg
:pubkey enc packet: version 3, algo 1, keyid 0000000000000000
data: [2047 bits]
gpg: public key is 00000000
gpg: anonymous recipient; trying secret key aaaaaaaa ...
gpg: anonymous recipient; trying secret key bbbbbbbb ...
gpg: anonymous recipient; trying secret key cccccccc ...
:encrypted data packet:
length: 76
mdc_method: 2
gpg: encrypted with RSA key, ID 00000000
gpg: decryption failed: secret key not available
$
At this point, gpg iterates through all the private keys it has, trying to obtain a valid session key, because the packet does not identify the public key used for encryption. However, also see this answer for ways to differentiate between recipients if the attacker has access to a large number of messages.
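To check from a script whether a message uses hidden recipients, one option is to scan the packet dump for the all-zero key ID. This is only a rough sketch: it assumes the dump lines look like the ones in the transcript above (other GnuPG versions may format them differently), and the function name count_hidden_recipients is made up for the example.

```shell
# count_hidden_recipients (hypothetical helper): reads a gpg packet dump on
# stdin and prints how many public-key encrypted session key packets carry
# the wildcard key ID, i.e. sixteen zero hex digits.
count_hidden_recipients() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the function
  # from reporting that as a failure while still printing the count.
  grep -c -E '^:pubkey enc packet:.* keyid 0{16}([^0-9A-Fa-f]|$)' || true
}
```

It could be fed with something like `gpg --list-packets message.txt.gpg 2>&1 | count_hidden_recipients`. A count above zero only says that some recipient is hidden, not who it is.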
A practical aside: secondary clues may be in various logs. For instance, an attacker who obtains such a message might also be able to access (say) a .bash_history file containing the recipient's address, or a web-server log with IP addresses that provide clues to who POSTed or fetched the file, etc.
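According to the RFC for the OpenPGP message format, section 5.1, every OpenPGP message encrypted to a public key contains at least one Public-Key Encrypted Session Key Packet, which itself contains this plain-text information: the packet version number, the eight-octet Key ID of the public key the session key is encrypted to, and the number of the public-key algorithm used.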
If multiple recipients have been specified, then multiple Public-Key Encrypted Session Key Packets will exist, one for each intended recipient.
So, in summary, an attacker can narrow down the identity of the intended recipient based on an OpenPGP encrypted message.
You can verify this experimentally by attempting to decrypt a file for which you do not have the private key. As the output shows, the key IDs of the intended recipients are visible, as well as the algorithm and the packet version number:
$ gpg --encrypt -r '<recipient1@example.com>' -r '<recipient2@example.com>' --sign message.txt
$ GNUPGHOME=/tmp/empty gpg --verbose --verbose --decrypt message.txt.gpg
:pubkey enc packet: version 3, algo 16, keyid 9759103664E69CC1
data: [2048 bits]
data: [2047 bits]
gpg: public key is 64E69CC1
:pubkey enc packet: version 3, algo 16, keyid 9478F6114164312C
data: [2048 bits]
data: [2048 bits]
gpg: public key is 4164312C
:encrypted data packet:
length: 171
mdc_method: 2
gpg: encrypted with ELG-E key, ID 4164312C
gpg: encrypted with ELG-E key, ID 64E69CC1
gpg: decryption failed: secret key not available
This is the plain-text information that is available. I do not know if there is any other information that can be deduced statistically.
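To pull just the recipient information out of such a dump, a small filter is enough. A minimal sketch, assuming the line layout shown in the transcript above (it may differ between GnuPG versions); the function name recipient_summary is hypothetical.

```shell
# recipient_summary (hypothetical helper): reads a gpg packet dump on stdin
# and prints one line per public-key encrypted session key packet, in the
# form "<algorithm number> <key ID>".
# In the transcripts, algo 1 is reported as RSA and algo 16 as ELG-E.
recipient_summary() {
  sed -n 's/^:pubkey enc packet: version [0-9]*, algo \([0-9]*\), keyid \([0-9A-Fa-f]\{16\}\).*$/\1 \2/p'
}
```

For example, `gpg --list-packets message.txt.gpg 2>&1 | recipient_summary` should print one line per recipient for a message like the one above. No private key is needed for this, which is exactly what an attacker holding only the file can do.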