45

As the title says, do those 4 bytes carry a meaning (I assume they do as apparently the smile changes depending on the key bitness)?

The two files below have been encrypted with different keys, but for any given key those 4 bytes are always the same.

If these 4 bytes are always the same, is there any built-in way in PGP/GPG to prevent an attacker from knowing what file they may have obtained/intercepted, other than stripping these bytes in transit and re-creating them at destination?

[Two screenshots: the first bytes of each encrypted file as shown in the viewer]

schroeder
ajeh
  • 12
    Preventing an attacker knowing they have a PGP/GPG file would be a topic for steganography (which I believe PGP/GPG doesn't do). – user253751 Dec 07 '16 at 00:04
  • 23
    Note that those 4 bytes are *not* the same in your two examples; the first sequence starts with `85 02` and the second with `85 01`. In the codepage you are using, one is a smiley with a plain background and one is with a filled one. – Federico Poloni Dec 07 '16 at 07:50
  • 12
    Total coincidence. Do you remember the 7/11 Wingdings conspiracy? http://gizmodo.com/wingdings-predicted-9-11-a-truthers-tale-1679759324 – Mawg says reinstate Monica Dec 07 '16 at 13:09
  • 3
    @Mawg, not sure what Wingdings has to do with slurpies or convenience stores :) – mikeazo Dec 08 '16 at 13:14
  • 4
    Lolx! I Trumped it :-) – Mawg says reinstate Monica Dec 08 '16 at 13:15
  • 1
    Maybe the people who defined the magic number were hinting everyone that there's no telling what can follow love that starts with a female smile? :P – Chirag Bhatia - chirag64 Dec 18 '16 at 10:42
  • 1
    @ChiragBhatia-chirag64 The way I initially read it was: "if you give a necklace with a smile to a woman, she will love you". But apparently, my imagination played a trick with me :) – ajeh Dec 20 '16 at 16:16

3 Answers

70

Yes, it's a coincidence that the first bytes appear to you as these symbols. They are part of the OpenPGP message format specification (RFC 4880) and vary depending on the packet properties.

Let's create a file containing only those bytes and try to read it as a GPG message:

$ printf '\x85\x02\x0c\x03\n' > foo.gpg && gpg --list-packets foo.gpg
# off=0 ctb=85 tag=1 hlen=3 plen=524
:pubkey enc packet: version 3, algo 255, keyid 0AFFFFFFFFFFFFFF
    unsupported algorithm 255
  • The first byte (0x85 = 0b10000101) is the cipher type byte (CTB) that describes the packet type. We can break it up as follows:
    1: CTB indicator bit
    0: old packet format (see RFC 1991)
    0001: packet tag 1, a public-key encrypted session key packet
    01: packet-length field is 2 bytes long

  • The second and third bytes denote the packet length (0x020c = 524).

  • The fourth byte (0x03) means it's in the version 3 packet format.
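
The breakdown above can be reproduced in a few lines of Python. This is only a sketch of the old-format header layout described in RFC 4880, section 4.2. The function name `parse_old_header` is made up for illustration, and it deliberately rejects new-format and indeterminate-length packets.

```python
def parse_old_header(data: bytes) -> dict:
    """Decode an old-format OpenPGP packet header (sketch, RFC 4880 section 4.2)."""
    ctb = data[0]
    if not ctb & 0x80:
        raise ValueError("bit 7 must be set in the first byte of a packet")
    if ctb & 0x40:
        raise ValueError("new-format packet; not handled in this sketch")
    tag = (ctb >> 2) & 0x0F          # bits 5-2: packet type
    length_type = ctb & 0x03         # bits 1-0: size of the length field
    if length_type == 3:
        raise ValueError("indeterminate length; not handled in this sketch")
    length_size = 1 << length_type   # 0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes
    body_length = int.from_bytes(data[1:1 + length_size], "big")
    return {
        "tag": tag,
        "header_length": 1 + length_size,
        "body_length": body_length,
        "first_body_byte": data[1 + length_size],  # the version byte for tag 1
    }

header = parse_old_header(bytes.fromhex("85020c03"))
print(header)
```

The values it reports (tag 1, header length 3, body length 524) line up with the `tag=1 hlen=3 plen=524` line that gpg prints.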

As you can see, these bytes are meaningful and not magic number constants that you can remove without losing information. If you cut them off, you are corrupting the GPG packet and it will require some guesswork to reconstruct it.


The bytes are shown as smileys and hearts because that's how your (probably DOS) terminal displays non-printable control characters. In character sets that originate from code page 437, low bytes outside the printable ASCII range are traditionally represented as icons. Here's the original CP437 on an IBM PC:

[Image: the CP437 character set as rendered on an IBM PC]

(Image source)
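
To make the mapping concrete, here is a small Python sketch. Python's built-in `cp437` codec maps the bytes 0x00 to 0x1F to control characters rather than to the IBM PC glyphs, so the table below is written by hand and covers only a handful of low bytes. `CP437_GLYPHS` and `render` are names invented for this example.

```python
# Hand-written subset of the IBM PC glyphs for low byte values.
CP437_GLYPHS = {
    0x01: "\u263A",  # white smiling face
    0x02: "\u263B",  # black (filled) smiling face
    0x03: "\u2665",  # heart
    0x0B: "\u2642",  # male sign
    0x0C: "\u2640",  # female sign
}

def render(data: bytes) -> str:
    """Show each byte as its IBM PC glyph if listed above, else as two hex digits."""
    return " ".join(CP437_GLYPHS.get(b, f"{b:02x}") for b in data)

print(render(bytes.fromhex("85020c03")))
print(render(bytes.fromhex("8501")))
```

The second call shows why a file starting with `85 01` displays a different smiley than one starting with `85 02`.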

Arminius
18

As a general principle, well-designed binary file formats¹ will have their first few bytes be a magic number identifying the format. ELF executables' first four bytes are always 7f 45 4c 46, PNG files' first eight bytes are always 89 50 4e 47 0d 0a 1a 0a, and so on. Well-designed encrypted file formats will always follow that magic number with an unencrypted "header" that reveals the encryption algorithm, the length of the encrypted data, things like that.
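
As a rough illustration of how magic numbers are used, the Python sketch below checks a buffer against the two signatures quoted above. The `MAGIC_NUMBERS` table and the `identify` helper are hypothetical names for this example, not part of any library.

```python
# Signatures taken from the paragraph above.
MAGIC_NUMBERS = {
    "ELF": bytes.fromhex("7f454c46"),
    "PNG": bytes.fromhex("89504e470d0a1a0a"),
}

def identify(data: bytes) -> str:
    """Return the name of the first format whose magic number prefixes data."""
    for name, magic in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"

print(identify(b"\x7fELF\x02\x01\x01"))
print(identify(b"\x89PNG\r\n\x1a\n" + b"\x00" * 4))
print(identify(bytes.fromhex("85020c03")))
```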

This is not normally considered a security vulnerability, because of Kerckhoffs' principle, which says that a cryptosystem must remain secure even if the attacker knows everything about it except the key, including everything that the file header can tell them (such as the algorithm).

It's possible to design a file format, or a protocol, all of whose bytes are indistinguishable from randomness unless you already know the decryption key, but it's surprisingly difficult (did you know that encrypting the expected length of encrypted data can introduce a vulnerability?) and it doesn't actually gain you anything. A file that's completely indistinguishable from the output of cat /dev/random will be just as suspicious to the secret police as an obviously GPG-encrypted file. Perhaps more suspicious, even, since there are all kinds of innocuous reasons to encrypt files.

If you are worried about an attacker merely learning that you are using encryption to communicate with someone, you need steganography, which conceals secret information within ordinary-looking, unencrypted files. Be aware that the state of the art in steganography is not nearly as sophisticated as the state of the art in cryptography; last I checked, all known approaches were breakable by a determined adversary. (If the secret police's first impression is "oh, this is a memory card full of vacation photos", they might not bother digging any deeper…unless they already have a reason to suspect you.)


¹ I have no opinion about whether the GPG file format is well-designed.

zwol
  • 1
    If you really are concerned about the header, then in some encryption applications, it's possible to have headerless output, or to split the header to separate file, but you'll have to memorize the encryption parameters and pass it into the decryption application. – Lie Ryan Dec 08 '16 at 00:38
  • 1
    Could you answer the original question? I agree that we shouldn't consider it a vulnerability due to Kerckhoffs' principle, but is it true in the first place that GPG or PGP has a consistent set of bytes or a human-readable header? (I'm mainly concerned about the latter, the accepted answer seems to have covered the bytes mentioned at least) – NH. Jul 26 '17 at 20:01
  • @NH. Anything I could say would merely repeat the other answer. – zwol Jul 26 '17 at 20:03
  • @zwol so those 4 bytes explained in the other answer are the only header-ish data? Nothing human-readable or easier to detect in Notepad++? – NH. Jul 26 '17 at 21:06
  • @NH. I don't know. The RFC linked from the other answer should say. – zwol Jul 26 '17 at 21:20
0

What you see as "ellipsis, smile, female sign and a heart" are symbols displayed based on the selected code page. Type chcp to find out your active code page. You'll get different symbols on different code pages for the same byte values.
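
A quick way to see this is to decode the same byte under different code pages. The Python sketch below uses the standard `cp437` and `cp1252` codecs; the helper name `decode_with` is made up for the example.

```python
def decode_with(byte_value: int, codecs=("cp437", "cp1252")) -> dict:
    """Decode a single byte under each named code page."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in codecs}

# 0x85 is the first byte of both files in the question.
for name, char in decode_with(0x85).items():
    print(f"{name}: U+{ord(char):04X} {char}")
```

Under `cp437` the byte 0x85 is "à", while under `cp1252` it is the ellipsis "…", which matches the "ellipsis" you describe.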

Magic bits are like ID tags indicating the type of a file. Some operating systems, e.g. macOS, rely on them, whereas Windows prefers file extensions (although for some files, e.g. .exe files, the magic bits are always checked at run time).

In your screenshots the byte values are different, although you see similar symbols when the files are displayed.

To answer your question: yes it is "a coincidence" in this case (based on your codepage), and not "always the same".

The idea of encryption is that the attacker may know the type of file and yet still be unable, in practice, to decipher what has been encrypted, so there is no need for "stripping these bytes".

Zimba