What aspects of image preparation workflows can lead to accidents like Boris Johnson's No. 10 tweet's 'hidden message'?

Question

The BBC reports that the image Boris Johnson posted on Twitter to congratulate Joe Biden contains traces of the text "Trump" in the background. The BBC article links to a Guido Fawkes article, and when I download the tweet's JPEG, convert it to PNG with macOS Preview, and then subtract a constant background, there it is!

When I do a similar check on the blanked out area in my image in this post I see nothing, i.e. it worked.

My goal there was to show an image of a battery but to ensure that no personal information like the battery's serial number would be visible or detectable. Sharing that on the internet might be a small but nonzero security issue.

I breathe a sigh of relief but then wonder for future reference, in order to be sure that blanked out areas are fully blanked out:

Question: What aspects of image preparation workflows can lead to accidents like Boris Johnson's No. 10 tweet's 'hidden message'? What are the most prominent things to avoid doing in order to avoid accidental hidden residues like this?

import numpy as np
import matplotlib.pyplot as plt

# https://twitter.com/BorisJohnson/status/1325133262075940864/photo/1
# https://order-order.com/2020/11/10/number-10s-message-to-biden-originally-congratulated-trump/
# https://pbs.twimg.com/media/EmPRWjyVoAEBIBI?format=jpg
# https://pbs.twimg.com/media/EmPRWjyVoAEBIBI?format=jpg&name=4096x4096
# https://twitter.com/BorisJohnson/status/1325133262075940864


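# plt.imread returns PNG data as floats in the range 0..1
img = plt.imread('biden.png')

# average colour of a patch assumed to contain only background
average = img[20:100, 20:100].mean(axis=(0, 1))

# subtract the background, keep only differences within +/-0.005, shift to 0..0.01
imgx = (img[..., :3] - average[:3]).clip(-0.005, 0.005) + 0.005
imgx = imgx.sum(axis=2)  # monochrome
imgx /= imgx.max()  # normalize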

plt.imshow(imgx, cmap='cool')
plt.show()
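
The "similar check" on a blanked-out area can also be made repeatable by testing whether the region contains exactly one colour. This is only a minimal sketch, assuming the image is already loaded as a NumPy array (for example with `plt.imread`) and has not yet been through lossy compression; the function name `region_is_uniform` and the synthetic test images are illustrative and not part of the original post.

```python
import numpy as np


def region_is_uniform(img, top, bottom, left, right):
    """Return True if every pixel in img[top:bottom, left:right] is identical."""
    region = img[top:bottom, left:right]
    # One row per pixel, one column per channel (greyscale gets a single column).
    if region.ndim == 3:
        pixels = region.reshape(-1, region.shape[-1])
    else:
        pixels = region.reshape(-1, 1)
    # A fully blanked region has zero spread in every channel.
    return bool((pixels.max(axis=0) == pixels.min(axis=0)).all())


# Synthetic example: a solid background in the colour #231F20 reported in the comments...
clean = np.full((50, 50, 3), (0x23, 0x1F, 0x20), dtype=np.uint8)
# ...and a copy with a hypothetical residue one level brighter in the red channel.
leaky = clean.copy()
leaky[10:20, 10:30, 0] += 1

print(region_is_uniform(clean, 0, 50, 0, 50))  # True
print(region_is_uniform(leaky, 0, 50, 0, 50))  # False
```

The check is deliberately strict: a single pixel that is one level off makes it fail. It is best run on the lossless working file before export, since JPEG compression can add small variations of its own near text.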

@MarkMorganLloyd I can't disagree with the sentiment, but that might (at best, and with some tweaks to the wording) fit at politics.se. This fits with various other "how not to redact" and related questions — Chris H, Nov 11 '20 at 15:11
@ChrisH that's right. I simply wondered "if they can screw up so badly, what might I not be erasing!?" — uhoh, Nov 11 '20 at 15:25
I suggest this wasn't a "technical" issue the way you meant that. Surely, the parts of "image preparation workflow" which matter there are two-fold… First that there should be none: posting graphics as text is rarely helpful. Either way, the technicality was not getting people to pay attention, which is hardly an IT problem, except to the extent user10216038 is prolly right about left-over toner. — Robbie Goodwin, Nov 12 '20 at 17:23

jkej · Answer 1 · 2020-11-11T15:49:10.367

Summary:

The most likely explanation is that the old text was removed by using a fuzzy or smooth eraser tool.

Analysis:

In the image below I have only increased brightness and contrast to make the "hidden message" more visible. Nothing fancy. The slight red tint is only due to the fact that the black background of the original has a very slight red tint to it.
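
For readers who want to try the same kind of inspection without an image editor, a plain linear brightness and contrast boost might look like the sketch below. It is not necessarily what the editor used for this answer does internally; the function name `boost_contrast`, the gain of 20 and the synthetic residue are assumptions chosen for illustration.

```python
import numpy as np


def boost_contrast(img, gain=20.0):
    """Linearly stretch 8-bit values around the image's median colour."""
    data = img.astype(np.float64)
    # Use the per-channel median as the pivot so the dominant background maps to mid-grey.
    pivot = np.median(data.reshape(-1, data.shape[-1]), axis=0)
    stretched = (data - pivot) * gain + 128.0
    return stretched.clip(0, 255).astype(np.uint8)


# Background in the reported colour #231F20 with a hypothetical residue 2 levels brighter.
background = np.full((40, 40, 3), (0x23, 0x1F, 0x20), dtype=np.uint8)
background[5:10, 5:30] += 2

boosted = boost_contrast(background)
print(boosted[0, 0], boosted[7, 7])  # background pixel vs. residue pixel after boosting
```

With these numbers a difference of 2 levels, which is hard to see on most monitors, becomes a difference of 40 levels.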

As you can see there is a very clear gradient in the most visible hidden text fragment (under "shared priorities"). The other fragments also show some signs of gradients, but there are no gradient effects used in the text about Biden.

Hypothesis:

These seemingly random gradients, together with the fact that the "hidden message" appears to consist of small random fragments of a much larger text, make me think that whoever made this picture removed the old text by using a fuzzy eraser tool. They manually swiped the eraser tool back and forth over the text until they couldn't see the old text anymore. But the fuzzy eraser tool doesn't remove everything if you pass over quickly just once. This is by design, to avoid sharp edges in an image.

In the picture below I have swiped a big fuzzy eraser back and forth a few times over the original image to show what the results may look like. Obviously, in my picture some parts are still a little too visible, but I still think it gives a good idea of what type of effects this could cause.
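
The same effect can be sketched numerically. The model below is an assumption for illustration only: each eraser stroke blends a pixel towards the background by a strength that falls off smoothly from the brush centre, and the result is rounded to whole levels between strokes. Real editors may model brushes differently; the names `erase_pass` and `strength` are hypothetical.

```python
import numpy as np

BACKGROUND = 35.0  # 0x23, the red level of the reported background colour #231F20
TEXT = 255.0       # white text


def erase_pass(row, strength):
    """One eraser stroke: blend each pixel towards the background by `strength` (0..1)."""
    blended = row + strength * (BACKGROUND - row)
    return np.round(blended)  # keep whole levels between strokes, as in an 8-bit image


# A row of 21 "text" pixels and a soft brush centred on the middle pixel.
row = np.full(21, TEXT)
offsets = np.arange(21) - 10
strength = np.exp(-(offsets / 4.0) ** 2)  # 1.0 at the centre, falling off smoothly

for _ in range(3):  # three quick swipes over the same spot
    row = erase_pass(row, strength)

residue = row - BACKGROUND
print(residue.astype(int))  # 0 at the centre, growing smoothly towards the brush edge
```

The leftover is zero under the brush centre and increases gradually towards the edge, which is the kind of gradient visible in the hidden text fragments.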

Solution:

Don't use a fuzzy eraser tool to remove things you want to remove completely. In this case there's no need to use an eraser tool at all. Just fill the whole image with the background color, or maybe even better, just create a new image from scratch. The only thing they wanted to keep was the size and the background color and that should only take a few seconds to replicate in a new image.
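
In code terms, the safe options are a fresh canvas or a hard-edged overwrite at full opacity. A minimal sketch, assuming NumPy arrays as the image representation; the helper names `new_canvas` and `blank_region` are illustrative, and the size and colour are the ones reported in the comments on this page.

```python
import numpy as np


def new_canvas(height, width, colour):
    """Create a fresh image filled with one colour; nothing old can survive in it."""
    return np.full((height, width, len(colour)), colour, dtype=np.uint8)


def blank_region(img, top, bottom, left, right, colour):
    """Overwrite a rectangle with a constant colour (hard edges, full opacity)."""
    out = img.copy()
    out[top:bottom, left:right] = colour
    return out


# Option 1: start from scratch with the reported size (4096x2304) and colour (#231F20).
canvas = new_canvas(2304, 4096, (0x23, 0x1F, 0x20))
print(canvas.shape, bool((canvas == canvas[0, 0]).all()))

# Option 2: blank a rectangle in an existing image, e.g. one covering a serial number.
photo = np.random.default_rng(0).integers(0, 256, size=(100, 200, 3), dtype=np.uint8)
redacted = blank_region(photo, 40, 60, 50, 150, (0, 0, 0))
print(int(redacted[40:60, 50:150].max()))  # 0: no trace of the old pixels
```

Either way, the old pixel values are replaced rather than blended, so there is nothing left to recover by boosting contrast.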

Update:

As requested by @Tristan in a comment, I have tried to replicate the process completely. Here is a picture where I have removed the Biden/Kamala text with a fuzzy eraser tool and then placed a new text on top of it:

And here is the same picture but with increased brightness and contrast to highlight remnants of the old text:

this definitely seems the best fit so far. It explains why we have partial letters, a gradient on those letters, and only a few words with anything remaining. — Tristan, Nov 11 '20 at 14:48
Nice! I wonder whether the original image with the "Trump" text was ever posted. — Dewi Morgan, Nov 11 '20 at 23:04
Gold stuff. This also reveals that Boris Johnson is currently indeed number "1 0" as indicated in beautiful [Albertus](https://fontsinuse.com/uses/8920/the-prisoner-1967-tv-series). Who is Number 1? — David Tonhofer, Nov 11 '20 at 23:07
The Comic Sans is a nice touch in this case. (I never thought I'd write this sentence in my life.) — Federico Poloni, Nov 12 '20 at 09:56
How was this easier for them than File > New? It boggles the mind — Bryan Boettcher, Nov 13 '20 at 17:47
@DavidTonhofer [This guy](https://genius.com/albums/Fatboy-slim/You-ve-come-a-long-way-baby) is # 1. — Mentalist, Nov 14 '20 at 09:06

user10216038 · Accepted Answer · 2020-11-11T03:55:05.317

43

At first I thought this was simply a hoax as no digital image process I'm aware of could do this by accident, but I was intrigued.

Taking the original image and performing an equalize on it did indeed pull out the alternate fragments. It also revealed the black background to be not really black but a composite of dense subtractive primary colors.
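
For reference, an "equalize" operation is usually histogram equalization. The sketch below shows one common textbook form for a single 8-bit channel; it is an assumption that the tool used here behaves similarly, and the function name `equalize` and the synthetic channel are illustrative.

```python
import numpy as np


def equalize(channel):
    """Histogram-equalize one 8-bit channel using its cumulative distribution."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # count of pixels at the darkest level present
    total = channel.size
    if total == cdf_min:
        return channel.copy()  # only one level present, nothing to spread out
    # Map each level to its rank in the cumulative distribution, scaled to 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255).clip(0, 255).astype(np.uint8)
    return lut[channel]


# Synthetic channel: background at level 35, a residue at 36, bright text at 255.
channel = np.full((40, 40), 35, dtype=np.uint8)
channel[5:15, 5:15] = 36      # hypothetical residue, one level above the background
channel[20:30, 10:30] = 255   # visible text

eq = equalize(channel)
print(eq[0, 0], eq[10, 10], eq[25, 15])  # background, residue, text
```

Because equalization spreads whatever levels are present across the full range, a one-level difference becomes very visible. The same property means it also amplifies compression artefacts and other small variations.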

Here's a crop of the equalized image:

I'm pretty sure this error came about because the message was Printed on a Color Laser Printer and photographed (or scanned).

Note the Cyan, Magenta, Yellow remnants.

The "hidden message" is a result of a previous print leaving a tiny bit of toner on the drum which was picked up by the next print. This is most common in duplex capable printers.

So how do you prevent it? Don't photograph paper to produce digital content.

edited Nov 11 '20 at 03:55

answered Nov 11 '20 at 03:48

user10216038


Thanks! This is not what I expected but it turns out to be far more interesting. A quick check of correlation coefficients between colors for the "blank" 100,000-pixel noisy area in the middle of your image with `np.corrcoef(r, g)` shows roughly a 0.8 correlation factor between any two colors, https://i.stack.imgur.com/5uI6e.png I suppose that doesn't mean anything conclusive without knowing more about the toner particle size, pixel size and focus, but it seems consistent with these being subtractive colors and therefore each pure color exciting two of the three RGB channels of the camera. – uhoh Nov 11 '20 at 07:40
(not "pure color" I suppose I mean each toner color (y, m c)) – uhoh Nov 11 '20 at 09:22
The background of the [original image](https://pbs.twimg.com/media/EmPRWjyVoAEBIBI?format=jpg&name=4096x4096) is a solid colour as far as I can tell, not sure where the noisy background is coming from in your image – Alan Birtles Nov 11 '20 at 11:16
Information leakage of this sort is terrifying. I can easily imagine national secrets having been printed on the printer instead, and therefore accidentally making it to the Internet! – Ahmed Tawfik Nov 11 '20 at 11:27
The noise in the image you posted is most probably an artifact of your equalizing algorithm. It certainly isn't there in the original. More importantly, the "hidden message" is _brighter_ than the background. Surely, remaining toner from the last print could only have made the print _darker_? – jkej Nov 11 '20 at 13:26
@AhmedTawfik I'm not sure that in reality left over toner would leave a mark which is simultaneously not visible to the naked eye on the page, not visible in the scan/photograph and retrievable from the image. Even if it was possible why would anybody print something out just to take a picture of it? – Alan Birtles Nov 11 '20 at 13:39
@AlanBirtles that's because they have a camera app preinstalled on their phone but their bluetooth isn't currently working and the corporate firewall doesn't allow word documents over the internal network... and they couldn't have taken a photo of the screen because it's one of those places that use a CRT. It might sound facetious but it's entirely plausible. – John Dvorak Nov 11 '20 at 13:55
Similarly I don't see noise, only jpg artifacts (and some anti-aliasing effects) around the letters i.e. the background is pure #231F20 except near the letters – Chris H Nov 11 '20 at 13:58
The text is easily revealable using a naive fill tool (i.e. one that only replaces the exact colour you click on). This replaces the background, but leaves behind the incorrect text, and a couple of jpg artifacts. This is not the behaviour you'd expect if @user10216038 had the right solution, and fits that suggested below much better – Tristan Nov 11 '20 at 14:01
@Alan Birtles - The background color is **NOT** a solid color. You can see this readily in GIMP by using the color picker at various points on the background and you will see it change. What's more, if you zoom to 400 or 800 percent you can readily see the lighter *hidden text*. – user10216038 Nov 11 '20 at 16:02
@user10216038 Do you have a different original image than the rest of us? I have used the one Alan Birtles linked to. In that image the background certainly is a solid color. I have used the color picker in GIMP and the background color is #231F20 everywhere. – jkej Nov 11 '20 at 16:11
@jkej - The image I used is: **https://twitter.com/BorisJohnson/status/1325133262075940864/photo/1** which I got to by following the *Guido Fawkes Article* then selecting the *tweeted link* and finally the actual *tweet*. – user10216038 Nov 11 '20 at 16:18
@user10216038 I agree with jkej, using a fill with 0 threshold, so it only fills identical colours, and the original photo, all the background apart from the visible text is filled in, suggesting it is a single colour. Something weird seems to have happened between the server and the tool you're using to look at it – Tristan Nov 11 '20 at 16:19
@user10216038 Yes, that appears to be the same image. Eventually you get to the same URL either way. Could they have changed the picture since you downloaded it? Could you upload the original that you have where the background is not uniform? – jkej Nov 11 '20 at 16:28
@Tristan - I just pulled down the image again and used fill with **0** threshold, **it did not fill.** Using a threshold of **1** filled *most* and illuminated the *hidden text*. I believe that you are not looking at the native file. – user10216038 Nov 11 '20 at 16:29
@jkej - The native file is 4096x2304. Which is a ludicrously large tweet image. – user10216038 Nov 11 '20 at 16:35
@user10216038 Are you doing something special to look at the "native file". I just open the file in GIMP. I manage to fill with 0 threshold. – jkej Nov 11 '20 at 16:35
@user10216038 Yes, that's the resolution of my image too. The file size is 368 KB (377 707 bytes). – jkej Nov 11 '20 at 16:39
ah interesting. It's possible we are accessing the image after some sort of downscaling by twitter. If it is 4k then what we're seeing would be averaging over any noise you've got pretty well so it makes sense that we'd see a solid background – Tristan Nov 11 '20 at 16:39
@Tristan But I have been working on such a large file all along and the background seems to be solid. – jkej Nov 11 '20 at 16:41
curiouser and curiouser – Tristan Nov 11 '20 at 16:56
maybe your web browser/ISP is recompressing the image before you download it? – Alan Birtles Nov 11 '20 at 17:08
The image size of 4096x2304 is yet another indication that this was scanned paper. A direct digital image created that big for a Twitter post would be insane. – user10216038 Nov 11 '20 at 17:56
@Ahmed Tawfik - It used to be standard security policy to require printing 3 blank pages before using a classified printer to print unclassified. The original intent was to prevent bleed over like this. Of course like many governmental guides, the purpose was lost to blind rules resulting in imposing the same restrictions on ink-jet printers and labeling toner cartridges as classified and unclassified. It was easier to follow procedure than explain that a box of black dust doesn't store information any more than a ballpoint pen remembers writing classified sentences. – user10216038 Nov 11 '20 at 18:06
Printing and scanning a picture to post it on Twitter would be magnitudes more insane than directly creating a digital image with slightly higher resolution than needed. – jkej Nov 11 '20 at 18:36
I think this makes some assumptions that are a priori pretty unlikely: This would have to be a pretty high-quality print and scan (in terms of resolution+smoothness of the color), but yet low-quality enough so that there is text bleed that can easily be seen by eye alone (but only at limited locations, with smooth and gradient-like edges). jkej's answer seems way more likely to me – ManfP Nov 11 '20 at 20:05
Sorry, but this explanation is ridiculous. Who would print an image and then scan it to use in a tweet? Even ignoring the highly incredible process, this would inevitably produce image distortions, which are not visible in the image. – IMil Nov 11 '20 at 23:28
as @AlanBirtles says - are you perhaps downloading over a mobile ISP? I've had trouble with O2 (a few years back) recompressing JPGs so map text wasn't legible. But twitter should always be https – Chris H Nov 12 '20 at 08:44
Another data point: Twitter serves image files based on context. If I right click on the image in the tweet and "save image as" in Firefox on Linux, i.e. desktop, I get a 680x383px image of 39kB; bucket fill, threshold 0 doesn't work (it shows something that might be parts of letters, but there's a region all round the white text that's lighter than background & dominated by jpg compression artifacts). If I *left* click on the image, I go to https://twitter.com/BorisJohnson/status/1325133262075940864/photo/1 which downloads as 4096x2304px, 378kB, reveals "Trump". File names are identical – Chris H Nov 12 '20 at 08:55
unfortunately (in this case, but a very good idea overall) there's minimal EXIF info in JPGs downloaded from twitter – Chris H Nov 12 '20 at 08:56
... If I left click and save image on my phone (Firefox for Android) over WiFi or 4G I get a 2048x1152px image, 149kB. Flood fill reveals hidden text. – Chris H Nov 12 '20 at 09:04
@user10216038 dunno, I think my pens are talking to each other when I sleep, so I've moved them to separate pocket protectors. – uhoh Nov 13 '20 at 05:40
@IMil While I agree that this explanation is not the best fit, your argument is also poor. The answer to who "would print an image and then scan it to use in a tweet?" is whoever tweets an image of some text on a text sharing platform. – Andrei Nov 13 '20 at 16:27
@Andrei Not really. Those are two completely unrelated and different things. There are sensible reasons people upload images with text on to Twitter (formatting control, mainly). Doing a completely useless thing that takes loads of time and would ruin the quality of the image is incomparable. – Asteroids With Wings Nov 13 '20 at 21:31
I think the answer to why you'd print an image, scan it and tweet it could conceivably be the person who tweeted the image only had access to a printed copy and not the digital original? – David Waterworth Nov 14 '20 at 03:25

Alan Birtles · Answer 3 · 2020-11-11T14:16:47.610

9

I expect the explanation is fairly mundane. An image was probably prepared with 3 layers:

1. Background colour and maybe the footer text
2. Trump text
3. Biden text

You could then produce both messages with the same look by hiding either layer 2 or layer 3. I imagine what happened here is the person preparing the images hid layer 2 by reducing the opacity and accidentally set it to nearly 0 rather than 0. Add in a liberal sprinkling of JPEG compression artefacts and you get the result seen in the tweet.

Following this process and creating a JPEG results in an image like this:

I'm sure you could fiddle with the level of opacity (I used 1.6%) and JPEG quality and get a result where the alternative text was less visible to the naked eye but still present in the image.
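
The arithmetic behind this is easy to check. The sketch below assumes simple linear alpha blending followed by rounding to 8 bits; real editors may blend in a different colour space, and the function name `composite` and the 0.4% value are illustrative. The background colour #231F20 is the one reported in the comments, and 1.6% is the opacity mentioned above.

```python
import numpy as np


def composite(background, layer, opacity):
    """Alpha-blend `layer` over `background` at the given opacity and store as 8-bit."""
    blended = (1.0 - opacity) * background + opacity * layer
    return np.round(blended).astype(np.uint8)


background = np.array([0x23, 0x1F, 0x20], dtype=np.float64)  # #231F20
white_text = np.array([255, 255, 255], dtype=np.float64)

for opacity in (0.0, 0.004, 0.016):
    print(opacity, composite(background, white_text, opacity))
```

Only an opacity of exactly 0 reproduces the background. In this model even 0.4% shifts every channel by one level, which is invisible on screen but survives in the file until compression or contrast enhancement exposes it.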

Here is an example with a lower opacity value. I've also added "small text" in a smaller font, partially overlapping "Biden"; the smaller text is swamped by the JPEG artefacts whilst "Trump" survives.

You can avoid this by preparing two separate images or just by being a bit more careful when using layers in one image.

edited Nov 11 '20 at 14:16

answered Nov 11 '20 at 11:01

Alan Birtles


this would leave the entirety of the underlying message visible though. Instead we only have a few characters, some of them only partial – Tristan Nov 11 '20 at 14:03
@Tristan not necessarily if the text is much fainter than with my example (I deliberately chose a value which is still visible to the naked eye to illustrate the point) much of the text could disappear into the JPEG artefacts – Alan Birtles Nov 11 '20 at 14:06
The image is also likely to have gone through multiple generations of JPEG compression, once when the image was created and probably at least once more when the image was uploaded to twitter especially if the image was resized – Alan Birtles Nov 11 '20 at 14:26
This is closer but @Tristan has a good point. The even gradient on the small text below "shared priorities" ("the future of this"?) doesn't seem to fit with this approach even taking into account jpg artifacts, anti-aliasing, and recompression. Even your fainter version is readable immediately on my monitor - but presumably you could go more opaque still – Chris H Nov 11 '20 at 14:43
... this gradient also fits with a hint of letters above "look" – Chris H Nov 11 '20 at 14:50
Yeah maybe I was giving too much technical credit to the people running our country, the eraser explanation seems more likely – Alan Birtles Nov 11 '20 at 14:55
As someone who also has dabbled with image editors such as Photoshop and a number of others, this is a valid possibility as well. It might actually be a combination of both answers as well. Either way, I am reasonably certain we wouldn't be able to tell for sure based on current data only. – Gnudiff Nov 12 '20 at 11:51

score 2 · Answer 4 · answered Nov 13 '20 at 08:25

While probably not the cause in this specific case, in theory it could also be a result of using a tool deliberately leaking redacted information.

In the 2008 Underhanded C Contest, the participants were asked to write an image-editing tool that leaks information about image parts redacted with (traditionally black) rectangles, and to do so in a stealthy, deniable way.

Edheldil


Interesting reading! Is it possible to add a sentence or two here that also directly addresses the original question as asked? *Thanks!* – uhoh Nov 13 '20 at 10:45
