205

I'm wondering if it's safe to black out sensitive information from a picture just by using Microsoft Paint?

Assume for this scenario that the EXIF data is stripped and there is no embedded thumbnail, so no data can leak that way.

I'm interested in whether any other attack could be used to retrieve the hidden information from the picture.

Mirsad
  • 4
    Have you considered experimenting yourself: black out a small area of the image and compare the before/after hexdumps? Maybe add extra Gaussian blur or other features? – Little Code Jun 13 '16 at 22:36
  • 3
    Not yet, I have that in my mind... – Mirsad Jun 13 '16 at 22:39
  • 9
    By "hidden information", do you actually mean information that is hidden, or information that has been removed? Those are two very different things with respect to your question. – Zaibis Jun 14 '16 at 07:11
  • Perhaps a good practice would be to make a completely new image by cutting and pasting sections of the old image into the new image. You'd be guaranteed not to transfer data. After all, what computer would secretly copy/paste the sections you did not select? – user64742 Jun 15 '16 at 02:43
  • 22
    @TheGreatDuck surely the reason we even have an Information Security stackexchange is that computers very often expose data in unintended ways, no? – nekomatic Jun 15 '16 at 07:47
  • 7
    For small images, I blacken out the sensitive info and just take a screenshot of that. May be time consuming to do this with a large number of pictures. – rohithpr Jun 15 '16 at 16:15
  • 1
    One safer option is to print out the image after 'blanking out', then scan it back in (at low resolution) on another computer. That way an error in using the software is less likely to get you. – Ian Ringrose Jun 15 '16 at 18:20
  • @nekomatic that may be true, but generally when you copy a piece of data, only that data gets copied. It wouldn't be a security issue to copy unrequested data, but rather an issue of inefficiency. – user64742 Jun 15 '16 at 18:52
  • Hmm, Perhaps with black text on white pictures, it would make more sense to white-out the text than black it out. – chux - Reinstate Monica Jun 17 '16 at 21:41
  • While a black box is safe, it may be better to fill it with a blurred version of a fake image, using a complicated point spread function instead of the standard Gaussian. People intent on recovering the removed information will then be kept busy, and some of them are experienced enough to see that it is worth trying. They will ultimately succeed with great effort, but all that effort will have gone to waste. This enhances security because every minute spent in vain on your image is a minute less spent on some other, potentially more vulnerable image. – Count Iblis Jun 17 '16 at 22:15
  • 1
    Related (and from the same author 2 years earlier ;) ): [Is blurring face secure?](https://security.stackexchange.com/q/62529/32746) – WhiteWinterWolf Jul 10 '16 at 13:51
  • 1
    What if, after blanking out the sensitive data, instead of saving the file you used the Windows Snipping Tool to take a snapshot of the content? This way there is no going back for sure! – papakias Jul 06 '17 at 12:00

6 Answers

226

As mentioned in the answers to a very similar question, scribbling over part of an image will destroy the original pixels, assuming that your editor doesn't store any layers or undo history in the saved image. (Paint doesn't.) There are some things to watch out for, though:

  • The width of the blanked region places an upper bound on the length of the secret data
  • The height of the region could tell attackers whether the text representation of the data has ascenders or descenders (like in the letters b and p)
  • Any spaces in the blanked region provide information about the relative lengths of the data's parts/words (mentioned in David Schwartz's comment)
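
To make the width point concrete, here is a minimal sketch of how an attacker might shortlist candidate words from the width of a blacked-out box. The glyph widths, the candidate names, and the 30-pixel box are all made up for illustration; real metrics depend on the font and size, and the sketch assumes the box was drawn tightly around the text.

```python
# Hypothetical per-character advance widths in pixels for a proportional font.
# These numbers are invented for illustration only.
GLYPH_WIDTH = {"i": 3, "l": 3, "j": 3, "t": 4, "f": 4, "r": 5, " ": 4, "m": 11, "w": 10}
DEFAULT_WIDTH = 7  # assumed width for every character not listed above


def text_width(text):
    """Return the rendered width of `text` in pixels under the toy metrics."""
    return sum(GLYPH_WIDTH.get(ch, DEFAULT_WIDTH) for ch in text.lower())


def plausible(candidates, box_width, padding=4):
    """Keep candidates that fit inside the box but are not much narrower than it."""
    return [c for c in candidates if box_width - padding <= text_width(c) <= box_width]


candidates = ["Bill", "Anna", "William", "Jim", "Margaret"]  # hypothetical suspects
box_width = 30  # assumed measured width of the blacked-out rectangle, in pixels
print(plausible(candidates, box_width))
```

With these toy numbers only one of the five names survives the filter, which is why covering more than the text itself (or a fixed-size box) leaks less.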

If you use a blur rather than a plain opaque rectangle/brush, a determined attacker could try lots of different possibilities in the image to see what text(s) get close to your image when blurred. Some effects can be undone almost perfectly, so make sure the one you use involves a lot of randomness or actual data destruction (e.g. a blocky pixellization). Of course, Paint doesn't have any special effects, so you should be fine.
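
The "try lots of possibilities" attack can be sketched in a few lines. This is a toy model, not a real image pipeline: the "image" is a one-dimensional row of pixels, `render` is a made-up renderer that turns each digit into four pixels, and the blur is a plain box blur. The only point is that a deterministic blur lets an attacker blur every candidate and compare.

```python
from itertools import product


def render(pin):
    """Toy renderer: each digit becomes 4 pixels (its binary form); 255 = ink, 0 = paper."""
    pixels = []
    for ch in pin:
        bits = format(int(ch), "04b")
        pixels.extend(255 if b == "1" else 0 for b in bits)
    return pixels


def box_blur(pixels, radius=2):
    """Deterministic box blur: each output pixel is the mean of its neighbourhood."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def distance(a, b):
    """Sum of squared differences between two pixel rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recover(blurred, length=4):
    """Blur every PIN of the given length and return the closest match."""
    candidates = ("".join(digits) for digits in product("0123456789", repeat=length))
    return min(candidates, key=lambda pin: distance(box_blur(render(pin)), blurred))


published = box_blur(render("4921"))  # what the redactor releases
print(recover(published))
```

The attacker never inverts the blur; they only need a renderer and the same blur, plus a search space small enough to enumerate. An opaque rectangle gives them nothing to compare against.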

One possible thing to be wary of is JPEG compression artifacts around the secret data, which could be used to get clues about the shape of the text. It never hurts to overwrite more information than necessary when you're concerned about secrecy. (This attack isn't a problem if the image never went through JPEG compression before your redaction.)

Ben N
  • 192
    Don't forget that if you are blocking out information from a sorted list or similar, its position can give away what it said. I heard a story of the USA releasing a list of cities containing a particular type of secret installation from the Cold War, with the ones that were still in use blacked out. The list was alphabetized, so a list of possible cities for each blackout could be generated. Then, by cross-referencing this list against other facts about how reasonable it was for a site to be in each city, very good guesses could be made as to which cities still had active sites. – Frames Catherine White Jun 14 '16 at 02:42
  • 87
    Reversal of blurring type effects isn't a hypothetical risk. At least one notorious scumbag, [Mr. Swirl](https://en.wikipedia.org/wiki/Christopher_Paul_Neil) was busted after someone worked out a way to unscramble his face in CP pictures he shared. – Dan Is Fiddling By Firelight Jun 14 '16 at 05:04
  • 25
    A tall blacked-out region doesn't tell you whether the text representation contains descenders or not. It only tells you that it *might.* – Robert Harvey Jun 14 '16 at 05:07
  • 31
    Even "a blocky pixelization" [may leave enough data to recover obscured text](https://dheera.net/projects/blur)—much better to just completely cover it. – Miles Jun 14 '16 at 05:36
  • 41
    @RobertHarvey: Right, but a not-tall blacked-out region tells you that it _doesn't_. – Lightness Races in Orbit Jun 14 '16 at 10:18
  • 6
    "It never hurts to overwrite more information than necessary when you're concerned about secrecy" -- if that's true then take it to extremes and don't release the document at all. In any practical use of redaction, you do "want" to release the data not required to be redacted, for some value of "want" ranging from "would like" to "have a legal duty". – Steve Jessop Jun 14 '16 at 11:19
  • 14
    I saw a document where the name of a particular person was blacked out. But there were only eight people it could have been and the spacing of the words around the blacked out area made it possible to eliminate all but one of them. – David Schwartz Jun 14 '16 at 17:15
  • 3
    @DavidSchwartz With proportional fonts, many different pixel widths of the erased word are possible, and knowing its exact pixel width may give substantial clues as to what word it was, even if there are *way* more than eight possibilities. While the black area may not allow an attacker to determine this exact width directly, the positions of adjacent words and the spacing of the rest of the line may give it away! – Hagen von Eitzen Jun 14 '16 at 20:47
  • 3
    Can JPEG artifacts that reveal (are influenced by) the contents of the blacked out area extend beyond a "JPEG block" [i.e. 8x8 pixels]? – Random832 Jun 14 '16 at 21:06
  • @Random832 At least in theory, yes. There's nothing stopping a JPEG encoder from doing an error-diffusion pass. In practice, I'd be more worried about double-encoding from a different format / encoder, or saving it with a different block offset. – TLW Jun 14 '16 at 22:27
  • 1
    One thing to keep in mind here is that blacking out is not the only way to redact data. Take a look at [this image for example](http://imgur.com/s8ADQlS). The image length could be considered to set an upper bound but there's no guarantee the original capture contained the full name either. For curiosity, the original name was longer than 'a policy'. This can even be used on paragraphs (say when two words on a line need to be blanked) by changing the position of line breaks. – Bruno Jun 15 '16 at 01:03
  • So to clarify in the above case you just replace the text with an identical copy of blank space. Whether that's whitespace (most text editors), another color or even a pattern. – Bruno Jun 15 '16 at 01:52
  • 7
    One fun way to deal with the blur problem is blanking it out with the background color (clone tool), writing some random text like "Did you know that apple pie is delicious?" in the same font, then blurring _that_ heavily. You just have to make sure that the text you're censoring is the same length as the replacement text. – Nic Jun 15 '16 at 04:17
  • 1
    Wouldn't changing the brush color in Paint to match whatever background color the text is in front of solve most of the length/height concerns? – WorseDoughnut Jun 15 '16 at 13:50
  • There could be information hidden in an image by steganography, e.g. to be able to trace who had leaked a document. – Andrew Morton Jun 15 '16 at 18:08
  • 1
    It also depends if you trust Paint as a piece of software. – CoffeDeveloper Jun 16 '16 at 08:18
  • If you really want to separate the original metadata from the new image, use a screen snipping tool (like the Snipping Tool that ships with Windows) to copy it into Paint, make your edits, then use the Snipping Tool again to save it as a new image. There should be no real way for any data that was hanging around the original image to be present in the final image. – Marshall Tigerus Jun 17 '16 at 18:36
  • 1
    When you block out the address on an envelope, don't forget to block out the USPS POSTNET delivery point barcode. – Dennis Williamson Apr 20 '18 at 21:12
64

Ditto Ben N, but let me add a couple of points that are too long to fit as comments.

I'd emphasize the distinction between layered and un-layered data formats. Drawing a black box over a section of a GIF, JPG, or PNG image destroys the previous contents. Drawing a black box over a section of a Photoshop, Corel Draw, or Paint Shop Pro native image does not destroy the previous contents if it's on a different layer.
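
The difference can be illustrated with a toy model. This sketch is not any real file format: a "layered document" is just a list of pixel grids where `None` means transparent, and `flatten` is a hypothetical compositing step of the kind an export to a flat raster format performs.

```python
def blank(width, height, value=None):
    """Create a width x height grid filled with `value` (None = transparent)."""
    return [[value] * width for _ in range(height)]


# Bottom layer: the original picture as grey levels; the middle column is the secret.
original = [[10, 200, 30],
            [40, 250, 60]]

# Top layer: a black box drawn over the middle column, everything else transparent.
cover = blank(3, 2)
cover[0][1] = 0
cover[1][1] = 0

layered_doc = {"layers": [original, cover]}  # what a layered native format keeps


def flatten(layers):
    """Composite layers bottom to top; opaque pixels replace whatever is below."""
    height, width = len(layers[0]), len(layers[0][0])
    flat = blank(width, height, 0)
    for layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not None:
                    flat[y][x] = layer[y][x]
    return flat


flat_image = flatten(layered_doc["layers"])  # what a flat raster export keeps

print(flat_image)                # the secret column is gone
print(layered_doc["layers"][0])  # the secret column is still there
```

Anyone holding the layered document can simply ignore the top layer; the flattened grid has only one value per pixel, so there is nothing underneath to find.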

I'd be very cautious about blurring. You'd have to know how the software does the blur. If the blur is a deterministic algorithm that does not involve any randomness, it may be possible to undo it with appropriate software. No way would I rely on it without thoroughly understanding the algorithm. Unless there was some very good reason to blur rather than black out, I just wouldn't do it.

Of course any attempt to redact with solid blocks must completely cover the original contents to be safe. You want to draw a black box, not scribble over it with a black pen that might leave gaps.

Some formats may keep an internal history log. Not quite the same thing, but I once had a case where my organization produced documents in PDF, and another company edited those documents and then sent them back to us. We found that errors had been introduced in the documents and, to put it bluntly, blamed them. They claimed that the documents must have been like this to begin with because they didn't do it. Apparently they were unaware that PDF has an internal log of all changes, and I was able to identify exactly what text was changed and the exact time and date of every change.

Jay
  • 64
    In 2005 there was a case where a US soldier killed an Italian secret agent in Iraq. The US published a report which contained classified information, including the name of the soldier who shot. It was a PDF, and the secret information had been covered with a black layer. It was quickly discovered that [the text beneath was still present](https://en.wikipedia.org/wiki/Rescue_of_Giuliana_Sgrena#Release_of_classified_information_in_US_report), and a simple copy/paste would reveal everything. So this is a real risk! – Fabio says Reinstate Monica Jun 14 '16 at 10:02
  • 16
    In 2007, [Christopher Paul Neil](https://en.wikipedia.org/wiki/Christopher_Paul_Neil) was arrested for child pornography after investigators undid the Photoshop "twirl" effect he had used to obscure his face and revealed it: https://www.schneier.com/blog/archives/2007/10/untwirling_a_ph.html – Jonathon Reinhart Jun 14 '16 at 11:07
  • 9
    It might be worth updating your answer as at least a PNG created in Adobe Fireworks does use layers so it doesn't destroy the content underneath. However I'm unsure about cross compatibility with other image editors (especially Photoshop) – Crazy Dino Jun 14 '16 at 12:07
  • 13
    It's not actually randomness that's required, just non-reversibility. (e.g. a hash function isn't random. Neither is drawing a black box. Neither is setting every pixel to the average of all pixels in a region. The latter drastically reduces the information content of the region, but still contains some of the source information.) – Peter Cordes Jun 14 '16 at 12:09
  • @PeterCordes Yes, lazy wording on my part. Randomness is *A* way to make it non-reversible, but certainly not the only way or even the best way. – Jay Jun 14 '16 at 13:29
  • 8
    @CrazyDino More accurately, that's the [APNG](https://en.wikipedia.org/wiki/APNG) format (whether or not it is actually animated). Like animated GIFs, animated PNGs use layers as frames, and if one layer contains the sensitive information, it isn't truly destroyed. – Gallant Jun 14 '16 at 14:49
  • 4
    @PeterCordes Even a hash function is dangerous in cases where the input has only a few possible values: the attacker can simply compare the hash to the hash of all possible inputs. – Bruno Le Floch Jun 15 '16 at 18:01
  • @BrunoLeFloch I suppose so. I'm reminded of an article I read about 20 years ago -- so forgive me if I don't get the details quite right -- by someone who worked on Unix security. He said at one point they decided to generate passwords automatically as random strings of characters, rather than letting users create their own passwords, to avoid "Password1" and the like. Except ... the random number function took a 2-byte integer as a seed. So even though a string of (I think) 8 alphanumerics might look like there were hundreds of billions of possible values ... in fact there were only 64k. ... – Jay Jun 15 '16 at 18:51
  • ... And so one of the developers managed to break all the passwords by just running off a list of all 64k passwords and then running it through the hash function and matching against the passwd file. – Jay Jun 15 '16 at 18:53
  • 1
    @jay no salt, in addition to no entropy? They should have known better. – JDługosz Jun 16 '16 at 05:52
  • @JDługosz Apparently the way they developed Unix security was by having a "good guy" who developed security functions and a "bad guy" who tried to break them, until they got stuff that was hard to break. Sounds like a fun job to me. – Jay Jun 16 '16 at 13:23
  • Good point about layers. An important data point: PDFs are based on PostScript, which is a [vector](https://en.wikipedia.org/wiki/Vector_graphics) format. However, they can ALSO contain raster (image or bitmap) data. It's really become a "container" format, as this answer points out including comments, history, layers, etc. Container = layers. The only "safe" way to obscure any data in a PDF is to convert it to bitmap first so you can't run any algorithms to rasterize it since the rasterization is already done. You lose resolution independence obviously but that's the price for security. – RitterKnight Jun 16 '16 at 15:45
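
The point raised in the comments above, that a non-reversible function is still unsafe when the input has only a few possible values, can be sketched as follows. The choice of SHA-256 and the month-name example are assumptions for illustration; the same enumeration works for any deterministic, unsalted transform.

```python
import hashlib


def digest(value):
    """Hex SHA-256 of a string; stands in for any deterministic one-way transform."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical scenario: the redacted field is known to be a month name.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

published = digest("October")  # what was released in place of the secret

# The attacker hashes every possible value once and looks the digest up.
lookup = {digest(month): month for month in MONTHS}
print(lookup[published])
```

Twelve hashes are enough here; the 64k-seed password story in the comments is the same attack with a larger table.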
22

When blacking out sensitive information in Paint, the original pixels are destroyed. But using Inkscape to black out part of a vector image does not destroy the underlying content; it only covers it. If someone removes the black cover, they can see what is underneath. The same applies to things like Foxit Reader (I almost sent a document with sensitive information which had been covered with a black square).

So using MS Paint to black out sensitive information is safe. JPEG artifacts might show some of the text, like @BenN says.

Just don't rely on blurring: a blur that isn't strong enough can be reversed, and MS Paint doesn't support blur anyway.

Stevoisiak
Suici Doga
  • 10
    Good point - also more complicated editors like Photoshop have transparency settings in several places, and if one gets set to 98% instead of 100%, the color can _look_ like black, but the original data is really just mixed with an almost black color and can be retrieved. – JPhi1618 Jun 14 '16 at 14:09
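
The 98% opacity problem is easy to demonstrate with plain arithmetic. This is a minimal sketch assuming 8-bit grey pixels and a simple linear blend with a black brush; real editors may blend in a different colour space, so treat the exact numbers as illustrative.

```python
ALPHA = 0.98  # brush opacity: almost, but not quite, fully opaque


def cover(pixel, alpha=ALPHA):
    """Blend a black brush over one 8-bit grey pixel at the given opacity."""
    return round((1 - alpha) * pixel)


def stretch(pixel, alpha=ALPHA):
    """Approximately undo the blend by rescaling the faint residue."""
    return min(255, round(pixel / (1 - alpha)))


row = [255, 255, 0, 0, 255, 0, 255, 255]   # white paper with dark "ink"
covered = [cover(p) for p in row]          # looks black to the eye
restored = [stretch(p) for p in covered]   # contrast-stretched by an attacker

print(covered)
print(restored)
```

The covered row only contains values between 0 and 5, which is visually indistinguishable from black, yet the paper and ink pixels are still different numbers and a contrast stretch separates them again.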
17

Paint is a raster image program that neither uses layers nor keeps an undo history after saving, so overwriting sensitive pixels in Paint irrevocably changes them in the saved image.

More reasoning:

Microsoft Paint is a proven simple piece of software with a long history and great popularity that works natively in simple raster image file formats. Serious flaws in Paint's algorithms would have likely been uncovered by now.

When redacting information in a raster image file, it's best to use a simple format such as .bmp or .jpg. Simple formats are much easier to inspect and have historically resisted forensic attacks such as data recovery.

Of course, any security method can only show that there isn't any known vulnerability. But I couldn't find proof of any successful recovery of blacked out or blanked out information in a raster image in the .bmp or .jpg file formats that were edited using Paint.

Blurred or pixelated image sanitization has shown vulnerability to data recovery techniques. But that is outside the scope of the question.

geoO
  • 7
    Sometimes a very short answer is a very good one. – Joshua Jun 16 '16 at 15:14
  • @Joshua Nope. Answers must be longer in order to be considered *very good*. It's the size that matters; not only the quality. Also, citations and reasoning are needed. – EKons Jun 18 '16 at 08:43
  • 5
    @Έρικ Κωνσταντόπουλος "Answers must be longer in order to be considered very good." Odd thing to say. An answer should be as long as it needs to be. That's a meta question anyway. – geoO Jun 19 '16 at 14:19
  • @geoO Meta? R U SRS? This is the main site, not META! – EKons Jun 19 '16 at 14:24
  • 3
    My point is you asked a meta question. About question *length.* Longer != better. I greatly expanded my answer anyway. Problem is you can't prove a negative. One successful recovery of blacked out data means it isn't a secure method. I can't find one successful such attack. My hex inspections of the bitmaps edited in Paint show no data recovery is possible.. – geoO Jun 19 '16 at 14:53
13

Already a few good answers here, saying Paint is safe. (I have no reason to believe otherwise.)

Just want to add one caveat. Blacking out a rectangle that fully covers the sensitive area and anything around it that gives context (for example, a list the information is part of) should be fairly safe with a basic, well-studied image editing program. Using just any image editor might not be safe, as shown by http://www.underhanded-c.org/_page_id_17.html

Erik I
7

Some comments on previous answers (all good - Stack Exchange is like watching really good crossword puzzle players). This is an interesting topic which occasionally might be life-and-death important. (My overactive imagination at work, but battered women at a shelter whose location is critical to keep secret are an example that comes to mind.)

Points that I hadn't considered that struck me as particularly important:

  1. Redact spaces, and here's why: Always redact more rather than less. If I were trying to guess, I'd assume a short (i.e. one or two word) redaction to be a name or a date (as a first approximation). So redact longer if possible.

  2. Try very hard to avoid redactions (particularly short ones) of the same length. Those would be likely to contain the same information.

  3. All of the answers provided are true with the current version of Paint (or even a Photoshop image flattened and exported as bmp, png or jpg), but any update of Paint may suddenly introduce undo-through-save, or layers, or auto-backup. And Microsoft have actually introduced some new features in Paint in Windows 10.

  4. Making sure that black is black is, as another poster pointed out, very important - an example that occurs immediately is scanned text (most often grayscale), but that's easy enough with Paint. Just make sure you're using the rectangle tool and both color selectors are set to actual black. (Although some artistry with the Paintbrush tool may give false information about ascenders and descenders. Whether this is ethical or legal I have no idea).

  5. As a developer, it strikes me there might be a use for a redaction tool that takes all of this into account, or a search-and-redact macro in Word.

  6. Obviously, redact even subtle contextual clues -- "his" or "her" eliminates 50% of the search pool (roughly). But that's drifting outside the scope of the question.

  7. I'm not sure about the method of redaction for legal purposes, but replacing the redacted text with [REDACTED] would leave almost no clues, if you have access to the original text. You could use this technique in Paint as well, but disguising the length of the original text would involve a lot of cutting-and-pasting.
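
The "make sure black is black" check in point 4 can be automated. Here is a minimal sketch that assumes the image has already been decoded into rows of `(r, g, b)` tuples; loading a real file would need an image library and is left out. The function name and the box convention are my own choices for illustration.

```python
def unredacted_pixels(image, box, fill=(0, 0, 0)):
    """Return (x, y) coordinates inside `box` whose colour is not exactly `fill`.

    `image` is a list of rows of (r, g, b) tuples; `box` is
    (left, top, right, bottom) with right and bottom exclusive.
    """
    left, top, right, bottom = box
    return [(x, y)
            for y in range(top, bottom)
            for x in range(left, right)
            if image[y][x] != fill]


WHITE, BLACK, NEARLY_BLACK = (255, 255, 255), (0, 0, 0), (1, 1, 1)

# A 4x3 test image: the box covers columns 1-2 of rows 0-1,
# but one pixel was painted with a colour that only looks black.
image = [
    [WHITE, BLACK, BLACK,        WHITE],
    [WHITE, BLACK, NEARLY_BLACK, WHITE],
    [WHITE, WHITE, WHITE,        WHITE],
]

print(unredacted_pixels(image, box=(1, 0, 3, 2)))
```

An empty result means every pixel in the box is exactly the fill colour; anything else points at gaps, anti-aliased edges, or a brush that was not fully opaque.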

Glorfindel
Colin