Can you trace malware back to a specific keyboard?

Question

A CNN article on the recent US Election hacks claims that

...the administration has traced the hack to the specific keyboards -- which featured Cyrillic characters -- that were used to construct the malware code, adding that the equipment leaves "digital fingerprints" and, in the case of the recent hacks, those prints point to the Russian government.

Now to me that sounds like total baloney. You're going to trace a character, which may be in some executable code, back to a specific keyboard? And you're going to know that it's one particular model that is physically in one particular location?

Is this nonsense or is there something I'm missing here? Wouldn't it also be trivial to spoof whatever is the source of this info?

Remember that characters typed are just Unicode. I would imagine that if enough Unicode characters were collected they could hypothesize what keyboard it came from and possibly narrow it down to a keyboard used in a specific region. That said, it would take probably all of 10 seconds to run the code through a converter that would change the Unicode to a different region. I imagine this would be trivial for just about anyone, especially a government player. — DotNetRussell, Jan 03 '17 at 18:13
@AnthonyRussell That was my thought exactly... first that basically every Cyrillic keyboard across the world would boil down to the same unicode chars, and second that, as you said, it would be trivial to replace all the chars. — David says Reinstate Monica, Jan 03 '17 at 18:16
Honestly, I think if they had any actionable intelligence, they wouldn't make it public. As soon as they do, the cat's out of the bag and people will start taking steps to circumvent that. Either they have intelligence and they didn't release it, or this release was entirely bunk and political. I can see both happening, honestly. I lean more toward they have intelligence and didn't let it out. — DotNetRussell, Jan 03 '17 at 18:18
Sections of "keysmashing" in the code could be a slightly better indicator of a specific keyboard layout than just the presence of certain characters. But that too is of course far from hard evidence. (E.g. "å‚∂ƒ©ªº∆@œæ" could be traced to a German Mac keyboard, not that that's a particularly likely thing to find in a source code.) — Emil, Jan 03 '17 at 21:33
@Emil Maybe in an esolang, but then tracing it to a particular keyboard goes out the window, anyway. ;) — jpmc26, Jan 04 '17 at 00:33
@DavidGrinberg: While it's true that every Cyrillic keyboard would produce the same Unicode _characters_, that does not mean they produce the same _code points_. In particular, й could be U+0439 or U+0438 U+0306. And I can imagine that different keyboard models use different ways to enter such diacritics, and by extension also use different encodings for those diacritics. — MSalters, Jan 04 '17 at 00:55
When I read “specific keyboard” I thought it meant the serial number of a *specific* keyboard. That's baloney. But from the comments here maybe he meant a “keyboard layout”, e.g. the person made typos consistent with the standard Russian layout and character set. — JDługosz, Jan 04 '17 at 06:24
@JDługosz I was wondering the same thing at the start, but I presumed that even CNN wasn't that dumb about computers and instead assumed they just worded it poorly. — David says Reinstate Monica, Jan 04 '17 at 06:26
@MSalters Keyboards (at least any I know) do not produce code points! The keyboard only produces scancodes, indicating which key has been pressed. The interpretation of this input is entirely up to the operating system. — I'm with Monica, Jan 04 '17 at 08:20
@AlexanderKosubek: Minor detail. I actually have two keyboard layouts configured, so I can switch between entering `"o` and `ö`, but only the first layout matches my physical keyboard. The second is an example of a configuration which produces diacritics, but it does so by using a "dead key" _prefix_ whereas Unicode uses _suffixes_ for diacritics. With prefixes, using precomposed characters is easier. — MSalters, Jan 04 '17 at 08:31
@MSalters: In Russian `и` (i) and `й` (short i) are distinct characters, so using the combining character on the former in order to "create" the latter is plain wrong. I do not believe that any native Russians would do that, even if the end result looks the same. — dotancohen, Jan 04 '17 at 08:53
@dotancohen: How do you enter й on a keyboard that only has и? You'll need a combining character. That's why I speculated you might be able to tell keyboards apart from the encoding of й. — MSalters, Jan 04 '17 at 09:05
@MSalters: A keyboard that has `и` but not `й` is not Russian. The `й` character is on the same physical key as `q` (keycode 24). But now I think that I understand your reasoning. If the combining character is used, then that is a sign that it was not a real Russian who wrote the text. Is that what you mean? — dotancohen, Jan 04 '17 at 10:10
@MSalters It looks like a diacritic, but isn't. It would be like saying that `i` (`и`) and `j` (`й`) are the same character in English, with `j` just having a little curve at the bottom (indeed, that's how it *started*, but not what it is now). But there are examples where trade-offs are made, of course - for example, the Czech `ch` is actually a single phoneme and a single letter (similar to `Х` in Cyrillic), but written as two letters, `c` and `h`. Just like the two English `th`s used to be two separate phonemes and letters, but now are written as `t`+`h`. — Luaan, Jan 04 '17 at 13:01
@dotancohen: Yes, that's my idea. The binary representation used may depend on whether you've used a real Russian keyboard with an explicit key, or another input method — MSalters, Jan 04 '17 at 13:05
@Luaan: Practically speaking, we're talking about computers here, and they are far more likely to obey Unicode logic than academic rules. I just checked Unicode 9.0 (2016 version); it _explicitly_ says that U+0439 й is canonically equivalent to U+0438 и followed by U+0306 "Combining breve". As for `ch`, I can't find it in Unicode. My standard example is `Ĳ` U+0132, which is _not_ i+j. That's clear from the fact it capitalizes as one letter. — MSalters, Jan 04 '17 at 13:18
@MSalters Yeah, the `ch` -> `c` + `h` happened long before Unicode came around (just like thorn in English); it does remain in Unicode in the sorting rules, though - `ch` is sorted between `h` and `i`, not `c` and `d`. I wouldn't call those rules "academic", they're just how people use their own language. But yes, I accept your argument with Unicode. And different (slightly non-conforming) implementations of Unicode might also be used to trace origin of some texts. Text is so complex that tiny differences can be quite identifying. "Plain text", *riiiight* :P — Luaan, Jan 04 '17 at 13:36
@Luaan: Reference for those Unicode sorting rules? Because Unicode doesn't have locale-specific rules AFAICT, while sorting definitely is locale-specific. After all, in most languages, "ch" sorts between "cg" and "ci". — MSalters, Jan 04 '17 at 13:43
@MSalters I didn't mean Unicode on its own - just things like sorting order in Windows or collation on MS SQL, whether on Unicode or a different charset. Needless to say, it's a huge pain since you can't really tell where `ch` is supposed to be treated as a single letter and when not (`CheckChuť` would ideally treat the first `ch` as `c` + `h` and the other as `ch`, but that's quite... tricky). — Luaan, Jan 04 '17 at 13:51
@DavidGrinberg CNN also recently [used screenshots from the game Fallout 4](http://bgr.com/2017/01/02/cnn-hacking-fallout-screenshot/) in a story about the Russian hackers. So your presumption may not be well-founded. — GalacticCowboy, Jan 04 '17 at 16:53
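
The code point question raised in the comments can be checked directly. The following is a minimal Python sketch, using only the standard `unicodedata` module, showing that lowercase й exists both as a single precomposed code point and as и plus a combining breve, and that normalization converts one form into the other. The variable names are illustrative only.

```python
import unicodedata

precomposed = "\u0439"        # й as one code point (CYRILLIC SMALL LETTER SHORT I)
decomposed = "\u0438\u0306"   # и followed by U+0306 COMBINING BREVE

# The two strings render alike but compare unequal as raw code point sequences.
print(precomposed == decomposed)

# Normalization converts between the two forms, in either direction.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfd])
```

So the binary form could hint at the input method or tooling that produced a string, but a single `normalize` call erases or forges that hint.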

Arminius · Accepted Answer · 2017-01-04T02:04:54.353

A keyboard is not a typewriter. Keyboards produce scancodes that are interpreted by the software and mapped depending on your layout. When a key press produces a letter on your screen it's nothing more than the character value in its respective charset - keyboards don't leave "digital fingerprints" that could be traced back.
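
As a rough illustration of that mapping, here is a minimal Python sketch. The two layout tables are hypothetical and heavily simplified (a real OS layout driver also handles modifiers, dead keys and much more); the scancode values assume the common PC "set 1" numbering for the top letter row.

```python
# Hypothetical, simplified layout tables: scancode -> character.
# Assumes PC set 1 scancodes 0x10..0x15 for the first six keys of the top letter row.
US_QWERTY = {0x10: "q", 0x11: "w", 0x12: "e", 0x13: "r", 0x14: "t", 0x15: "y"}
RU_JCUKEN = {0x10: "й", 0x11: "ц", 0x12: "у", 0x13: "к", 0x14: "е", 0x15: "н"}


def translate(scancodes, layout):
    """Turn raw scancodes into text using the layout chosen in software."""
    return "".join(layout[code] for code in scancodes)


# The same physical key presses, interpreted under two different layouts.
pressed = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15]
latin = translate(pressed, US_QWERTY)
cyrillic = translate(pressed, RU_JCUKEN)
print(latin, cyrillic)

# What ends up in a file is only the encoded characters, not the scancodes
# and not anything about the keyboard hardware.
print(cyrillic.encode("utf-8"))
```

The identical key presses yield Latin or Cyrillic text depending purely on a software setting, and the stored bytes say nothing about which physical device was used.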

Instead, the author probably meant to say that they found strings or identifiers with Cyrillic letters in the source code. But such traces are easy to fake and wouldn't count as "hard evidence"; even metadata could have been planted.

Here's a similar case: After the Operation Aurora cyber attacks, analysts claimed they had found "Chinese source code" from which they concluded that the attack was led from China:

HBGary, a security firm, recently released a report in which they claim to have found some significant markers that might help identify the code developer. The firm also said that the code was Chinese language based but could not be specifically tied to any government entity.

Here, the case was actually stronger than the Cyrillic keyboard evidence as researchers could trace back parts of the code to a reference implementation that was only released in a Chinese paper:

Perhaps the most interesting aspect of this source code sample is that it is of Chinese origin, released as part of a Chinese-language paper on optimizing CRC algorithms for use in microcontrollers. [...] This CRC-16 implementation seems to be virtually unknown outside of China

(Source)

score 5 · Answer 2 · answered Jan 04 '17 at 09:46

As already stated, it's practically impossible to track keyboards. In theory a keyboard could contain some ID number that is transferred to the operating system (much like how iTunes in the past knew what colour my iPod was), but appending that information to source code, Internet protocols or the like, so that it is traceable from the hacked system, is certainly not reality. Otherwise we'd have already seen reports of it from people debugging their code or protocol messages.

I first thought about specific character encodings, and that could still be the case. For example, there are several parts in the ISO 8859 ("Latin") standard. Many characters have the same encoding in all parts, including the ASCII characters that scripts and the like need in order to execute. Any characters outside that shared range might then give us some clues. For example, it might be that the extra characters only make sense when interpreted using part 5 (ISO 8859-5, Cyrillic).
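
To make the encoding idea concrete, here is a small Python sketch. The byte string is a made-up example, not data from any real malware sample; it uses Python's built-in `latin-1` and `iso8859_5` codecs.

```python
# Made-up sample: ASCII code plus six bytes from the upper half of an 8-bit charset.
raw = b'msg = "\xdf\xe0\xd8\xd2\xd5\xe2"'

# The ASCII part decodes identically under every ISO 8859 part;
# only the bytes above 0x7F depend on which part is assumed.
as_latin1 = raw.decode("latin-1")      # ISO 8859-1 (Western European)
as_cyrillic = raw.decode("iso8859_5")  # ISO 8859-5 (Cyrillic)

print(as_latin1)
print(as_cyrillic)
```

Under ISO 8859-1 the string literal is gibberish, while under ISO 8859-5 it reads as a Russian word, which is the kind of clue described above. It identifies at most a likely charset, never a device, and anyone can produce such bytes deliberately.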

Anyway, with the information at hand this is all just guessing. It may also be deliberately vague to give the impression that even your keyboards can be traced.
