
A new critical issue was discovered in the character definitions of the Unicode Specification through 14.0.

Does it only affect code compiled from sources containing disallowed Unicode characters?

RHEL describes it as relevant only to GCC.

Does it affect only C and C++ files?

What if a disallowed Unicode character appears in HTML or CSS files?

Michael

2 Answers


"CVE" 2021-42694 does not affect code at all. It affects the systems human beings use to review code and proposed code changes: fancy text editors and IDEs, GitHub pull request and code review workflows, etc. It is a consequence of blindly applying UTR #9 (the Unicode Bidirectional Algorithm) to the entire body of code or patch as a single context. The alternatives are to "resolve embedding levels" in an application-specific (here, programming-language-specific) manner, so that embedding/override controls cannot exert formatting influence across different contexts (comment blocks, quoted strings, etc.), or simply not to honor embedding/override control characters at all.
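The "don't honor the controls" option can be approximated by a review tool that simply flags them. Below is a minimal Python sketch, assuming the characters worth flagging are the embedding, override and isolate controls (U+202A to U+202E and U+2066 to U+2069); the marks U+200E/U+200F are deliberately left out, and the name find_bidi_controls is hypothetical. Because it only inspects characters, it behaves the same on C, HTML or CSS; whether such characters are actually dangerous in a given file type depends on how that file is viewed and reviewed.

```python
import unicodedata

# Embedding, override and isolate controls (assumed set; LRM/RLM/ALM omitted).
BIDI_CONTROLS = frozenset(
    [chr(cp) for cp in range(0x202A, 0x202F)]    # LRE, RLE, PDF, LRO, RLO
    + [chr(cp) for cp in range(0x2066, 0x206A)]  # LRI, RLI, FSI, PDI
)


def find_bidi_controls(text):
    """Return (line, column, name) for each bidi control, 1-based.

    Lines are split on newline characters only; str.splitlines would
    also split on U+2028 and similar, shifting the line numbers.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


src = 'x = 1\nif ok:\u202e pass\n'
print(find_bidi_controls(src))  # [(2, 7, 'RIGHT-TO-LEFT OVERRIDE')]
```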

  • I think you're describing the wrong CVE, CVE-2021-42574, while the question is about CVE-2021-42694. – Kelly Bundy Nov 17 '21 at 16:22
  • @KellyBundy: Frankly, I'm confused about why that CVE is labeled as coming from 2021. [Confusables normalization](https://www.unicode.org/reports/tr39/#Confusable_Detection) has existed for ages, and is completely standardized. Sure, maybe some/most/all individual programming languages neglect to actually apply it, but that doesn't make it Unicode's fault... – Kevin Nov 17 '21 at 16:53
  • The "attack" is discussed in the [paper](https://trojansource.codes/trojan-source.pdf) on pages 7-8 under _E. Homoglyph Attacks_. It's the same issue as CVE-2021-42574: the rendered text is not what it appears to be. – Johnbot Nov 18 '21 at 09:29
  • @Kevin: Because someone requested a CVE identifier for it this year to boost their reputation claiming discovery of a long-known non-CVE-worthy issue. – R.. GitHub STOP HELPING ICE Nov 18 '21 at 16:17
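The confusable detection Kevin links (UTS #39) compares "skeletons": a string is NFD-normalized, each character is replaced by its prototype from the confusables data, and the result is NFD-normalized again; two strings with the same skeleton are confusable. A minimal Python sketch, assuming a tiny hand-picked subset of confusables.txt in place of the real data file; CONFUSABLES, skeleton and confusable are hypothetical names.

```python
import unicodedata

# A tiny hand-picked subset of UTS #39 confusables.txt (assumption: a real
# checker would load the full file). Maps a character to its prototype.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(s):
    """NFD, map each character to its prototype, then NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b):
    """True if a and b are different strings with the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(confusable("paypal", "p\u0430yp\u0430l"))  # True
print(confusable("scope", "slope"))              # False
```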

The critical issue is certainly not in the character definitions of the Unicode Specification through 14.0. In fact, the Unicode committee provides adequate methods to defend against such attacks, at least the homoglyph and spoofing attacks on identifiers: TR31, TR36 and especially TR39 provide the recommendations. These have been published since 2000, yet almost nobody follows those security recommendations.
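One of the TR39 mechanisms is mixed-script detection. The Python sketch below only approximates it and is not the TR39 algorithm: Python's unicodedata module exposes no Script property, so the first word of each letter's Unicode name (LATIN, CYRILLIC, GREEK, ...) stands in for the script. The function names scripts_used and is_mixed_script are hypothetical.

```python
import unicodedata


def scripts_used(identifier):
    """Approximate the scripts used by the letters of an identifier.

    Heuristic: unicodedata has no Script property, so the first word of
    each letter's name (LATIN, CYRILLIC, GREEK, ...) stands in for it.
    Non-letters such as digits, '_' and combining marks are skipped,
    loosely mirroring the Common and Inherited scripts.
    """
    scripts = set()
    for ch in identifier:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    return len(scripts_used(identifier)) > 1


print(scripts_used("user_id1"))                                 # {'LATIN'}
print(is_mixed_script("p\u0430ypal"))                           # True (Latin + Cyrillic)
print(is_mixed_script("\u043f\u0440\u0438\u0432\u0435\u0442"))  # False
```

Legitimate identifiers can also mix scripts (Japanese routinely combines Han, Hiragana and Katakana), which is why TR39 defines restriction levels rather than a plain "more than one script" rule.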

The problem is that identifiers are not identifiable. This happens in source code, but also in object files, because those expose an ABI and API (e.g. via headers, FFI and linker access), and the tools are missing checks.
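To make "identifiers are not identifiable" concrete: two spellings of café, one with precomposed U+00E9 and one with "e" followed by U+0301 COMBINING ACUTE ACCENT, render identically but are different code point sequences and different UTF-8 bytes. The Python sketch below assumes a toolchain that writes identifiers into the symbol table without normalizing them; symbol_bytes is a hypothetical helper standing in for that. Python itself NFKC-normalizes identifiers while parsing (PEP 3131), so in Python source the two spellings would name the same variable.

```python
import unicodedata

# Two spellings of "café" that render identically.
nfc_name = "caf\u00e9"   # precomposed U+00E9
nfd_name = "cafe\u0301"  # "e" + U+0301 COMBINING ACUTE ACCENT


def symbol_bytes(name):
    """Bytes a non-normalizing toolchain would store (assumption)."""
    return name.encode("utf-8")


print(nfc_name == nfd_name)                                # False
print(symbol_bytes(nfc_name))                              # b'caf\xc3\xa9'
print(symbol_bytes(nfd_name))                              # b'cafe\xcc\x81'
print(unicodedata.normalize("NFC", nfd_name) == nfc_name)  # True
```

A linker comparing these symbols byte for byte would treat them as two unrelated names, which is the kind of gap the normalization recommendations in TR31 address.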

The main problem lies with the language committees and implementors, who were quick to add Unicode support for identifiers without any security measures. The only safe languages are the ones that refused (Zig and J) and the ones that did follow TR39: Java, cperl and Rust. All the other ~100 are vulnerable by design. And now even gcc-10 ships these insecurities. I compiled an overview of the various language and compiler insecurities at https://github.com/rurban/libu8ident/blob/master/c11.md

The goal is a fixed version of the C23 and C++23 standards. However, when I did this for perl5 in 2016, they didn't care, so only my fork cperl is secure. Rust did listen around that time and provided proper Unicode identifier security.

Anyway, I now have a C library that can be used to implement it, and a linter to check against. I'm also experimenting with an improved readelf -L -Ul checker to find existing Unicode security issues in libraries. readelf's UTF-8 display was pretty broken, and it still needs further improvement to show Unicode problems better, as GitHub recently started doing. See my GitHub for the readelf patches.

BTW: another cynical fix would be to rename "identifier" to "symbol" in the language specs, because symbols need not be identifiable; they are the same binary junk as filenames. Of course, filesystems and login systems (databases such as LDAP and passwd) need to be fixed too.

rurban