23

Digicert has disallowed "double dashes" in the third and fourth characters in new certs:

Effective October 1, 2021, for publicly trusted TLS/SSL certificates, we no longer allow the use of double dashes (--) in the third and fourth characters in domain names, unless the double dashes proceed the letters xn (xn--example.com).

Similarly, AWS has made such certs ineligible for ACM renewal.

Digicert references ballot 202, which I found on cabforum.

CAs MUST NOT include Domain Labels which have hyphens as the third and fourth characters unless the first character is “x” or “X”, the second character is “n” or “N”, and the fifth and later characters are a valid Punycode string.

This is my first time encountering punycode, and it seems rather interesting in itself. But why are CAs prohibited from using hyphens when the domain is NOT punycode? Is there some security concern at play here? Digicert mentions that sites like es--xyz.loudsquid.com are not allowed. Why is es-- undesirable?

Indigenuity
  • 6
    This is particularly interesting, since the ballot 202 you linked to has failed. – nobody Dec 21 '21 at 21:29
  • Hah, I thought you were saying the link failed. I hadn't even noticed the bold `Ballot 202 fails.` on the page – Indigenuity Dec 22 '21 at 15:51
  • 2
    Ballot 202 included a lot of changes around names and their definitions (note even its name: "Underscore and Wildcard Characters", unrelated to the "IDN prefix" matter discussed here). It did not fail specifically because of the `^..--` restriction; that part was simply bundled with other changes, and the whole set was rejected. I suspect it is planned to be rewritten and worked on again; I am going to check. – Patrick Mevzek Dec 22 '21 at 19:25
  • 5
    Part of the 202 ballot was indeed merged, see https://github.com/cabforum/servercert/pull/285, which shows a branch name of `ballot/202_redux`. And as I said in my previous comment, the PR description has: "For a future effective date (TBD): [..] Prohibition on Reserved LDH domain labels that are not XN-labels" – Patrick Mevzek Dec 22 '21 at 19:27

4 Answers

21

Adding this as an answer because it is too long for a comment; it addresses the specific point of why everything is reserved when xn-- alone would seem to be enough.

In one of the first iterations of the IDNA standard ("Internationalizing Domain Names in Applications"), a draft from November 2001 (draft-ietf-idn-idna-04), there was this:

  1. ACE prefix The ACE prefix, used in the conversion operations (section 4), will be specified in a future revision of this document. It will be two alphanumeric ASCII characters followed by two hyphen-minuses. It MUST be recognized in a case-insensitive manner.

The scheme allowed interoperability tests while multiple encodings were being proposed. So it seems there were at least bl--, bq--, dq--, lq--, mq--, ra--, wq-- and zq-- (and when things solidified, xn was chosen at random so that no one had a head start and there were no collisions with existing names). If you are old enough, you may remember that Network Solutions/Verisign was then selling bq-- domain names as an IDN testbed.

In February 2003:

An eligible subset of that list of 42 entries will be determined by eliminating the following codes due to their use, in one or more top-level domain zone files that have been reviewed, as the first two characters of second-level domain labels that have hyphens in their third and fourth character positions: AA, QM to QZ, XA, XZ, and ZZ.

Going back to December 2000, the IETF meeting in San Diego has these notes:

ACE identifier candidates

  • prefixes: AA--, AB--, ..., 99--
  • suffixes: --AA, --AB, ..., --99

Relevant domain names: aa--a.com, aa-b.org, ..99--zzzz.net, aa--x.co.jp, etc. a-aa.com, b--aa.org, ..., zzzzz--99.or.kr, etc.

Proposal

step 1: tentative suspension of registering relevant domain names for ACE identifier candidates

step 2: conduct a survey of relevant domain names already registered

step 3: select about 10 to 20 identifiers one of which is for test and others for real use, based on the survey

step 4: permanent blocking of registrations of domain names relevant to the selected identifiers (except for registrations compliant to MDN semantics).

In November 2000 in draft-ietf-idn-aceid-00 we have:

All strings starting with a combination of two alpha-numericals, followed by two hyphens, are defined to be ACE prefix identifier candidates. All strings starting with one hyphen followed by three alpha-numericals, and strings starting with two hyphens followed by two alpha-numericals are defined as ACE suffix identifier candidates. ACE prefix identifier candidates and ACE suffix identifier candidates are collectively called ACE identifier candidates.

which was simplified the following June to just:

All strings starting with a combination of two alpha-numericals, followed by two hyphens, are defined to be ACE prefix identifier candidates. All strings starting with two hyphens followed by two alpha-numericals are defined as ACE suffix identifier candidates.

The mailing list archives from before 2001-01 seem to be lost forever, so there is no way to find out more about that, I fear.
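For illustration only, the simplified candidate definitions quoted above could be written as two regular expressions. This is a rough Python sketch: the names `ACE_PREFIX_CANDIDATE`, `ACE_SUFFIX_CANDIDATE` and `classify_label` are made up here, and treating a suffix candidate as something that sits at the end of a label is my reading of the San Diego examples, not wording taken from the draft.

```python
import re

# Two alphanumerics followed by two hyphens at the start of a label (e.g. "bq--").
ACE_PREFIX_CANDIDATE = re.compile(r"^[A-Za-z0-9]{2}--")
# Two hyphens followed by two alphanumerics at the end of a label (e.g. "--aa").
# Assumption: a "suffix" candidate is matched at the end of the label.
ACE_SUFFIX_CANDIDATE = re.compile(r"--[A-Za-z0-9]{2}$")


def classify_label(label: str) -> set:
    """Return which kinds of ACE identifier candidate a single label matches."""
    kinds = set()
    if ACE_PREFIX_CANDIDATE.search(label):
        kinds.add("prefix")
    if ACE_SUFFIX_CANDIDATE.search(label):
        kinds.add("suffix")
    return kinds
```

With this sketch, a label such as `bq--example` is classified as a prefix candidate and `b--aa` as a suffix candidate, while `aa-b` matches neither.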

t0r0X
Patrick Mevzek
  • 2
    Forget about it being too long for a comment, this is quite simply a better answer than the existing ones, because it cites actual uses and standards. – IMSoP Dec 22 '21 at 19:11
  • 1
    Interesting to see all this from ~20 years ago only just now being realized as restrictions. – Indigenuity Dec 22 '21 at 20:12
  • Is it possible that some DigiCert clients are using this prefix system internally/informally, and this plugs some kind of security hole? Are there new/current proposals to introduce standard prefixes other than `xn--`? – shadowtalker Dec 22 '21 at 20:44
  • "Is it possible that some DigiCert clients are using this prefix system internally/informally,". Yes, because below registries and registrars, everyone is free to put whatever they want in their zone, as long as their nameservers allow the name. So you can certainly put `ab--example` in your zone and nothing will break. "Are there new/current proposals to introduce standard prefixes other than xn--?" Not that I am aware of, because there has been no new work in the IDNA arena since the latest standard. While problems exist, it is probably deemed "good enough" as is (or hard enough to improve substantially). – Patrick Mevzek Dec 22 '21 at 20:52
17

The double hyphen is reserved as a generalized extensibility mechanism of which Punycode is one example.

RFC 5891: 4.2.3.1. Hyphen Restrictions

The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third and fourth character positions and MUST NOT start or end with a "-" (hyphen).
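As a rough illustration of that rule (not a full IDNA2008 validator), the check could look like this in Python. The function name `passes_hyphen_restrictions` is invented for this sketch, and it covers only the hyphen rule quoted above, none of the other label checks in RFC 5891.

```python
def passes_hyphen_restrictions(label: str) -> bool:
    """Check only the hyphen rule from RFC 5891 section 4.2.3.1 on one label."""
    # A label must not start or end with a hyphen.
    if label.startswith("-") or label.endswith("-"):
        return False
    # The RFC counts positions from 1, so characters 3 and 4 are label[2:4].
    if label[2:4] == "--":
        return False
    return True
```

Under this sketch `my-cafe` passes, while `my--cafe`, `-cafe` and `cafe-` are all rejected.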

EricLaw
  • 6
    This answer just repeats what OP already said in the question. _Why_ is it restricted? What is it used for? What's the rationale? – BlueRaja - Danny Pflughoeft Dec 22 '21 at 14:19
  • 6
    @BlueRaja-DannyPflughoeft Because it is reserved? That information was not in OP's question. – xngtng Dec 22 '21 at 15:14
  • So presumably there could be more types of extensibility, but Punycode is the only one currently? That seems like a plausible explanation for the reservation, but is that just an extrapolation? I don't see any sort of explanation in the RFC or a mention of a plan to leave room for future extensibility mechanisms – Indigenuity Dec 22 '21 at 15:42
  • 2
    This quote appears to be referring to the Unicode string which is used as _input_ to the IDNA encoding. That's probably to avoid parsers messing up on inputs like "xn------a", or "xn--xn--xn--". It's not clear to me that this places any restriction on a _plain ASCII_ domain. Section 4.2.1 forbids anything with "xn--" which is _not_ IDNA compliant, but doesn't mention a generic extensibility mechanism for any _other_ uses of "--" - plausible though that is. – IMSoP Dec 22 '21 at 17:44
  • The rationale, once work on IDNs began, was that multiple prefixes might be needed. In fact, during standardization the first prefix was `bq--`, and it only later became `xn--`. So, as a precaution, all those prefixes became reserved, as in "not supposed to happen in the wild". Studies at the time showed that the pattern `^..--` was very rare in existing domains, hence this choice. – Patrick Mevzek Dec 22 '21 at 18:09
  • @IMSoP "but doesn't mention a generic extensibility mechanism for any other uses of "--" - plausible though that is." There is none, as of today, because none is needed. There is only one version of IDNA (two in fact, but no changes for the following) and it uses `xn--`. There are no other variants of IDNA needing other prefixes. But there were, before standardization took place. – Patrick Mevzek Dec 22 '21 at 18:10
  • 2
    @PatrickMezvek Like I say, the explanation is perfectly plausible, but the RFC quoted doesn't say anything about it one way or another - it just says that **before** encoding a string for use in IDNA, it mustn't have hyphens in certain places. That doesn't seem to have any relevance to other **ASCII** domains using hyphens wherever they like, as long as they don't start "xn--" – IMSoP Dec 22 '21 at 19:07
  • @IMSoP " it just says that before encoding a string for use in IDNA, it mustn't have hyphens in certain places. " Because after encoding, due to how IDNA works, they will be there. Take ASCII name "my--cafe", pass it to IDNA processing and you get "Punycode" name "my--cafe" (if you disable UTS#46, part of the explanation should be there in fact, see CheckHyphens rule in it). If you want to prevent ASCII names starting with `..--` outside of `xn--` you need to forbid them on input aka on the Unicode side – Patrick Mevzek Dec 22 '21 at 19:47
3

This is due to the double dash's usage in internationalized domain names. xn-- has a special meaning in domain names, and it is technically a violation of the IDNA2008 standard if the -- series of characters is in the 3rd and 4th spot, unless the first 2 characters are xn.

The specific RFCs that were defined for IDNA2008 are RFC 5890 to 5894.
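To make that concrete, here is a small Python sketch. It leans on Python's built-in `idna` and `punycode` codecs, which implement the older IDNA2003 rules rather than IDNA2008, so treat it as an approximation. `label_allowed` is a name made up for this example, and "valid Punycode" is taken to mean simply "Python's decoder accepts it".

```python
def label_allowed(label: str) -> bool:
    """Approximate the rule: '--' in positions 3-4 is only fine for xn-- labels."""
    if label[2:4] != "--":
        return True  # the restriction does not apply to this label
    if label[:2].lower() != "xn":
        return False  # e.g. "es--xyz"
    try:
        # Assumption: "valid" here means Python's punycode decoder accepts it.
        label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    return True


# A non-ASCII label becomes an xn-- label once encoded.
encoded = "caf\u00e9".encode("idna").decode("ascii")
print(encoded, label_allowed(encoded))      # xn--caf-dma True
print("es--xyz", label_allowed("es--xyz"))  # es--xyz False
```

The point of the sketch is only that `xn--` marks an encoded label, so any other two characters in front of `--` have no defined meaning.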

dcom-launch
  • 4
    That doesn't really explain a lot. What specifically is the issue with allowing `--`? –  Dec 21 '21 at 22:20
  • 4
    @MechMK1 As mentioned in EricLaw's answer, if you allow any string then you can no longer use `--` as a mechanism for future extensions. In other words, the standard says that `prefix--` is a reserved combination; currently `prefix` can only be `xn`, meaning the punycode extension, but in the future others might exist. Allowing other domains means their interpretation might suddenly change in the future, which is not good. – GACy20 Dec 22 '21 at 07:32
  • 1
    @GACy20 No, it is not `prefix--` that is disallowed. It is pattern `^..--` (except if `xn--`) which is only two characters at start then 2 dashes. A starting dash is also disallowed, per other rules. – Patrick Mevzek Dec 22 '21 at 18:11
3

The purpose of leaving certain patterns of alphanumeric-and-dash strings reserved is to allow for the possibility that they might be used to represent something that can't presently be represented. There's no way a certificate authority can know what a string like aa--bcde might mean in future, and who might be entitled to use such a thing as a domain name. If a CA were to issue certificates for that domain name to Acme Enterprises and then the committee in charge of domain name formats decided that names starting with aa-- should be issued by the Accreditation Agency, which issued Binary Coded Decimal Enterprises the name aa--bcde, the fact that Acme Enterprises had a certificate for that name would be a problem. To be sure, it might be mitigated by issuing a revocation for the issued certs, but it's better to simply avoid such problems in the first place.

supercat
  • As with other answers and comments, this is plausible, but it would be good to find a standard that actually makes this reservation. Everything you've said here *could* be true of domains containing the word "supercat", but presumably there is an agreement somewhere that a double hyphen in this position *might* have a meaning in future. – IMSoP Dec 22 '21 at 19:09
  • 1
    When data formatting standards characterize various patterns as "reserved", that generally implies that they don't have meaning at present, but deliberately leaves open the possibility that they might have a meaning assigned to them in future. – supercat Dec 22 '21 at 21:22
  • I know what it means, I'm just saying you haven't presented any evidence that it's true, or said what standard in particular reserves these patterns in particular. – IMSoP Dec 22 '21 at 21:56