
I'm trying to understand the point of having multiple Certificate Transparency logs. While I understand that having several logs addresses reliability and trust, what baffles me is that so many are operated by the same entity. Most notably, Google operates the following:

Google 'Argon2020' log
    \- Cert Count:     872098511
Google 'Argon2021' log
    \- Cert Count:     88502501
Google 'Argon2022' log
    \- Cert Count:     2790673
Google 'Argon2023' log
    \- Cert Count:     6392
Google 'Xenon2020' log
    \- Cert Count:     904142660
Google 'Xenon2021' log
    \- Cert Count:     68113254
Google 'Xenon2022' log
    \- Cert Count:     7243881
Google 'Xenon2023' log
    \- Cert Count:     6370
Google 'Aviator' log
    \- Cert Count:     46466471
Google 'Icarus' log
    \- Cert Count:     762271779
Google 'Pilot' log
    \- Cert Count:     1077198088
Google 'Rocketeer' log
    \- Cert Count:     1102506143
Google 'Skydiver' log
    \- Cert Count:     297801233
  1. Are some of them more important than others?
  2. Why are some split into 202X series and what does the number mean?
  3. Also, I installed axeman and it gave me an overview of the total number of certificates across all CT logs: it's around 7 billion. Do the certificates repeat in different logs or are they all unique?
  4. If they're duplicates, is there a way of deduplicating them when scraping the logs?
d33tah

2 Answers


Why are some split into 202X series and what does the number mean?

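Since certificate transparency logs can grow to include hundreds of millions of certificates, operators often split a log into separate physical logs, each accepting only part of the certificates (this is called temporal sharding). The number is the year in which the certificates in that shard expire: for example, Argon2021 only accepts certificates whose expiration date (notAfter) falls in 2021. Once every certificate in a shard has expired, the shard no longer needs new entries and can be frozen. This is presumably also why the 2023 shards in the question hold only a few thousand entries: few certificates valid that long had been issued yet.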
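To illustrate the routing rule, here is a minimal sketch of how a submitter might pick a shard from a certificate's expiry date. It assumes each shard covers exactly one calendar year of notAfter dates (the real windows are published by the log operator), and the `SHARDS` table and `pick_shard` helper are hypothetical names, not Google's code.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical shard table: each shard accepts certificates whose notAfter
# falls in [start, end). Calendar-year windows are an assumption here.
SHARDS = {
    f"Argon{year}": (
        datetime(year, 1, 1, tzinfo=timezone.utc),
        datetime(year + 1, 1, 1, tzinfo=timezone.utc),
    )
    for year in range(2020, 2024)
}

def pick_shard(not_after: datetime) -> Optional[str]:
    """Return the shard whose expiry window contains not_after, if any."""
    for name, (start, end) in SHARDS.items():
        if start <= not_after < end:
            return name
    return None  # no shard accepts a certificate expiring at this time

print(pick_shard(datetime(2021, 6, 30, tzinfo=timezone.utc)))  # Argon2021
print(pick_shard(datetime(2025, 1, 1, tzinfo=timezone.utc)))   # None
```

Because the windows do not overlap, each certificate lands in exactly one shard of a given log family, which keeps every shard bounded in size.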

Apart from the Argon and Xenon logs, the others appear to be older, non-sharded logs and, as the counts show, have grown very large (Pilot and Rocketeer hold over a billion entries each). The Aviator log is now read-only, and new entries to the rest have been restricted.

According to this document (which is a bit old),

Google runs three geographically diverse CT logs which accept all certificates issued by CAs accepted by any major browser

So the difference between the Xenon and Argon logs is probably that they are hosted in different countries, presumably so that interference from a single government cannot render the entire CT infrastructure unreliable. Other CT log operators possibly do the same.

nobody
  1. As long as a log is recognised by the browser or other software validating the certificate, it won't matter which log was used. Conversely, a log that is not on, for example, Chrome's list of known logs is somewhat less interesting.

  2. See nobody's answer

  3. Certificates are repeated in different logs for reliability, i.e. if one of the logs is offline, another one still works. They are also repeated for security: Chrome requires certificates to be in multiple logs (the exact number depends on the certificate's lifetime), including both Google-run and non-Google-run ones, so that a single log being hacked or run dishonestly is not enough; an attack would have to succeed in several places to be useful.

  4. Duplicates should have the same serial number and issuer (serial numbers are only unique per issuing CA), so you can tell it's the same certificate.
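Points 1 and 3 can be combined into a simplified policy check, sketched below. It assumes a hypothetical table of known logs keyed by short made-up IDs (real RFC 6962 log IDs are the SHA-256 hash of the log's public key), and a `min_logs` parameter standing in for the required count, since the number Chrome actually demands depends on certificate lifetime and policy version. This is an illustration, not Chrome's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class SCT:
    """Minimal stand-in for a Signed Certificate Timestamp: only the log ID."""
    log_id: str

def satisfies_policy(scts: List[SCT], known_logs: Dict[str, str], min_logs: int = 2) -> bool:
    """Simplified CT check: enough distinct recognised logs, with at least
    one Google-operated and one non-Google-operated log among them."""
    # Point 1: SCTs from logs the client does not recognise are ignored.
    recognised = {s.log_id for s in scts if s.log_id in known_logs}
    operators = {known_logs[log_id] for log_id in recognised}
    # Point 3: require operator diversity so one bad log is not enough.
    has_google = "Google" in operators
    has_other = any(op != "Google" for op in operators)
    return len(recognised) >= min_logs and has_google and has_other

# Hypothetical log table mapping made-up log IDs to operators.
KNOWN_LOGS = {"argon2021": "Google", "xenon2021": "Google", "nimbus2021": "Cloudflare"}

print(satisfies_policy([SCT("argon2021"), SCT("nimbus2021")], KNOWN_LOGS))  # True
print(satisfies_policy([SCT("argon2021"), SCT("xenon2021")], KNOWN_LOGS))   # False
print(satisfies_policy([SCT("argon2021"), SCT("unknown")], KNOWN_LOGS))     # False
```

The second call fails because both logs are Google-operated; the third fails because the unrecognised log does not count, leaving only one log.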

mcfedr
About point 3, it's not just likely, it's almost definite, since Chrome requires certificates to be present in at least two logs (one Google and one non-Google), so there definitely are duplicates. – nobody Sep 03 '20 at 10:57
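Since the same certificate ends up in several logs, a scraper (question 4) has to drop repeats. A minimal sketch, assuming the scraper already has each entry's DER-encoded certificate as bytes: identical certificates have identical bytes, so hashing them with SHA-256 gives a stable key. The `dedupe` name is hypothetical and not part of axeman.

```python
import hashlib
from typing import Iterable, Iterator, Set

def dedupe(der_certs: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each distinct DER-encoded certificate once, in first-seen order."""
    seen: Set[bytes] = set()
    for der in der_certs:
        # Identical certificates have identical DER bytes, hence the same hash.
        fingerprint = hashlib.sha256(der).digest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            yield der

# Placeholder byte strings stand in for real DER certificates.
entries = [b"cert-A", b"cert-B", b"cert-A"]
print(list(dedupe(entries)))  # [b'cert-A', b'cert-B']
```

Two caveats: a precertificate entry contains a poison extension, so its bytes differ from the final certificate's; merging those needs the issuer-plus-serial key from point 4. And at around 7 billion entries, the raw 32-byte digests alone come to roughly 224 GB before any set overhead, so a database with a unique index is likely more practical than an in-memory set.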