This is an awesome idea - to manually trace through the cert validation process! I've enjoyed reading through your steps, since I've never actually done it myself!
Answering your questions:
First question: Is this the correct way of obtaining the certificate the *.wikipedia.org certificate was signed with? After all, how can I be sure wikipedia.org didn't present me with a bogus certificate and list the url to some bogus CA-certificate in this section? The url is not even https it seems...
Yup, this is the correct way. The key understanding that I think you're missing here is that the certificate file is already cryptographically protected by the CA's signature, which involves computing a hash over all the data in the cert, including the URL in the AIA extension. So you know that this is the URL that wikipedia intends you to see; a man-in-the-middle cannot modify it, or else the CA's signature wouldn't validate.
About https: we had a customer recently who tried to use an https link in their AIA extension and we forced them to re-issue the cert. The reason is that you're trying to validate a certificate for *.wikipedia.org; if doing that required you to establish an https connection with secure.globalsign.com, then you would have to validate their certificate also, and what if their AIA extension contained an https link? This chain can become very long and cause a webpage to take minutes to load - or lead to infinite loops where the https connection uses the cert you're trying to validate. No, a .crt file is already cryptographically protected by the CA signature it contains, so there's no need to add https on top of that.
Second question: Is the data to be signed the same data that the original .PEM certificate contains? If not, what data is missing, i.e. is not to be signed?
All of the data that you need is contained in the certificate; however, you need to remove the signature field before computing the hash (see my answer to your fourth question below).
If you think about this from the CA's perspective, it makes sense: they can't know what the signature bits are before they compute the signature, so the signature needs to be excluded from the hash. A signature is like a wax seal on an envelope: the seal goes on the outside.
Third question: What openssl command is used to do the conversion from .PEM to the .DER representation that was signed from the CA?
This web page seems to do a good job of covering this question:
DER vs. CRT vs. CER vs. PEM Certificates and How To Convert Them
In a nutshell, use the -inform / -outform flags:
PEM to DER
openssl x509 -in cert.crt -outform der -out cert.der
DER to PEM
openssl x509 -in cert.crt -inform der -outform pem -out cert.pem
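If it helps to see what that conversion amounts to, here is a minimal Python sketch, assuming the file holds a single certificate: PEM is just the DER bytes in base64 between BEGIN/END marker lines. The function names pem_to_der and der_to_pem are my own, not part of openssl or any library.

```python
import base64
import re

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_to_der(pem: str) -> bytes:
    """Strip the marker lines and base64-decode the body (first cert only)."""
    match = re.search(
        re.escape(PEM_HEADER) + r"(.*?)" + re.escape(PEM_FOOTER), pem, re.S
    )
    if match is None:
        raise ValueError("no PEM certificate block found")
    # Remove line breaks and any other whitespace before decoding.
    return base64.b64decode("".join(match.group(1).split()))


def der_to_pem(der: bytes) -> str:
    """Base64-encode the DER bytes and wrap them at 64 characters per line."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return PEM_HEADER + "\n" + "\n".join(lines) + "\n" + PEM_FOOTER + "\n"
```

Python's standard ssl module also ships ssl.PEM_cert_to_DER_cert and ssl.DER_cert_to_PEM_cert if you would rather not write this by hand.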
Fourth question: manually computing the signature.
Your understanding is almost correct; you compute your own hash (in this case SHA256) of the cert body and then verify that hash against the signature value using the parent (CA) publickey. Note that you cannot generate a signature without the privatekey, which you don't have and can't get; moreover, although some signature schemes are deterministic, including the one used here, some signature schemes are not, and even the privatekey holder (the CA) can't generate a new signature which is the same as the old one.
The basic structure of a certificate is shown in the specification for X.509 certificates on the Internet, rfc5280; it actually has four parts: the ASN.1 header for the SEQUENCE tag, the part containing all the 'real' information named tbsCertificate which is an abbreviation for To Be Signed, an AlgorithmIdentifier which specifies the type of signature (in this case sha256withRSA), and the actual signature as a BIT STRING. Although these fields are part of the DER, since you have openssl they can be conveniently accessed using openssl asn1parse which defaults to PEM input (but mixed output). In this case:
$ openssl asn1parse -i -in wikipedia.pem -- comments added
0:d=0 hl=4 l=1811 cons: SEQUENCE -- this is the 'outer' SEQUENCE
4:d=1 hl=4 l=1531 cons: SEQUENCE -- this is the beginning of TBS
8:d=2 hl=2 l= 3 cons: cont [ 0 ]
10:d=3 hl=2 l= 1 prim: INTEGER :02 -- version minus one, so this is a v3 cert
13:d=2 hl=2 l= 18 prim: INTEGER :1121A225BA0402D791854854C8BA60686A9B -- serial
33:d=2 hl=2 l= 13 cons: SEQUENCE -- COPY of algorithmidentifier, see below
35:d=3 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption
46:d=3 hl=2 l= 0 prim: NULL
48:d=2 hl=2 l= 102 cons: SEQUENCE -- issuer (CA) name, in several pieces
50:d=3 hl=2 l= 11 cons: SET
52:d=4 hl=2 l= 9 cons: SEQUENCE
54:d=5 hl=2 l= 3 prim: OBJECT :countryName
59:d=5 hl=2 l= 2 prim: PRINTABLESTRING :BE
63:d=3 hl=2 l= 25 cons: SET
65:d=4 hl=2 l= 23 cons: SEQUENCE
67:d=5 hl=2 l= 3 prim: OBJECT :organizationName
72:d=5 hl=2 l= 16 prim: PRINTABLESTRING :GlobalSign nv-sa
90:d=3 hl=2 l= 60 cons: SET
92:d=4 hl=2 l= 58 cons: SEQUENCE
94:d=5 hl=2 l= 3 prim: OBJECT :commonName
99:d=5 hl=2 l= 51 prim: PRINTABLESTRING :GlobalSign Organization Validation CA - SHA256 - G2
[... much snipped, including the subject name (for Wikipedia), the key, and all the extensions ...]
1539:d=1 hl=2 l= 13 cons: SEQUENCE -- this is the algorithmidentifier for the signature
1541:d=2 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption
1552:d=2 hl=2 l= 0 prim: NULL
1554:d=1 hl=4 l= 257 prim: BIT STRING -- this contains the signature value
Notice that the TBS starts at offset 4, and the signature wrapper at offset 1554. Now do
openssl asn1parse -in wikipedia.pem -strparse 4 -out wikipedia.tbs
and it displays all the tbs (again), but it writes it (in binary) to the file:
$ od -tx1 wikipedia.tbs
0000000 30 82 05 fb a0 03 02 01 02 02 12 11 21 a2 25 ba
0000020 04 02 d7 91 85 48 54 c8 ba 60 68 6a 9b 30 0d 06
0000040 09 2a 86 48 86 f7 0d 01 01 0b 05 00 30 66 31 0b
[snip rest]
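To show where the hl= and l= numbers in the asn1parse listing come from, here is a rough Python sketch that reads DER headers and splits a certificate into its three inner pieces. It assumes single-byte tags and well-formed DER, and the names read_tlv and split_certificate are made up for this illustration.

```python
def read_tlv(der: bytes, offset: int = 0):
    """Return (tag, header_length, content_length) of the element at offset."""
    tag = der[offset]
    first = der[offset + 1]
    if first < 0x80:
        # Short form: the length fits in this one byte.
        return tag, 2, first
    # Long form: the low 7 bits say how many length bytes follow.
    count = first & 0x7F
    length = int.from_bytes(der[offset + 2:offset + 2 + count], "big")
    return tag, 2 + count, length


def split_certificate(der: bytes):
    """Return (tbs_with_header, algorithm_identifier, signature_bytes)."""
    tag, header_len, _ = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("outer element is not a SEQUENCE")
    pos = header_len
    parts = []
    for _ in range(3):  # tbsCertificate, AlgorithmIdentifier, BIT STRING
        _, hl, length = read_tlv(der, pos)
        parts.append(der[pos:pos + hl + length])
        pos += hl + length
    tbs, algorithm, bit_string = parts
    tag, hl, _ = read_tlv(bit_string, 0)
    if tag != 0x03 or bit_string[hl] != 0:
        raise ValueError("expected a BIT STRING with zero unused bits")
    # Skip the header and the one 'unused bits' byte to reach the signature.
    return tbs, algorithm, bit_string[hl + 1:]
```

For example, read_tlv(bytes.fromhex("308205fb")) gives (0x30, 4, 1531), matching the hl=4 l=1531 shown for the TBS at offset 4. Note that the hash covers the TBS including its own header bytes, which is why the sketch returns them.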
Similarly, for the signature at offset 1554:
$ openssl asn1parse -in wikipedia.pem -strparse 1554 -out wikipedia.sig
Error in encoding
3344:error:0D07207B:asn1 encoding routines:ASN1_get_object:header too long:.\crypto\asn1\asn1_lib.c:157:
displays an error because the signature is not ASN.1, but it still writes it to the file:
$ od -tx1 wikipedia.sig
0000000 b2 c6 af 4b 88 31 c2 44 33 37 20 48 01 71 06 81
0000020 39 a5 03 bc 16 0f 21 7b 29 23 62 a1 84 fc d0 f5
0000040 f9 2d 0a 26 c6 dc 7f a8 31 99 4f 05 ef aa ef a9
0000060 82 b2 c3 68 f7 53 3a 0c b7 ea e8 a5 82 1d da 75
0000100 98 c6 92 69 1c 15 34 8d 1c 1a 02 90 b6 f0 d1 fe
0000120 07 ee 0a 4f 75 5a 3b 25 6f 5f fb c6 6c a6 bd a3
0000140 bc e2 6f c8 0e d7 c4 e6 a6 99 86 c7 24 b5 1a e6
0000160 61 c3 86 13 59 e7 2b 44 57 64 f7 20 21 f6 e6 db
0000200 8f e5 16 a5 48 06 1b 42 57 31 0c 9e 68 e6 a6 8e
0000220 61 0c c2 08 a7 54 25 8b 33 7c 6a e6 85 31 5c da
0000240 22 6e 8b 65 7e 55 2f 9c 69 b3 2f 7e 59 7c f5 e6
0000260 3e 23 28 91 05 2d 9e fa 73 29 07 bb c8 98 2e 32
0000300 5c 6e 38 74 4f 66 1e a2 65 b7 2a 0d e5 8f da 6b
0000320 10 c7 2e e4 a9 69 a2 98 77 76 9c 39 f6 e0 f6 dc
0000340 3c b4 09 9e 03 eb d7 93 26 d4 fe a4 fd 46 82 13
0000360 14 3c 84 7f 15 e5 03 1e b3 50 34 46 b0 f9 39 fb
0000400
Observe this is the same value displayed in hex when you -text the cert. Now get the publickey from the parent cert, in PEM format for use with openssl:
$ openssl x509 -in globalsignv2.pem -noout -pubkey >globalsignov2.pub
$ openssl pkey -in globalsignov2.pub -pubin -text
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxw5sPyOTf8xwpZ0gww5T
P37ATsKYScpH1SPvAzSFdMijAi5GXAt9yYidT4vw+JxsjFU127/ys+r741bnSkbZ
EyLKNtWbwajjlkOT8gy85vnm6JnIY0h4f1c2aRoZHVrR1H3CnNR/4YASrnrqiOpX
2MoKCjoSSaJiGXoNJPc367RzknsFI5sStc7rKd+kFAK5AaXUppxDZIje+H7+4/Ue
5f7co6jkZjHZTCXpGLmJWQmu6Z0cbTcPSh41ICjir9QhiwHERa1uK2OrkmthCk0g
7XO6fM7+FrXbn4Dw1ots2Qh5Sk94ZdqSvL41+bPE+SeATv+WUuYCIOEHc+ldK72y
8QIDAQAB
-----END PUBLIC KEY-----
Public-Key: (2048 bit)
Modulus:
00:c7:0e:6c:3f:23:93:7f:cc:70:a5:9d:20:c3:0e:
53:3f:7e:c0:4e:c2:98:49:ca:47:d5:23:ef:03:34:
85:74:c8:a3:02:2e:46:5c:0b:7d:c9:88:9d:4f:8b:
f0:f8:9c:6c:8c:55:35:db:bf:f2:b3:ea:fb:e3:56:
e7:4a:46:d9:13:22:ca:36:d5:9b:c1:a8:e3:96:43:
93:f2:0c:bc:e6:f9:e6:e8:99:c8:63:48:78:7f:57:
36:69:1a:19:1d:5a:d1:d4:7d:c2:9c:d4:7f:e1:80:
12:ae:7a:ea:88:ea:57:d8:ca:0a:0a:3a:12:49:a2:
62:19:7a:0d:24:f7:37:eb:b4:73:92:7b:05:23:9b:
12:b5:ce:eb:29:df:a4:14:02:b9:01:a5:d4:a6:9c:
43:64:88:de:f8:7e:fe:e3:f5:1e:e5:fe:dc:a3:a8:
e4:66:31:d9:4c:25:e9:18:b9:89:59:09:ae:e9:9d:
1c:6d:37:0f:4a:1e:35:20:28:e2:af:d4:21:8b:01:
c4:45:ad:6e:2b:63:ab:92:6b:61:0a:4d:20:ed:73:
ba:7c:ce:fe:16:b5:db:9f:80:f0:d6:8b:6c:d9:08:
79:4a:4f:78:65:da:92:bc:be:35:f9:b3:c4:f9:27:
80:4e:ff:96:52:e6:02:20:e1:07:73:e9:5d:2b:bd:
b2:f1
Exponent: 65537 (0x10001)
and you can either explicitly do the hash and check:
$ openssl sha256 <wikipedia.tbs -binary >hash
$ od -tx1 hash
0000000 71 fe 9a 8d 6a e2 85 7e f7 b4 be 22 a8 9e fb 4b
0000020 88 a2 e1 c9 c4 72 ef 65 40 07 77 54 4d 89 ef 38
0000040
$ openssl pkeyutl -verify -in hash -sigfile wikipedia.sig -inkey globalsignov2.pub -pubin -pkeyopt digest:sha256
Signature Verified Successfully
or openssl can do it as a single operation
$ openssl sha256 <wikipedia.tbs -verify globalsignov2.pub -signature wikipedia.sig
Verified OK
Note this works even for other signature algorithms than RSA, particularly DSA and ECDSA.
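For the RSA case only, here is a sketch of what the verification does under the hood, assuming the PKCS#1 v1.5 padding that sha256WithRSAEncryption implies. It does not cover DSA or ECDSA. The function name verify_rsa_sha256 is my own; n and e are the Modulus and Exponent from the pkey -text output, as integers.

```python
import hashlib
import hmac

# DER prefix of the DigestInfo structure for SHA-256 (from the PKCS#1 spec);
# the 32-byte hash is appended directly after it.
SHA256_DIGEST_INFO_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def verify_rsa_sha256(message: bytes, signature: bytes, n: int, e: int) -> bool:
    """Check an RSA PKCS#1 v1.5 / SHA-256 signature using only the publickey."""
    k = (n.bit_length() + 7) // 8          # modulus length in bytes
    if len(signature) != k:
        return False
    s = int.from_bytes(signature, "big")
    if s >= n:
        return False
    # The public operation: 'decrypt' the signature back to the padded block.
    recovered = pow(s, e, n).to_bytes(k, "big")
    # Rebuild the block the signer must have produced for this message.
    digest = hashlib.sha256(message).digest()
    pad_len = k - 3 - len(SHA256_DIGEST_INFO_PREFIX) - len(digest)
    if pad_len < 8:
        return False                       # key too small for this encoding
    expected = (b"\x00\x01" + b"\xff" * pad_len + b"\x00"
                + SHA256_DIGEST_INFO_PREFIX + digest)
    return hmac.compare_digest(recovered, expected)
```

With the files from above you would pass the contents of wikipedia.tbs as message and wikipedia.sig as signature. Rebuilding the expected block and comparing it whole, rather than parsing the recovered one, avoids having to decide what to do with malformed padding.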
Good on you for doing this, and good luck with future learnings!
I know you're only asking about the signature validation, but for completeness, I should also mention that you need to check the validity period of the cert to make sure it's not expired.
AND you need to check it isn't revoked, which is a good deal more complicated: you need to get a relevant CRL or OCSP response, which may be another (http) request, although some servers (including en.wikipedia.org) do provide an OCSP response in the handshake, called 'stapling'. Both of these are also signed objects, so you need to parse and validate them, using the cert chain for the CRL or OCSP issuer, which is usually different than the chain for the server cert.
AND you need to repeat this process for each intermediate cert until you reach the root or other 'anchor' that you or someone (like Microsoft or Apple or Mozilla or Debian) has configured in your system as trusted. AND you need to check that each cert in the chain has the appropriate KeyUsage (and ExtendedKeyUsage, if present), and appropriate BasicConstraints, and possibly check Policies and PolicyConstraints if used, NameConstraints if used, and more. See rfc5280 again for the basic algorithm to validate a certificate: this is 19 pages plus 5 for revocation the old way (CRL) plus more elsewhere for the new way (OCSP). Aren't you glad we have browsers to do this for us?
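Of those extra checks, the validity period is the only one simple enough to sketch in a few lines. This assumes you have already pulled the notBefore and notAfter strings out of the cert (asn1parse prints them as UTCTIME or GENERALIZEDTIME values) and that they use the Z-terminated forms rfc5280 requires. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_x509_time(text: str) -> datetime:
    """Parse UTCTime (YYMMDDHHMMSSZ) or GeneralizedTime (YYYYMMDDHHMMSSZ)."""
    if len(text) == 13:
        # rfc5280 rule for two-digit years: 50-99 means 19xx, 00-49 means 20xx.
        year = int(text[:2])
        year += 1900 if year >= 50 else 2000
        text = f"{year:04d}" + text[2:]
    if len(text) != 15:
        raise ValueError("unexpected time format: " + text)
    return datetime.strptime(text, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def is_within_validity(not_before: str, not_after: str, now=None) -> bool:
    """True if 'now' (default: the current UTC time) lies inside the period."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_x509_time(not_before) <= now <= parse_x509_time(not_after)
```

Revocation, chain building and the extension checks need a real library; this only illustrates the date comparison.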