
I'm doing research on certificates and have found a dataset. It is a single large text file in which each line appears to be a PEM-encoded certificate, but I am unable to load these lines using Python's asn1crypto.x509, although I can load certificates I captured myself. Here is an example:

bf55348d696395689d5c461bc0f092ecabcdb6ca,MIIGaTCCBVGgAwIBAgIMBGvfD9kgQa6y01qTMA0GCSqGSIb3DQEBCwUAMFcxCzAJBgNVBAYTAkJFMRkwFwYDVQQKExBHbG9iYWxTaWduIG52LXNhMS0wKwYDVQQDEyRHbG9iYWxTaWduIENsb3VkU1NMIENBIC0gU0hBMjU2IC0gRzMwHhcNMTkwNzA1MDYxMDI4WhcNMjAwNzA1MDYxMDI4WjBgMQswCQYDVQQGEwJVUzERMA8GA1UECBMIRGVsYXdhcmUxDjAMBgNVBAcTBURvdmVyMRYwFAYDVQQKEw1JbmNhcHN1bGEgSW5jMRYwFAYDVQQDEw1pbmNhcHN1bGEuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAvvIO1fWXjyPhiW1WWJegNGP6L2IrleldHVy0lE8WbU/3oSZQYm7Ab/xNZnYrce/7d8uG4j4Z8RXcuNV/2x48TMdu0KfrOoDulH75JEyqn/0RSFL+vb/rSPJ87/JHtImp1sFDlrIaq843nxhZdcgWtk3OIwsOzyH/+hJs8E/saQeo6M+k6sUP5qes5qIXyaVlaiwKlsFr8PWBtpnTWYb8Au5XehIAzx/Dht5AKOmhMS1RiXPyrTikUOk/Fuvx1vW6lo2uPNLQYr5i3rx+Qp0WsezmRWMMAAPdpgFuBN42cVP3CyHTanYL0Yi4rzlBtLieN5Qof8EVGu7ZwpUJZ3DZWwIDAQABo4IDKjCCAyYwDgYDVR0PAQH/BAQDAgWgMIGKBggrBgEFBQcBAQR+MHwwQgYIKwYBBQUHMAKGNmh0dHA6Ly9zZWN1cmUuZ2xvYmFsc2lnbi5jb20vY2FjZXJ0L2Nsb3Vkc3Nsc2hhMmczLmNydDA2BggrBgEFBQcwAYYqaHR0cDovL29jc3AyLmdsb2JhbHNpZ24uY29tL2Nsb3Vkc3Nsc2hhMmczMFYGA1UdIARPME0wQQYJKwYBBAGgMgEUMDQwMgYIKwYBBQUHAgEWJmh0dHBzOi8vd3d3Lmdsb2JhbHNpZ24uY29tL3JlcG9zaXRvcnkvMAgGBmeBDAECAjAJBgNVHRMEAjAAMIG+BgNVHREEgbYwgbOCDWluY2Fwc3VsYS5jb22CEyouY2FuLWlmaXJtLWRldi5jb22CEiouY2FuLWlmaXJtLXFjLmNvbYITKi5jYW4taWZpcm0tc3RnLmNvbYIPKi5jY2hheGNlc3MuY29tgg0qLmNjaGlmaXJtLmNhgg4qLmNjaHBvcnRhbC5jYYIVKi50YXhwcmVwZGFzaGJvYXJkLmNhgg8qLnR3aW5maWVsZC5jb22CDGNjaHBvcnRhbC5jYTAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwHwYDVR0jBBgwFoAUqSuH4c4kRzsbv8+FNwJVnQ2UWOYwHQYDVR0OBBYEFL5eozUQAU4YduRs9EG6LKk3jAsTMIIBAgYKKwYBBAHWeQIEAgSB8wSB8ADuAHUApLkJkLQYWBSHuxOizGdwCjw1mAT5G9+443fNDsgN3BAAAAFrwMFqTwAABAMARjBEAiBTBijVgdd8pHnfJRUGphZNhGjV0MuanAiqkclZUM3jDgIgc1jQuT851wi9MohqWgJMi0t/3MKhvgvzb9FU4l+KfokAdQCHdb/nWXz4jEOZX73zbv9WjUdWNv9KtWDBtOr/XqCDDwAAAWvAwW1fAAAEAwBGMEQCIEtuT0TXxHG1NBFCGdVqX/1rFc+WuDGgE/Sy61damwyCAiAbTGxoeCmJONXnLeCi4uY7ocdO5hsXbUhvESqBeybeMTANBgkqhkiG9w0BAQsFAAOCAQEAlrhHZKI3JxT3xOWUzJ/wrZN4mJ7MgcfOK9O8QXlLdLcpuDeZOsBj6o0P4GRF/IjPwNlaeDD7OPdArA8jSsKqQvfwsVOZ2wVJQPTtkcNUKD+
RRViJwfIJOXJYgHz3oZidBYVFaGIb+OPM+K5FvvRMFKEgU2T88OwsyWCgCIhrAElOgvPdyaaytQtVwNqrU5Q4LSgteUgyiDChh30nEmmEeovXV0bqvwY6L9PKLT4aVHeTn1okRBmC6NNEJTcyBeOvph/l4R+Mpa03Xcr/2OVtDIEAEWJpvutY1fwsfVTIFdepAxUAnDmvMLC71+UnV6+sVTgzeKCQeesTlOpxhS2kQw==

Am I wrong? Is this not PEM?

On a similar note, if anyone knows where I could find more certificate dumps (real ones, not synthetic ones), I would be most grateful.

Nullman

1 Answer


As described, for example, on Wikipedia, the PEM format requires a begin label and an end label, i.e. -----BEGIN CERTIFICATE----- and -----END CERTIFICATE-----. What you have here has no such labels and thus cannot be PEM.

It is instead the base64-encoded DER representation of the certificate, prefixed with the SHA-1 fingerprint of the certificate in hex and a comma. I.e. what you have here is

bf55348d696395689d5c461bc0f092ecabcdb6ca,MIIGaTCCBVGgAwI....pxhS2kQw==
         sha1_fingerprint_as_hex        ,der_encoded_certificate_as_base64

To get the actual certificate you can decode the base64 part and feed it into openssl x509:

$ echo MIIGaTCCBVGgAwI....pxhS2kQw==  |\
     base64 -d | openssl x509 -inform der -fingerprint -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            04:6b:df:0f:d9:20:41:ae:b2:d3:5a:93
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=BE, O=GlobalSign nv-sa, CN=GlobalSign CloudSSL CA - SHA256 - G3
        Validity
            Not Before: Jul  5 06:10:28 2019 GMT
            Not After : Jul  5 06:10:28 2020 GMT
        Subject: C=US, ST=Delaware, L=Dover, O=Incapsula Inc, CN=incapsula.com
    ...
SHA1 Fingerprint=BF:55:34:8D:69:63:95:68:9D:5C:46:1B:C0:F0:92:EC:AB:CD:B6:CA
-----BEGIN CERTIFICATE-----
MIIGaTCCBVGgAwIBAgIMBGvfD9kgQa6y01qTMA0GCSqGSIb3DQEBCwUAMFcxCzAJ
...
eKCQeesTlOpxhS2kQw==
-----END CERTIFICATE-----
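
Since the question is about loading these lines from Python, the same decoding can be sketched with the standard library alone. This is a minimal sketch, assuming every line of the dump has exactly the "sha1hex,base64der" layout shown above; the function names parse_dump_line and der_to_pem are made up for illustration and are not part of any library.

```python
import base64
import hashlib
import ssl


def parse_dump_line(line):
    """Split one 'sha1hex,base64der' line into (fingerprint, der_bytes).

    Assumes the layout described above: a hex SHA-1 fingerprint,
    a comma, then the base64-encoded DER certificate.
    """
    fingerprint, b64 = line.strip().split(",", 1)
    fingerprint = fingerprint.lower()
    der = base64.b64decode(b64, validate=True)

    # The fingerprint is the SHA-1 hash of the DER bytes, so it doubles
    # as a sanity check that the line was split and decoded correctly.
    actual = hashlib.sha1(der).hexdigest()
    if actual != fingerprint:
        raise ValueError(f"fingerprint mismatch: {actual} != {fingerprint}")
    return fingerprint, der


def der_to_pem(der):
    """Wrap DER bytes in BEGIN/END CERTIFICATE labels (base64, 64-char lines)."""
    return ssl.DER_cert_to_PEM_cert(der)
```

The DER bytes returned by parse_dump_line should be what asn1crypto expects, e.g. asn1crypto.x509.Certificate.load(der), so converting to PEM is only needed for tools that insist on the labels.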
Steffen Ullrich
  • ohh man i did not notice the sha1 in there, thank you! – Nullman Jan 18 '20 at 19:35
  • PEM also requires the base64 be broken into lines of 64 chars (except the last), although some software that describes itself as implementing PEM does not enforce this, and could better be called near-PEM or mostly-PEM or PEM-like or PEM-ish. – dave_thompson_085 Jan 19 '20 at 02:02