
I am writing software that records and stores data files.

I want to digitally sign and timestamp these files so that I can ensure that they are Contemporaneous and Attributable (to comply with 21 CFR Part 11 regulations and ALCOA C+ guidelines). This means I want to be able to prove that a file was recorded by a certain user and that it was created/modified at a certain time.

I have some questions regarding the validity of the implementation I have in mind:

Implementation:

Digital signature

  • I was planning to generate an RSA key pair.
  • I would create the hash of the file (using a strong hash generator) and encrypt it with the private key.
  • I will call the encrypted hash the "signature" and save it inside the file.
  • At this point I assume that the file is "Attributable" and that it can be validated using the author's public key.

Timestamp

  • I will send the "signature" (encrypted hash of the file) to a TSA.
  • The TSA sends me back a hash, signed by the TSA, which will be my timestamp.
  • I save this hash inside the file.
  • At this point I assume that the file is "Contemporaneous" and can be validated only by sending this hash to the TSA.

Questions:

  1. Is this implementation conceptually correct?
  2. Does this implementation comply with most standards (assuming that I use good hash and RSA keys generators and TSAs)?
  3. Is it possible to have trusted offline timestamping (to account for computers that cannot have internet access or that temporarily lose their connection)?
AndrolGenhald
cinico
  • 2
    Why do you feel you cannot use a standard solution instead of implementing your own? – Stephane Jan 09 '19 at 12:42
  • I have zero experience doing this, so there is a high chance that I have a lot of misconceptions about it. I was assuming that a standard solution for digitally signing and timestamping files works well for common file formats. The data format of my files is non-conventional, so I was assuming that the standard solution would either not work or change the file format to something that we cannot control. – cinico Jan 09 '19 at 12:46
  • 3
    If your data files are composed of bytes written to a storage device, it's a conventional file, and a standard solution will work. You can study the process of signing, not bother with the implementation. – ThoriumBR Jan 09 '19 at 13:02
  • 1
    If you have zero experience, that's a VERY good reason not to implement something yourself. Have a look at GPG and PKCS: and pick the one that fits your environment the best. Then find a reputable library that implements digital signatures with secure timestamping (most do) and use that. – Stephane Jan 09 '19 at 13:25
  • Thank you so much for your comments. They are making me see things a bit more clearly... I am using MATLAB (which is based on Java). I already use some Java libraries for the RSA key generation. I will search for an appropriate library for signing and timestamping. What's still not clear to me is this: if I use such libraries to do the work, I don't know how the timestamp and signature will be saved inside the file. I'm concerned that the file format will be modified and that I will no longer be able to load the file with my software (due to an unknown format). I hope I'm making sense. – cinico Jan 09 '19 at 13:57
  • MatLab files are conventional files. You can save the signature on another file (say, `savefile.mat.sig`) and ship it alongside `savefile.mat`. No need to modify the original file, and any change on the original file OR the signature file will be detected. – ThoriumBR Jan 09 '19 at 14:48
  • 1
    It seems you might want to consider using ASC X9.95 (which is an expansion of the RFC 3161 idea for TSAs). Then your timestamp authority is verifying both the data and the time of data commit, basically wrapping all your requirements together (as best as I can tell). Quick wiki primer: https://en.wikipedia.org/wiki/ANSI_ASC_X9.95_Standard . Otherwise, yes, your general idea is correct. You sign a file which is used to prove that the file has integrity (hasn't been changed since the signer signed it) and you timestamp, with a trusted authority, the signature to prove WHEN that was. – Ruscal Jan 09 '19 at 16:22
  • @ThoriumBR But that means that the software will generate one signature file for each data file, which is a bit awkward to manage. Since I would expect the signature file to simply be a hash, I was hoping that I could save it together with the data file. Is this fine? – cinico Jan 09 '19 at 16:23
  • If you save the signature on the file, you change the file, changing the hash and invalidating the signature... – ThoriumBR Jan 09 '19 at 17:51
  • @ThoriumBR I understand what you mean. However, what I was thinking was this: I would have a file that would contain the data that I want to sign and timestamp. I would calculate the hash of that information, get a signature and then save it in the same file. Then, to verify the signature, the software would open the file, get only the part of the file containing the data, and calculate the hash of the data. The way I see it, it's not much different from having the original file and a file with the signature and then zipping them together with whatever extension I want. But maybe there is a simpler way (?) – cinico Jan 09 '19 at 20:57
  • @Ruscal Thanks. I actually read most of that wiki page before writing down my original question. I thought I understood it right, and thanks for the confirmation, but what is actually confusing me now is where to save the stamps and signatures. Is it common practice that if I want to time stamp and sign a file that I have to save a second file and keep it "linked" to the original file? – cinico Jan 09 '19 at 21:07
  • 1
    @cinico There are multiple schemes at that point. I've seen software use alternate streams so that the primary payload is in stream[0] (which is what most software uses) but the sig-stamping is in stream[1] (or n if you already have more streams). Some (very many, actually) file types have metadata positions that are for this purpose (almost every file type if you include "could be used for, even if not designed"). And then there is the descriptor file approach. Most of my stuff appends the sig to the main file (the above is signed as, then the above -inc sig- is time stamped at) which TGZ and GPG do. – Ruscal Jan 09 '19 at 22:35
  • @Ruscal Great! I think it's clear to me now. Thank you – cinico Jan 10 '19 at 07:18
  • "encrypt with the public key...call the encrypted hash the "signature"" - Did you get public and private mixed up, or do you not know how signatures work? You definitely don't want anyone with the **public** key to be able to "sign" whatever they want. _Real_ signatures are created with private keys, with a signature algorithm, not an encryption algorithm. – AndrolGenhald Jan 10 '19 at 15:21
  • @AndrolGenhal You're right, I meant the private key to sign. However, I did mean to use an encryption algorithm (to encrypt the file hash with the private key). I'm not sure what the difference from a signature algorithm is. – cinico Jan 10 '19 at 16:50
  • @cinico "Encrypt hash with private key" is really just a [misexplanation](https://security.stackexchange.com/a/87373), what people actually mean when they say that is "sign data with private key". – AndrolGenhald Jan 10 '19 at 16:59

1 Answer


So, in general, your thought process is correct.

  1. Create the digital content Document and save it.
  2. To prove that Author last saved Document, they would sign the file with the private key of their asymmetric key pair.
  3. To prove when Document was saved, you would cryptographically time-stamp it with the help of a trusted Time Stamp Authority.

At any point in the future anybody can validate that the file is the same as it was at a point in time (unchanged since step 3) by validating the TSA-signed timestamp, which is checked against the TSA's public certificate.

At any point in the future anybody can validate that the file is unmodified from when the Author saved it (unchanged since step 2) by validating Document's signature using Author's public key.

At the core, this seems to be what you are thinking about and yes, this is how it works. And yes, an implementation along these lines does comply with every standard that I am aware of (I work in the USA, but not with the FDA) for proving authenticity of documents across a time domain, including digital contracts (it provides proof of identity and proof of time signed). Though if you are attempting to use this for any purpose where compliance is mandatory then you should take the entire matter up with legal counsel. Let me be clear: I am not a lawyer and nothing I say should be construed as legal advice or a recommendation on a legally acceptable course of action for you and/or your company. If that is what is required, make sure that you and/or your company bring in your/their own lawyers to ensure you are compliant.

OK, now that the CYA is out of the way we need to look at implementation. You have a data payload for which you wish to guarantee authenticity. This isn't uncommon and is something we all deal with regularly ("verify the file hash before installing", "verify the message signature with my public key", etc). It is a known solved problem in our world. For verification of integrity of the payload, a hash of the payload is computed for comparison. For verification of authorship of a payload, the hash is computed and then signed with the Author's private key (often loosely described as "encrypting the hash") -- a method we refer to as "signing the document". While you verify the integrity by making sure the hashes match, you check the authorship by verifying the "signature block" with the supposed Author's public key; a successful check shows both that the hash matches the file and that the signature was produced with the matching private key.
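To make the hash-then-sign step concrete, here is a minimal sketch using only the standard Java security API (the question mentions Java libraries, hence the language choice). The class and method names (`SignSketch`, `newAuthorKeys`, `sign`, `verify`) and the 2048-bit key size are my own assumptions for illustration, not anything your environment mandates. `SHA256withRSA` hashes the payload and signs the hash in one operation, so you never "encrypt" the hash by hand.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class SignSketch {
    // Hypothetical helper: creates a fresh RSA key pair for the Author.
    // In a real system the private key would live in a protected key store.
    static KeyPair newAuthorKeys() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048); // assumed key size, adjust to your policy
        return gen.generateKeyPair();
    }

    // Hashes the payload with SHA-256 and signs that hash with the private key.
    static byte[] sign(byte[] payload, PrivateKey authorPrivate) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(authorPrivate);
        s.update(payload);
        return s.sign();
    }

    // Returns true only if the signature matches this payload and this public key.
    static boolean verify(byte[] payload, byte[] signature, PublicKey authorPublic) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(authorPublic);
        s.update(payload);
        return s.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        KeyPair author = newAuthorKeys();
        byte[] payload = "sample,1.23".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(payload, author.getPrivate());
        System.out.println("original verifies: " + verify(payload, signature, author.getPublic()));
        byte[] tampered = "sample,9.99".getBytes(StandardCharsets.UTF_8);
        System.out.println("tampered verifies: " + verify(tampered, signature, author.getPublic()));
    }
}
```

Anyone holding only the public key can run `verify`, but nobody without the private key can produce a signature that passes it, which is what makes the file attributable.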

Again, this is a solved problem with various schemes in existence to perform it. One of the most common is Pretty Good Privacy (PGP). The PGP standards include a full definition of how to create good cryptographic keys (a required pre-req for signing) and how to properly sign a message. At this point you get into the discussion of Opaque Signatures (where the signature and payload are combined into a single output) and detached signatures.

With an Opaque Signature you will only have one file in the output. It is a combination of the data payload from the original file and the signature block. While this works well when you are also encrypting the payload (so you'd have to perform a separation and decryption to use the data anyway), its usefulness in scenarios where an external program needs to consume the data directly depends on whether that program will be OK with the signature block. For textual messages an Opaque Sig normally isn't an issue. For binary files it will all depend on the software you want to use the file with.

Contrast that with a Separate Signature (a "detached signature" in PGP terms), where a secondary "signature file" is created to live side-by-side with the original (payload.data and payload.data.signature). This method increases your file count since there is a new file for each signature, but the original file can still be opened/viewed in its intended software without any extra modifications or allowances. This makes the separate signature the more common method for binary files that remain in use. Separate Sigs also have an advantage when using a Time Stamping Authority; you only have to time-stamp the .signature file, reducing the size of the file being transmitted to/from the TSA.
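As a rough sketch of the separate-signature layout, assuming the same `SHA256withRSA` scheme from the standard Java security API: the signature is written to a sibling file named after the payload, and the payload itself is never modified. The class and method names (`DetachedSignatureSketch`, `signDetached`, `verifyDetached`) are hypothetical, and the raw-bytes `.signature` file is my own simplification rather than a PGP-compatible format.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class DetachedSignatureSketch {
    // Signs the payload file and writes the raw signature bytes to
    // "<payload name>.signature" in the same directory.
    static Path signDetached(Path payload, PrivateKey authorPrivate) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(authorPrivate);
        s.update(Files.readAllBytes(payload));
        Path sigFile = payload.resolveSibling(payload.getFileName() + ".signature");
        Files.write(sigFile, s.sign());
        return sigFile;
    }

    // Re-reads both files and checks that the signature still matches the payload.
    static boolean verifyDetached(Path payload, Path sigFile, PublicKey authorPublic) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(authorPublic);
        s.update(Files.readAllBytes(payload));
        return s.verify(Files.readAllBytes(sigFile));
    }
}
```

Because the payload bytes are untouched, whatever software normally opens the data file keeps working, and a change to either file makes `verifyDetached` return false.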

And that then brings us around to the TSA's role in all of this. The thing about a Time Stamp Authority is that they should be near-universally trusted, external to and unaffiliated with the parties they provide timestamps for, and impartial in their review of any action. The TSA's role requires them to be neutral across the domain where they are required to provide actionable data. That means if you only need a TSA to provide internal audit timestamps but those aren't required externally, then you can run your own internal authority. But if you need to be able to prove to an outside party that something is accurate as of a given date/time, then the TSA used should be external to both of you (to prove neither of you is tampering with the results). What the TSA actually does is take a file/data (or, with RFC 3161 style services, just a hash of it), add a timestamp (and, often, information about who requested the time stamp), and then sign the whole thing. So in the end, you may have payload.data, payload.data.signature, and payload.data.signature.timestamped (in some cases .signature is superseded by .signature.timestamped as the final file can provide both functions; other places may need to keep both).
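If the TSA speaks RFC 3161 (the protocol mentioned in the comments), the request carries a hash of the data, called the message imprint, rather than the data itself. Below is a minimal sketch of computing that imprint over the signature bytes with the standard `MessageDigest` class. The names `ImprintSketch`, `imprint` and `hex` are hypothetical, SHA-256 is an assumed choice, and building and sending the actual request is left out because it needs a TSP-capable library (Bouncy Castle is one option) and a TSA URL that I can't know for your setup.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ImprintSketch {
    // Computes the SHA-256 digest that would go into the timestamp request.
    static byte[] imprint(byte[] signatureBytes) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(signatureBytes);
    }

    // Lower-case hex rendering, handy for logs and audit trails.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in bytes; in practice this would be the content of payload.data.signature.
        byte[] signatureBytes = "stand-in signature bytes".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hex(imprint(signatureBytes)));
    }
}
```

Since only the digest leaves your machine, the TSA never sees the payload or even the signature, yet the token it returns is bound to exactly those bytes.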

But this brings up the issue of offline time-stamping. And the answer there is, no. Using a TSA is, by its very nature, asking an external auditor to verify the time something happened. Using an offline TSA (I'm reading it here as: "I self-timestamp the documents without using an external party") breaks the entire reason for performing timestamping at all -- having an unbiased external party verify the time. Otherwise you might as well use the "last modified on date" from the filesystem; you can change your system clock and fake the modified date as easily as you could change the clock and fudge your own internal timestamp. (Again, consult your lawyers, but if you're trying to comply with FDA standards then I'm going to assume you're trying to make proofs to an external regulator of some sort.) So, no, an offline (or an internal, or an affiliated-to-you) TSA would not fulfill its purpose when trying to make proofs to external entities.

And, finally, a thought about something that came up in the comments (and that I briefly touched on above). There was some concern that this balloons the number of files associated with a project. And, while the .sig and .time files may be small in size, it certainly does. If this is a concern, look at what kind of files are being verified, how often in their lifecycle a time stamp is needed, and whether they are individual and separable or a group of files in a single project. Most often such signatures and timestamps are applied to final output (or a major checkpoint, but not to works in progress), so you don't need tons of these files for points-in-time, just the very few to handle when you need an authority signature. And often one will look for this kind of authenticity verification for a project and not for individual files. If that is the case, it may be prudent to bundle the files together in a single container (zip, tar, etc) and then perform the signing and timestamping on that container. Then you'd have two files: a single archive file for each completed project, and a .signature.timestamped file to verify it.
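A minimal sketch of the bundle-then-sign idea, assuming a zip container built with the standard `java.util.zip` classes and the same `SHA256withRSA` signature scheme. The names `BundleSketch`, `bundle` and `sign` are hypothetical, and keeping everything in memory is a simplification for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class BundleSketch {
    // Packs the named file contents into a single in-memory zip archive.
    // Entries are written in sorted name order so the layout is predictable.
    static byte[] bundle(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    // One signature over the whole archive covers every file inside it.
    static byte[] sign(byte[] archive, PrivateKey authorPrivate) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(authorPrivate);
        s.update(archive);
        return s.sign();
    }
}
```

Store the exact archive bytes that were signed. Rebuilding the zip later from the same files may produce different bytes (entry timestamps, for instance), and the old signature would then no longer verify.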

Ruscal