How do I open files sent to me in a 'document envelope'?

7

I just got an email whose X-Mailer header is "Microsoft CDO for Windows 2000". It has a .sgn file attached, whose content is XML with one element apparently holding a base64-encoded PDF:

<DocumentEnvelope><SignaturePackage><Signature =
xmlns=3D"http://www.w3.org/2000/09/xmldsig#"><SignedInfo><Canonicalizatio=
nMethod Algorithm=3D"http://www.w3.org/TR/2001/REC-xml-c14n-20010315" =
/><SignatureMethod =
Algorithm=3D"http://www.w3.org/2000/09/xmldsig#rsa-sha1" /><Reference =
URI=3D"#SignedDoc"><DigestMethod =
Algorithm=3D"http://www.w3.org/2000/09/xmldsig#sha1" =
/><DigestValue>MFV2XJ9rfjhGCyA948wKB741ChQ=3D</DigestValue></Reference></=
SignedInfo><SignatureValue>aKHfEGfu2p9RdShv1Vv/kqC6gjdymojq0rQA+AU/hPocrr=
VqMQk2wbbJD60jc8QPP0kPIo4vWqB1mVx5Y45HK0LFWxMDkJ2/CN8GcODEum2Mamn3W2j9tKV=
8JfJAexlW47LprDq99W9YwfpXusaEplCOErCRj/2dhnGc4SgZXxw=3D</SignatureValue><=
KeyInfo><KeyValue><RSAKeyValue><Modulus>nz78eiuYN1Jmm5ND8xLLbJ9QTrBpjTMfv=
h4mbmHbBSB7HSHU+7Izp5GCiyDAlmXa3JjqKBRjw2+OpwhsJf+KHPltKFKwOltTN9QJWS4HJm=
H1xqF4VAuwvpp1tlJd1KP5WL/j9YCYigzEfZIAAUC2KiFlAxoR1mwz3alMR4v96h8=3D</Mod=
ulus><Exponent>AQAB</Exponent></RSAKeyValue></KeyValue></KeyInfo><Object =
Id=3D"SignedDoc"><DocumentOriginName =
xmlns=3D"">ecd20f25-95b3-4dc3-b8e6-fc62d23db259</DocumentOriginName><Docu=
mentExtension xmlns=3D"">pdf</DocumentExtension><DocumentCreationDate =
xmlns=3D"">2014-02-27T22:10:27.4320656+02:00</DocumentCreationDate><Docum=
entContent =
xmlns=3D"">JVBERi0xLjQNJeLjz9MNCjMgMCBvYmoNPDwvQ291bnQgMS9LaWRzWzQgMCBSXS=
9QYXJlbnQgMiAwIFIgDS9UeXBlL1BhZ2VzPj4NZW5kb2JqDTQgMCBvYmoNPDwvQXJ0Qm94WzA=

(... etc. etc. ...)

P9fdsc3jL4yg7at7G488BKcqQbpnZDkhXFsfhc/VIuPexfElgnf2oagaf/QjiZHy+ganiZcAH=
dFFFrN6xYK5n0JL5g330NKzD5CHBS8X1civ8VUAKdWjgI8pm1rFsm4v20SwIp/81OH1w=3D=3D=
</CertBase64></Certificate></SignaturePackage></DocumentEnvelope>

If I copy out just the DocumentContent part and base64-decode it, I see a PDF 1.4 header, but some decoders choke on it, and in any case I can't get a working PDF out of it. So:

  • How can I manually extract the PDF file from there?
  • Is there a standalone tool for extracting files from such mail messages, or from .sgn files?
  • Is there a Thunderbird extension which handles these, and presents the PDF as a regular attachment?

Notes:

  • The file was sent automatically by the Israeli courts' 'Net Ha-Mishpat' platform. I can contact the courts, but they have no technically literate people, and I can't contact the software contractor they used.
  • I know of people who have managed to extract decoded files from these .sgn files in the past; I just don't know exactly how.

einpoklum

Posted 2014-02-27T21:13:39.597

Reputation: 5 032

Answers

2

I got one of those documents myself today.

Since explaining the problem to the tech support people seemed likely to take longer than extracting the file myself, I wrote a small Python script that extracts and decodes the PDF document embedded in the .sgn file.

This assumes that there is a single attached PDF file and that your .sgn file has the same format as mine.

I hope someone finds it useful.

import base64
import sys
import xml.etree.ElementTree as ET


def decode(infile, outfile):
    """Extract the base64-encoded document from infile and write it to outfile."""
    tree = ET.parse(infile)
    # Signature and Object are in the XML-DSig namespace; the other elements have none.
    xmlns = '{http://www.w3.org/2000/09/xmldsig#}'
    b64 = tree.find("./SignaturePackage/{0}Signature/{0}Object/DocumentContent".format(xmlns)).text
    data = base64.b64decode(b64)

    # Write the decoded bytes in binary mode.
    with open(outfile, 'wb') as f:
        f.write(data)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('usage: python unpack.py <input_filename>')
        sys.exit(1)
    infile = sys.argv[1]
    outfile = 'out.pdf'
    decode(infile, outfile)
    print('Done. Result saved to {0}'.format(outfile))

I created a gist for this script.

You need Python 3.x installed. Put the .sgn file and the script in the same folder (or pass the script the full path to the file) and run it like so:

python unpack.py <sig_filename>

This will create a file named out.pdf in the current working directory.
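The sample in the question shows `=3D` sequences and lines ending in `=`, which look like quoted-printable transfer encoding. If the .sgn file was saved from the raw message source in that state, the XML parser will probably reject it. Below is a minimal sketch, assuming that is what happened. `extract_pdf` is a hypothetical helper name, and the `=3D` test is only a heuristic for spotting the encoding.

```python
import base64
import quopri
import xml.etree.ElementTree as ET

DSIG = '{http://www.w3.org/2000/09/xmldsig#}'
CONTENT_PATH = './SignaturePackage/{0}Signature/{0}Object/DocumentContent'.format(DSIG)


def extract_pdf(raw: bytes) -> bytes:
    """Return the embedded document bytes from the raw bytes of a .sgn file."""
    # Heuristic (an assumption): '=3D' is how quoted-printable escapes '=',
    # so its presence suggests the transfer encoding was never undone.
    if b'=3D' in raw:
        raw = quopri.decodestring(raw)
    root = ET.fromstring(raw)
    node = root.find(CONTENT_PATH)
    if node is None or node.text is None:
        raise ValueError('no DocumentContent element found')
    # The default b64decode discards whitespace and other non-alphabet characters.
    return base64.b64decode(node.text)
```

Calling `extract_pdf(open('file.sgn', 'rb').read())` and writing the result to a file in `'wb'` mode would then take the place of `decode()` above.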

MasterAM

Posted 2014-02-27T21:13:39.597

Reputation: 138

Shouldn't you decode from stdin to stdout by default? Or at least only decode from file if a file is specified? – einpoklum – 2015-05-12T21:01:37.050

Using a file name as an argument seems reasonable enough. No need to use stdin/stdout. It is also more robust IMHO, as you can provide more arguments and make it extract multiple files more easily. I hope that you don't require it that often, though. – MasterAM – 2015-05-12T22:15:53.127

Not that it matters that much, but - your way this decoding can't be piped (except by creating named pipes). Not very friendly... – einpoklum – 2015-05-13T21:30:56.180

That is correct, but then again, it's a 20-LoC utility that can be easily adapted :) You should probably check if it works and let me know if there is any issue. – MasterAM – 2015-05-15T07:14:02.137

I will, next time I get one of them. I don't think I have one saved. – einpoklum – 2015-05-15T12:56:19.477
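The pipe-friendly behaviour requested in the comments above could be sketched as follows. This is an adaptation, not the answerer's script: `decode_stream` and `main` are hypothetical names, and the envelope layout is assumed to match the one in the answer.

```python
import base64
import sys
import xml.etree.ElementTree as ET

XMLNS = '{http://www.w3.org/2000/09/xmldsig#}'
CONTENT_PATH = './SignaturePackage/{0}Signature/{0}Object/DocumentContent'.format(XMLNS)


def decode_stream(src, dst):
    """Read an envelope from binary stream src and write the decoded document to dst."""
    root = ET.parse(src).getroot()
    node = root.find(CONTENT_PATH)
    if node is None or node.text is None:
        raise ValueError('no DocumentContent element found')
    dst.write(base64.b64decode(node.text))


def main(argv):
    """With a file argument, read that file; otherwise act as a stdin-to-stdout filter."""
    if len(argv) > 1:
        with open(argv[1], 'rb') as src:
            decode_stream(src, sys.stdout.buffer)
    else:
        decode_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Calling `main(sys.argv)` under the usual `if __name__ == "__main__":` guard would allow something like `python unpack.py < file.sgn > out.pdf`. The binary `.buffer` streams are used so the PDF bytes are not passed through a text encoding.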

1

Here's a rudimentary script you can use on Unix-like systems (and probably on Windows too, with a little modification) to extract the PDF file from the document envelope. I call it sgn2pdf, since document envelope files have an .sgn extension. Its command-line interface is

sgn2pdf [INPUT_FILENAME] [OUTPUT_FILENAME]

That is, if you give a first argument, it reads from that file rather than from standard input; if you give a second argument, it writes the output to that file rather than to standard output.

Source:

#!/bin/bash
#
# Extract a PDF file from an Israeli courts' .sgn PDF document envelope

exec 3<&0 # tie (new) file descriptor 3 to what is currently the standard input
exec 4>&1 # tie (new) file descriptor 4 to what is currently the standard output

if (( $# > 0 )); then
    exec 3<"$1"
    shift
fi
if (( $# > 0 )); then
    exec 4>"$1"
    shift
fi
exec <&3 >&4
sed -r 's/^.*<DocumentContent[^>]*>//; s/<\/Document.*$//;' | base64 -d -i >&4

The base64 decoder is part of the GNU coreutils package and should be available on any Linux distribution.
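On systems without GNU sed or base64, roughly the same extraction can be sketched in Python. Like the sed expression, it uses a regular expression and no XML parser, so it is only a sketch under the same assumptions about the envelope. `sgn_to_pdf` is a hypothetical name. One difference: sed works line by line, while this pattern also matches content that spans several lines.

```python
import base64
import re

# Capture everything between the DocumentContent tags; DOTALL lets '.' match newlines.
CONTENT_RE = re.compile(rb'<DocumentContent[^>]*>(.*?)</DocumentContent>', re.DOTALL)


def sgn_to_pdf(data: bytes) -> bytes:
    """Return the decoded bytes of the first DocumentContent element in data."""
    match = CONTENT_RE.search(data)
    if match is None:
        raise ValueError('no <DocumentContent> element found')
    # The default (non-validating) b64decode discards characters outside the
    # base64 alphabet, comparable to the -i flag of `base64 -d -i`.
    return base64.b64decode(match.group(1))
```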

einpoklum

Posted 2014-02-27T21:13:39.597

Reputation: 5 032

0

Probably too late, but if you got this file from the Israeli court system (נט המשפט), they provide a link to a Windows program that opens it.

yohbs

Posted 2014-02-27T21:13:39.597

Reputation: 101

Hmph. First they send out emails that require MS Outlook, then they offer to "help" you - assuming you use Windows. Wonderful. Anyway - not your fault, thanks. – einpoklum – 2015-11-29T22:52:43.460

@einpoklum I agree (I run Ubuntu and had to use my wife's laptop). But hey - at least they don't use pigeons... – yohbs – 2015-11-30T04:52:38.103

0

The use of CDO for Windows 2000 and the document envelope indicate that the email was likely sent automatically or programmatically, i.e. via a script, out of Access, or in some other way via SMTP and a CDO-compliant program (not a normal mail client).

The SGN file is unlikely to be a true SGN file, which is a "Sierra Print Artist" file; it seems more likely that someone has used the extension manually for a signature file.

I do not believe that this file was intended to be the sort of attachment that you would be expected to open. It seems far more likely that the file you're seeing is included with the email as a way for the sender to show it as "signed" when it is automatically generated. Because the PDF is embedded within the XML file, there is likely no extension which would automatically decode the section of the attachment that you believe to be a PDF. You could try copying the entire section and then decoding it, and saving the decoded text with a unicode-compliant text editor, then see if that opens as a readable PDF.

But I think you are wasting your time and this attachment is along the lines of what you'd see if someone included a vCard which contained an image when they sent you email out of some program via CDO. That is, it's not intended to be decoded, because if you could do that, then perhaps you could falsify the signature of the sender.

Have you tried contacting the sender to find out whether the attachment has any meaning? It seems fairly obvious to me that it is just intended to be a qualifying signature file. The header tells you that the algorithm used to generate the signature is at http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd#rsa-sha1 -- that alone should tell you that it's not a file that you are meant to open as such.
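Whether the content section holds a document or only signature data can be checked mechanically after decoding it. A minimal sketch, assuming the usual PDF conventions of a `%PDF-` header at the start and an `%%EOF` marker near the end; `looks_like_pdf` is a hypothetical helper and the 1024-byte tail window is an arbitrary choice.

```python
import base64


def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check: PDF header at the start and an EOF marker near the end."""
    head_ok = data.startswith(b'%PDF-')
    # Look for the end-of-file marker only in the last kilobyte.
    tail_ok = b'%%EOF' in data[-1024:]
    return head_ok and tail_ok


# The first 12 base64 characters of DocumentContent quoted in the question.
prefix = base64.b64decode('JVBERi0xLjQN')
print(prefix)  # b'%PDF-1.4\r'
```

A decoded file that passes the header test but fails the tail test is probably truncated or mangled somewhere in transit, not a signature-only blob.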

Debra

Posted 2014-02-27T21:13:39.597

Reputation: 4 000

Like I said, I did copy out the content section, and decoded it with partial success. It's a 3-page PDF - but I can't get the contents right. There must be something about the charset of the sgn files, or some heading/trailing junk, or something. – einpoklum – 2014-02-28T06:53:20.363

If it's a file that you are supposed to be able to open, then have the sender re-send it to you. But everything you describe seems to indicate a file that is part of a signature and not intended to be opened in this way. The way it was sent is what determines whether it can be opened as a standard attachment. The "document envelope" just relates to how one wraps a message for sending with CDO via SMTP. – Debra – 2014-02-28T07:18:36.027

Do you know of any software tools which handle these kinds of 'Envelopes'? – einpoklum – 2014-02-28T07:52:54.313