Printer Job Language --> PDF


I have received a non-human readable file which I would like to make human readable.

How would I go about getting the text content from the following file:

thufir@dur:~/Documents$ 
thufir@dur:~/Documents$ file mystery.pdf 
mystery.pdf: HP Printer Job Language data
thufir@dur:~/Documents$ 
thufir@dur:~/Documents$ pdfinfo mystery.pdf 
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
thufir@dur:~/Documents$ 
thufir@dur:~/Documents$ head -n 2 mystery.pdf 
%-12345X@PJL
@PJL ENTER LANGUAGE = HBP
thufir@dur:~/Documents$ 

I don't think it is a PDF file at all. It looks as though someone used "print to file" instead of "export to PDF" (or similar), so the result is a file in PJL rather than a PDF.

see also:

http://forums.fedoraforum.org/showthread.php?t=247913

Can I use Ghostscript to get it back into something human readable?

It's about 4000 lines of:

�x]�x�

when directly viewed with cat or similar.

Thufir

Posted 2014-08-02T17:07:37.603

Reputation: 876

Can you run head -n 50 mystery.pdf and check whether a %PDF-1.X line appears after the @PJL lines end? – Kurt Pfeifle – 2014-10-08T18:46:24.810

Answers

2

The @PJL lines indicate a Printer Job Language header that was inserted before the actual print job. PJL is used to control print job options (such as duplexing, paper tray selection, stapling, punching, folding the output). It was invented by HP.

The print job that follows the header could be in any format: lots of printer vendors support PJL and use it to wrap their own (proprietary) printer languages.

The interesting part is what follows after the @PJL header lines. It could be PDF, or PostScript, or PCL, or anything else.

Also interesting is the line saying @PJL ENTER LANGUAGE = ... because it is usually a reliable indicator of the format of the print data stream.

In the case of mystery.pdf this is HBP, a format that I have not encountered before.
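As a rough illustration of reading that line programmatically, here is a minimal Python sketch. The function name `pjl_enter_language` and the 4096-byte search window are assumptions made for this example, not part of any PJL tooling; the sample bytes mimic the header shown in the question.

```python
import re

# Universal Exit Language sequence that normally precedes the first @PJL line
# (the leading ESC byte is usually invisible when the file is viewed with head or cat).
UEL = b"\x1b%-12345X"

def pjl_enter_language(data: bytes):
    """Return the value of the first '@PJL ENTER LANGUAGE = ...' line, or None."""
    match = re.search(
        rb"@PJL[ \t]+ENTER[ \t]+LANGUAGE[ \t]*=[ \t]*([^\r\n]+)",
        data[:4096],  # assumption: the PJL header fits in the first 4 KiB
        re.IGNORECASE,
    )
    if match is None:
        return None
    return match.group(1).strip().decode("ascii", errors="replace")

# Sample modelled on the header from the question, followed by dummy binary data.
sample = UEL + b"@PJL\r\n@PJL ENTER LANGUAGE = HBP\r\n\x00\x01\x02"
print(pjl_enter_language(sample))  # prints: HBP
```

For a real file, the bytes could be read with `open("mystery.pdf", "rb").read()` and passed to the same function.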

If another open or semi-open format such as PostScript or PCL does in fact follow (contrary to what the ENTER LANGUAGE = ... line says), Ghostscript or GhostPDL will be able to convert it to PDF. Delete all the @PJL lines from the header first, then run:

For PostScript files:

 gs -o out.pdf -sDEVICE=pdfwrite input-file

For PCL files:

 pcl6 -o out.pdf -sDEVICE=pdfwrite input-file
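Removing the @PJL header lines in a text editor may damage binary print data, so one option is to do it at the byte level. The following is a minimal Python sketch under stated assumptions: `strip_pjl_header` is a hypothetical helper name, it only handles the leading header (a trailing UEL or @PJL EOJ footer is left untouched), and it assumes each @PJL line ends with a newline.

```python
# Universal Exit Language sequence that usually precedes the first @PJL line.
UEL = b"\x1b%-12345X"

def strip_pjl_header(data: bytes) -> bytes:
    """Drop a leading UEL and every leading @PJL line; return the rest unchanged."""
    pos = 0
    if data.startswith(UEL):
        pos = len(UEL)
    # Skip consecutive lines that begin with @PJL.
    while data.startswith(b"@PJL", pos):
        newline = data.find(b"\n", pos)
        if newline == -1:
            return b""  # the data consisted of PJL commands only
        pos = newline + 1
    return data[pos:]

# Example with a PostScript payload behind a two-line PJL header.
sample = UEL + b"@PJL\r\n@PJL ENTER LANGUAGE = POSTSCRIPT\r\n%!PS-Adobe-3.0\n"
print(strip_pjl_header(sample))  # prints: b'%!PS-Adobe-3.0\n'
```

The returned bytes could then be written to a new file in binary mode and passed to gs or pcl6 as the input file.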

Kurt Pfeifle

Posted 2014-08-02T17:07:37.603

Reputation: 10 024

Pardon, I never got a chance to try this, and I don't think I even have the file anymore. Marked as correct. – Thufir – 2016-05-06T05:44:59.443

2

Kurt's answer is correct. The only addition I would like to make is that when dealing with PRN files from Windows, several different kinds of print data end up under the PRN file type, so first make sure yours really is a PCL file. Even then, the PCL in use may be MS PCL XL, in which case pcl6 conks out. Download GhostPCL and use the command:

gpcl6-920-linux_x86_64 -sDEVICE=pdfwrite -o output.pdf input-file
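To check what kind of data a file actually contains before choosing a converter, a quick look at its first bytes can help. The sketch below is a heuristic only, not a validator: `sniff_payload` is a hypothetical name, and it assumes any PJL header has already been removed. It relies on the usual signatures: `%PDF-` for PDF, `%!` for PostScript, the ` HP-PCL XL;` stream header for PCL XL, and a leading ESC byte for PCL 5.

```python
def sniff_payload(payload: bytes) -> str:
    """Guess the print data format from its first bytes (heuristic only)."""
    head = payload[:1024].lstrip()  # ignore leading ASCII whitespace
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"%!"):
        return "postscript"
    # A PCL XL stream starts with a one-byte binding marker followed by " HP-PCL XL;".
    if b" HP-PCL XL;" in head[:64]:
        return "pclxl"
    # PCL 5 jobs normally begin with an escape sequence such as ESC E (printer reset).
    if head.startswith(b"\x1b"):
        return "pcl"
    return "unknown"

print(sniff_payload(b") HP-PCL XL;2;0;Comment\n"))  # prints: pclxl
```

A result of "unknown" would be expected for proprietary formats such as the HBP data in the question.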

Akshay

Posted 2014-08-02T17:07:37.603

Reputation: 21

Thanks for this suggestion! The Debian-packaged Ghostscript failed completely on me, but GhostPCL did the trick. – Mr. DOS – 2018-01-04T20:45:32.413