How to convert PDF to ASCII Postscript so the contained text can be searched/replaced?

According to Chapter 3.2 of the PostScript Language Reference, "there are three encodings for the PostScript language: ASCII, binary token, and binary object sequence".

We've been generating PDF files from HTML/CSS with PrinceXML for quite some time. Recently, a new requirement arose in cooperation with another company that needs the contents of our PDF files as PostScript. When we convert the PDF to PS on the command line with pdf2ps, pdftops, a2ping or similar tools, the resulting PS files seem to use one of the binary encodings, as there's no way to search for text in them.

We deliver the PS file a few days prior to printing and don't know the printing date beforehand, but the printing date is required to appear in the printed document. Therefore, we need to insert a date placeholder (##.##.####), which they will automatically replace when printing.

If we insert that placeholder in our HTML/CSS representation, it can't be found in the contents of the PostScript file and therefore can't be replaced with the current date prior to printing.
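To make the goal concrete, here is a minimal sketch of the replacement step we have in mind on the printing side. It only works if the placeholder survives in the PS file as one literal byte string, which is exactly what the converted files don't give us. The function name `fill_print_date`, the sample PostScript snippet and the DD.MM.YYYY date format are assumptions made for illustration, not something the other company has specified.

```python
from datetime import date

# Placeholder as it would have to appear literally in the PS file.
PLACEHOLDER = b"##.##.####"


def fill_print_date(ps_bytes: bytes, print_date: date) -> bytes:
    """Replace every literal placeholder with the date as DD.MM.YYYY (assumed format)."""
    if PLACEHOLDER not in ps_bytes:
        # This is the failure we currently hit with the converted files.
        raise ValueError("placeholder not found as literal text")
    stamp = print_date.strftime("%d.%m.%Y").encode("ascii")
    # The stamp has the same length as the placeholder, so offsets stay unchanged.
    return ps_bytes.replace(PLACEHOLDER, stamp)


# Hand-written sample standing in for an ASCII PostScript file.
sample = (
    b"%!PS-Adobe-3.0\n"
    b"/Helvetica findfont 12 scalefont setfont\n"
    b"72 720 moveto (Printed on ##.##.####) show\n"
    b"showpage\n"
)
result = fill_print_date(sample, date(2011, 6, 16))
print(result.decode("ascii"))
```

Working on bytes rather than decoded text avoids guessing an encoding for the rest of the file, and the same-length replacement leaves any byte offsets in the file untouched.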

Does anyone know a way to convert the PDF to ASCII PostScript so the contained text can be searched and replaced?

Codepunkt

Posted 2011-06-16T15:28:27.057

Why are you doing HTML -> PDF -> PS? Why not go straight from HTML to PS for this client? – Flimzy – 2011-06-19T06:48:06.777

Because we didn't find any way to do so that produces ASCII PostScript (so we can use placeholders the way we need to) and that also supports the same, or almost the same, HTML/CSS features as PrinceXML, so that both PDF and PS look the same. – Codepunkt – 2011-06-19T19:21:57.493

One way to make your PDF and PS look the same would be to do HTML -> PS -> PDF... although that doesn't address the text replacement requirement. IME, when text replacement is required in PS, it's usually been done by writing raw PS. It's also possible that TeX could output ASCII PS. But I'm sure you have absolutely no interest in rewriting your documents in a way that would let you do this. :) I wish I could offer a better suggestion. – Flimzy – 2011-06-19T20:36:32.340

We considered writing raw PostScript and then doing HTML > PS > PDF. It could be a kind of emergency solution in case we find no other way to deal with it. Any idea on books/tutorials/best practices for writing raw PS? :) – Codepunkt – 2011-06-19T22:43:43.750

Can you provide a link to a sample PDF that you need to convert to ASCII Postscript? Can you also provide a link to a non-ASCII PostScript that is supposed to contain your placeholder '##.##.####'? This way I might be able to work out a path for you to follow... – Kurt Pfeifle – 2011-06-20T16:09:04.620

Answers