How do I convert a word document to a pdf?

10

1

Help guys my assignment says it needs to be in .pdf format but I did it in Word. I'm really stuck.

How do I take a word document in .docx format and make a .pdf that contains all the text from it? Bonus points if it also contains all the images and any formatting, but text is a bare minimum. The sample file I'll be using will be this one, though your solution should be generic.

I don't want it to go through any unnecessary processing steps - simply encoding then decoding the document in base64 or whatever is not in the spirit of the question, though creative use of cowsay will be an exception to this. Standard rules of code trolling apply - the solution should be technically correct, all steps should be technically necessary, the result should be technically useless. This should be more of a "Rube Goldberg" style program, than an obfuscation and obtuseness competition.

Most upvotes on answers other than my own by 5/1/14 wins.

Note: This is a code-trolling question. Please do not take the question and/or answers seriously. More information here.

ymbirtt

Posted 2013-12-29T12:03:22.877

Reputation: 1 792

2This assignment is way complex, but I'm certain that the only proper approach would be to use a Preview Handler in a WPF application, take a screenshot of that, save the bitmap as a GIF and then print that as a PDF – Mathias R. Jessen – 2013-12-29T18:50:47.133

Code-trolling is in the process of being removed, as per the official stance. This post has a fair number of votes on the question and the answers, and even though it received over 50% "delete" votes on the poll, it is one of the more well-specified [code-trolling] posts. Therefore, I am locking it for historical significance.

– Doorknob – 2014-05-12T12:18:04.443

Answers

24

Ok, this is a little tricky but not too bad, because PDF uses the same graphics model as PostScript, which means that once you have PostScript it is quite trivial to convert it to PDF. And since PostScript is a way to drive printers, all you have to do to get PostScript is print.

Now you could write a program to convert PostScript to PDF, but we don't have to: there is Ghostscript, which was written for Unix and works just fine on Linux (no major differences for this project). Unfortunately Word only runs on Windows, so you need two computers, and to convince Windows that the Linux computer is a printer you need a serial cable and a null modem. If your computer(s) don't have serial ports, USB to RS-232 converters work just fine (I recommend ones with an FTDI chipset). Now hook up the two computers with the serial cable and the null modem and verify that you can communicate (make sure that your parameters match).

Ok, now that you have them talking, it is time to convince your Windows box that the Linux box is a printer: just install the printer driver for the Apple LaserWriter II and say it is connected to the serial port. Now when you print you send PostScript to the Linux box. The next step is to save it as a file.

Now scoot over to your linux box and use this simple command:

dd if=/dev/ttyS0 bs=1 | ps2pdf - - | sed -e '' >tmpfile && mv tmpfile file.pdf

and as simple as that you are done.


This can actually be made to work (if you send a signal to dd when you are finished), but there are easier ways, like printing to a file and running Ghostscript on your Windows box. And although FTDI makes good quality USB to serial converters, it is a royal pain to install the drivers.
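
For the easier route mentioned above (print to a file, then run Ghostscript on it), a minimal Python sketch might look like the following. It assumes the `ps2pdf` wrapper that ships with Ghostscript is on your PATH; the helper names `ps2pdf_command` and `convert` are made up for illustration and are not part of any library.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ps2pdf_command(ps_path: str, pdf_path: Optional[str] = None) -> List[str]:
    """Build the argument list for Ghostscript's ps2pdf wrapper."""
    if pdf_path is None:
        # Default to the same name with a .pdf extension.
        pdf_path = str(Path(ps_path).with_suffix(".pdf"))
    return ["ps2pdf", ps_path, pdf_path]


def convert(ps_path: str, pdf_path: Optional[str] = None) -> None:
    """Run ps2pdf; raises if it is missing or exits with an error."""
    if shutil.which("ps2pdf") is None:
        raise FileNotFoundError("ps2pdf not found; is Ghostscript installed?")
    subprocess.run(ps2pdf_command(ps_path, pdf_path), check=True)
```

Calling `convert("printout.ps")` (a hypothetical file name) would then write `printout.pdf` next to it, with no serial cable involved.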

hildred

Posted 2013-12-29T12:03:22.877

Reputation: 1 329

2Though testing this is beyond my means, a bit of background reading suggests that this is both valid and awful. Good work! – ymbirtt – 2013-12-29T15:25:20.397

6I thought about including instructions for making a null modem, just so a soldering iron was needed. – hildred – 2013-12-29T15:34:33.693

13

These days many printers are combination printer/scanner with automatic document feeders. It will be simple.

  1. Print the document.
  2. Scan the print out.

emory

Posted 2013-12-29T12:03:22.877

Reputation: 989

3This is how people actually do it... I wish I were kidding. And, this is [tag:code-trolling], where is your code? – derobert – 2014-01-08T22:53:33.510

9

PHP

This code produces PDF files that should print out perfectly on your ticker tape machine. If you want to view the PDF files on your monitor, you might have to zoom in a bit.

Example source document (image: word document)

PDF output, viewed in browser (image: partial view of PDF document)

Source code

<?php

header("Content-Type: application/pdf");

$s = docx2txt("word-file.docx"); // <-- Insert filename here!
echo txt2pdf($s);


function docx2txt($filename) {
  if (!($z=zip_open($filename))) return false; // Can't open file
  while ($r=zip_read($z)) {
    if (zip_entry_name($r)!="word/document.xml") continue;
    if (!zip_entry_open($z,$r)) return false; // Can't open XML data
    for ($s="";;) {
      $c=zip_entry_read($r);
      if ($c===false || $c=="") break;
      $s.=$c;
    }
    return trim(preg_replace('/\s+/',' ',preg_replace('/<[^>]*>/','',$s)));
  }
  return false; // Can't find XML data
}


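function txt2pdf($text) {
  // Page width in points: Courier at 12pt advances 7.2pt per character
  $width="".ceil(strlen($text)*7.2);
  // Escape backslashes and parentheses, which are special in PDF strings
  $text=str_replace('\\','\\\\',$text);
  $text=str_replace('(','\050',str_replace(')','\051',$text));
  $length=strlen($text);
  $wlen=strlen($width);
  // Byte length of the content stream (fixed part plus the text)
  $len4="".(44+$length);
  // Byte offsets of objects 3 and 4, which shift with the width digits
  $xr3=sprintf("%010d",174+$wlen);
  $xr4=sprintf("%010d",449+$wlen);
  // Byte offset of the xref table
  $xrstart=544+$wlen+strlen($len4)+$length;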
  return "%PDF-1.1\n%¥±ë\n\n1 0 obj\n  << /Type /Catalog\n     /Pages 2 0 R\n" .
         "  >>\nendobj\n\n2 0 obj\n  << /Type /Pages\n     /Kids [3 0 R]\n   " .
         "  /Count 1\n     /MediaBox [0 0 $width 14]\n  >>\nendobj\n\n3 0 obj" .
         "\n  <<  /Type /Page\n      /Parent 2 0 R\n      /Resources\n       " .
         "<< /Font\n           << /F1\n               << /Type /Font\n       " .
         "           /Subtype /Type1\n                  /BaseFont /Courier\n " .
         "              >>\n           >>\n       >>\n      /Contents 4 0 R\n" .
         "  >>\nendobj\n\n4 0 obj\n  << /Length $len4 >>\nstream\n  BT\n    /" .
         "F1 12 Tf\n    0 3 Td\n    ($text) Tj\n  ET\nendstream\nendobj\n\nxr" .
         "ef\n0 5\n0000000000 65535 f \n0000000018 00000 n \n0000000077 00000" .
         " n \n$xr3 00000 n \n$xr4 00000 n \ntrailer\n  <<  /Root 1 0 R\n    " .
         "  /Size 5\n  >>\nstartxref\n$xrstart\n%%EOF";
}

?>

Note: The txt2pdf() function is based on a minimal PDF file made by Brendan Zagaeski.
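
The same two steps (pull the text out of `word/document.xml`, then wrap it in a one-line PDF) can be sketched in Python. This is not a port of the code above, just an illustration under a few assumptions: the function names `docx_to_text` and `text_to_pdf` are invented, the input is passed as bytes rather than a file name, and the cross-reference offsets are measured while the file is built instead of being hard-coded.

```python
import io
import math
import re
import zipfile


def docx_to_text(docx_bytes: bytes) -> str:
    """Return the text of word/document.xml with all XML tags stripped."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    no_tags = re.sub(r"<[^>]*>", "", xml)
    return re.sub(r"\s+", " ", no_tags).strip()


def text_to_pdf(text: str) -> bytes:
    """Build a one-page, one-line PDF; xref offsets are computed on the fly."""
    # Backslash and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
    # Courier at 12pt advances 7.2pt per character.
    width = max(1, math.ceil(len(text) * 7.2))
    stream = f"BT /F1 12 Tf 0 3 Td ({escaped}) Tj ET".encode("latin-1", "replace")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} 14] "
         "/Resources << /Font << /F1 << /Type /Font /Subtype /Type1 "
         "/BaseFont /Courier >> >> >> /Contents 4 0 R >>").encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.1\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF" % xref_start
    return bytes(out)
```

Something like `text_to_pdf(docx_to_text(open("word-file.docx", "rb").read()))` would then give the bytes of the ticker-tape PDF. Characters outside Latin-1 are replaced with `?` in this sketch.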

r3mainer

Posted 2013-12-29T12:03:22.877

Reputation: 19 135

Where's the troll? – Nacib Neme – 2014-04-21T01:13:15.790

5

On UNIX systems:

mv document.docx document.pdf && cowsay "code-trolling is cool"

On Windows:

ren document.docx document.pdf

s3lph

Posted 2013-12-29T12:03:22.877

Reputation: 1 598

3note: won't work of course... Just found it funny – s3lph – 2013-12-29T14:27:45.323

4

I believe this shell script to be a simple and intuitive method of solving the problem. Is there a better way?

( echo $'<svg>\n<text y="10">';
  unzip -p ./YOUR_FILENAME_HERE.docx word/document.xml |
  sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g';
  echo $'\n</text>\n</svg>' ) |
inkscape -f /dev/fd/0 -D -A ./OUTPUT_FILENAME_HERE.pdf

ymbirtt

Posted 2013-12-29T12:03:22.877

Reputation: 1 792

1" why does this swap to floppy? " ;) – hildred – 2013-12-29T15:06:02.953

2

This link will surely help you: http://javahive.in/convert-word-doc-to-pdf-file-in-java/ You just have to run this Java code and your PDF will be there for you.

Ankush

Posted 2013-12-29T12:03:22.877

Reputation: 359

Would have been funnier if it was just the link. – Mr Lister – 2014-01-11T07:30:38.247

0

Windows Batch

The easiest way to convert a file: change the extension!

:: convert.cmd

xcopy "%~dpnx0" "%~dpn0.pdf"

Spoiler/troll: (hover below to see)

Oops...did I forget that you could convert even a file with a .exe extension? So much for that... ;) Also, I'm too lazy to code the guards.
And I thought I'd add a little extra troll in this: it doesn't even touch the data inside... (doesn't parse it to make it a valid PDF)
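
To see why changing the extension does not touch the data inside, here is a small Python sketch that ignores the file name and looks only at the leading bytes. The names `sniff_format` and `fake_docx` are hypothetical; the only facts relied on are that a PDF starts with `%PDF-` and that a .docx is a ZIP container, which normally starts with `PK\x03\x04`.

```python
import io
import zipfile


def sniff_format(data: bytes) -> str:
    """Guess a file's real format from its leading bytes, ignoring its name."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "zip container (what a .docx really is)"
    return "unknown"


def fake_docx() -> bytes:
    """Build a tiny stand-in for a .docx: a ZIP holding word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    renamed = fake_docx()  # pretend this was saved as document.pdf
    print(sniff_format(renamed))
    print(sniff_format(b"%PDF-1.1\n"))
```

Whatever the copy is called, the first check only passes once something has actually written PDF data.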

Isiah Meadows

Posted 2013-12-29T12:03:22.877

Reputation: 1 546