How can I batch process HTML files to prepare them for printing?

1

1

I have a collection of one thousand HTML files that I need to clean up for printing. I need to delete everything inside the <body></body> area except for one element, <div.pg>. The rest is navigation links, which make the printouts messy and waste paper. The contents of the unwanted elements differ from file to file, so I can't find and replace a fixed code excerpt, but the structure is the same: there are three <table> elements to be deleted, each with a specific class. Is there a batch processing technique or software that can do this job? I'm using Windows.

z403

Posted 2011-09-27T20:44:01.680

Reputation: 55

Write a Python script for this. – James T Snell – 2011-09-27T20:45:00.017

What operating system are you using? – bryan – 2011-09-27T21:16:35.477

I'm on Windows XP – z403 – 2011-09-27T21:17:57.147

Are these files you've created or just general pages on the internet? – ChrisF – 2011-09-28T20:21:58.150

Write a Perl script for this. – Randolf Richardson – 2011-09-29T10:45:58.963

Answers

9

one thousand HTML files … make them clean to be printed.

An easy way to suppress sections when printing is to use a CSS stylesheet.

Add something like this to the head element:

<link rel="stylesheet" 
   type="text/css"
   media="print" href="print.css" />

Note the media="print" attribute: this stylesheet applies only when printing, not when viewing on screen.

If your HTML is all formatted in a similar way, you might do this for thousands of HTML files in a single command with a simplistic pattern-matching edit:

perl -i -ne "print; print '<link … />' if /<head>/" dir1/*.html dir2/*.html
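For readers who would rather not depend on Perl, the same edit can be sketched in Python. This is only a sketch under stated assumptions: each file contains a literal <head> tag, print.css sits where the href can reach it, and the directories to process are passed on the command line. The names add_print_stylesheet, patch_directory and LINK_TAG are made up for illustration. Unlike the one-liner, it inserts the link directly after <head> rather than after the line containing it, and it finds the files itself, so it does not rely on the shell expanding *.html (which cmd.exe does not do).

```python
import sys
from pathlib import Path

# The link element shown earlier in this answer, written on one line.
LINK_TAG = '<link rel="stylesheet" type="text/css" media="print" href="print.css" />'


def add_print_stylesheet(html, link_tag=LINK_TAG):
    """Insert link_tag right after the first literal <head>, once only."""
    if link_tag in html:
        return html  # already patched, so running the script twice is harmless
    # Assumption: the tag is written exactly as <head> (lowercase, no attributes).
    return html.replace("<head>", "<head>\n" + link_tag, 1)


def patch_directory(directory):
    """Patch every *.html file in directory in place; return how many changed."""
    changed = 0
    for path in sorted(Path(directory).glob("*.html")):
        # latin-1 maps every byte to one character, so files in an unknown
        # encoding survive the decode/encode round trip unchanged.
        original = path.read_bytes().decode("latin-1")
        patched = add_print_stylesheet(original)
        if patched != original:
            path.write_bytes(patched.encode("latin-1"))
            changed += 1
    return changed


if __name__ == "__main__":
    # Hypothetical usage: python add_print_css.py dir1 dir2
    for directory in sys.argv[1:]:
        print(directory, patch_directory(directory), "file(s) patched")
```

Because the script overwrites files in place, it is worth trying it on a copy of the directory first.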

In print.css, set display:none for the elements (e.g. whole divs) you don't want printed. For example:

#menu { display: none; }

See A List Apart article


Update: If your thousand HTML files already share a common CSS stylesheet file, the solution is even easier: there is no need to change or add anything in the HTML files. Just add a section to the existing stylesheet to handle printing. For example:

@media screen
  {
  #menu {font-family:verdana,sans-serif;font-size:14px;}
  }
@media print
  {
  #menu {display:none;}
  }

RedGrittyBrick

Posted 2011-09-27T20:44:01.680

Reputation: 70 632

Yeah, the only way to solve this is with a scripting language or tool (like sed.exe or awk.exe) that supports real regular expressions, or with an XML library, such as a BeanShell script running XMLUnit. – djangofan – 2011-09-27T22:17:41.320

+1 for solving the real problem, not giving them what they think is the solution! – Arjang – 2011-09-27T22:19:19.053

2

Use Notepad++. It can find and replace text across multiple files.
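If a scripted equivalent of a multi-file find/replace is preferred, the following Python sketch shows one way it might look. It rests on several assumptions: the class names in NAV_CLASSES are placeholders (the question does not give the real ones), the class attribute is quoted and holds a single class, and the navigation tables contain no nested tables, since a regular expression cannot reliably match nested markup. The function names are illustrative only.

```python
import re
from pathlib import Path

# Hypothetical class names; replace them with the three real ones.
NAV_CLASSES = ("navtop", "navside", "navbottom")


def build_pattern(classes=NAV_CLASSES):
    """Compile a regex matching a whole <table class="..."> ... </table> block."""
    names = "|".join(re.escape(name) for name in classes)
    return re.compile(
        r"<table\b[^>]*\bclass\s*=\s*[\"'](?:" + names + r")[\"'][^>]*>"
        r".*?</table\s*>",
        # IGNORECASE also accepts <TABLE>; DOTALL lets .*? span line breaks.
        re.IGNORECASE | re.DOTALL,
    )


def strip_tables(html, classes=NAV_CLASSES):
    """Return html with every table of the given classes removed."""
    return build_pattern(classes).sub("", html)


def clean_directory(src_dir, dst_dir, classes=NAV_CLASSES):
    """Write cleaned copies of src_dir/*.html into dst_dir; return the file count."""
    pattern = build_pattern(classes)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.html")):
        # latin-1 round-trips any byte sequence, so the encoding need not be known.
        text = path.read_bytes().decode("latin-1")
        (dst / path.name).write_bytes(pattern.sub("", text).encode("latin-1"))
        count += 1
    return count
```

Writing the cleaned copies to a separate directory leaves the originals untouched, so a wrong class name or an unexpected nested table costs nothing but a rerun.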

kobaltz

Posted 2011-09-27T20:44:01.680

Reputation: 14 361