
I'm trying to write a one-liner to convert the HTML entities in some files (all HTML with UTF-8 encoding).

I've tried recode HTML_4.0 file.htm, but that also converts the non-ASCII characters (it breaks the UTF-8 characters).

On Stack Overflow I found something that works for one file:

php -r '$f=@fopen("file.htm", "r");echo html_entity_decode(fread($f, 20000));fclose($f);'

but when I try to make it work for multiple files with

for fi in *.htm; do php -r '$f=@fopen("$fi", "r");echo html_entity_decode(fread($f, 20000));fclose($f);';done

I know the problem is how to "escape" $fi (the bash variable) so that PHP doesn't read it as a PHP variable. Any advice?

Tom O'Connor
Diego Shevek
    possible duplicate of [How can I easily convert HTML special entities from a standard input stream in Linux?](http://serverfault.com/questions/440805/how-can-i-easily-convert-html-special-entities-from-a-standard-input-stream-in-l) – Michael Hampton May 27 '13 at 16:20

2 Answers


You're very nearly there.

As it happens, the question isn't really about decoding the entities; it's about getting bash to expand a variable and pass its value on to PHP.

You've got:

for fi in *.htm; do php -r '$f=@fopen("$fi", "r");echo html_entity_decode(fread($f, 20000));fclose($f);';done

The PHP code is wrapped in single quotes, with double quotes inside.

Bash doesn't expand variables inside single quotes, but it does inside double quotes, so swap the quoting around:

for fi in *.htm; do php -r "\$f=@fopen(\"$fi\",'r');echo html_entity_decode(fread(\$f, 20000));fclose(\$f);"; done

Because the PHP code is now in double quotes, PHP's own $ signs must be escaped as \$. Otherwise bash treats $f as a shell variable and substitutes it (normally with an empty string), leaving PHP with broken code.
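A minimal bash sketch of what each quoting style actually hands to php -r, assuming a hypothetical file.htm and no shell variable named f. It only builds the strings and prints them, so PHP does not need to be installed:

```shell
#!/usr/bin/env bash
# Sketch: show the exact PHP source text bash would pass to `php -r`
# for each quoting style. PHP itself is never invoked here.
unset f                 # assumption: the shell has no variable named f
fi='file.htm'           # hypothetical filename, as the for loop would set it

# 1. Single quotes: bash expands nothing, so PHP receives the literal "$fi"
#    and treats it as an (undefined) PHP variable.
single='$f=@fopen("$fi", "r");'

# 2. Double quotes without escaping: bash expands $fi as intended,
#    but it also expands PHP's $f, here to an empty string.
unescaped="$f=@fopen(\"$fi\",'r');"

# 3. Double quotes with \$: only $fi is expanded; PHP's $f survives.
escaped="\$f=@fopen(\"$fi\",'r');"

printf 'single:    %s\n' "$single"
printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```

Only the third string is PHP code that actually opens file.htm.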

Tom O'Connor

In bash, single quotes (') prevent parameter expansion, so $fi is never replaced by the filename. Swap the single and double quotes in your call to php, and wrap the filename in escaped double quotes so that PHP receives it as a string literal:

for fi in *.htm; do fi=\"$fi\"; php -r "\$f=@fopen($fi, 'r');echo html_entity_decode(fread(\$f, 20000));fclose(\$f);"; done
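To see what the fi=\"$fi\" step does, here is a minimal bash sketch with a hypothetical filename. Like the loop above it builds the PHP source string, but it only prints it instead of running php:

```shell
#!/usr/bin/env bash
# Sketch of the quote-wrapping step, using a hypothetical filename.
# Only bash runs here; PHP is not invoked.
fi='file.htm'
fi=\"$fi\"                        # \" is a literal double quote, so fi is now "file.htm"
code="\$f=@fopen($fi, 'r');"      # the PHP source that php -r would receive
printf '%s\n' "$fi" "$code"
```

Both answers paste the filename into PHP source, so a name containing a double quote, a $ or a backslash would change the PHP code that gets run.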

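Or, more simply, based on Michael Hampton's answer. php -R runs the code once per input line, with the line in $argn minus its trailing newline, so PHP_EOL puts the newline back; redirecting from "$fi" also avoids the extra cat:

for fi in *.htm; do php -R 'echo html_entity_decode($argn), PHP_EOL;' < "$fi"; done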
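One more quoting detail applies to the loop body: an unquoted $fi is subject to word splitting, so a filename containing a space reaches the command as two separate arguments. A minimal bash sketch, with a hypothetical my page.htm and a throwaway count_args helper:

```shell
#!/usr/bin/env bash
# Sketch: count how many arguments a command receives from $fi,
# unquoted vs quoted, for a hypothetical filename containing a space.
fi='my page.htm'

count_args() { echo "$#"; }       # hypothetical helper: prints its argument count

unquoted=$(count_args $fi)        # word splitting: "my" and "page.htm"
quoted=$(count_args "$fi")        # a single argument: "my page.htm"

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

It prints unquoted: 2, quoted: 1, which is why "$fi" should stay in double quotes wherever it is passed as an argument or redirection target.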
Læti