I've been using GNU sed on and off for a couple of years now. It spins me out a bit sometimes, but it does a good job... for single-byte character sets!
Now and then I notice references to GNU sed being Unicode-aware, but the closest thing I've seen to this is its "binary" mode, and binary is not Unicode.
Can GNU sed process a Unicode text file at code-point resolution, including (and especially) files with \r\n Windows line endings? If it can, does it expect UTF-8, UTF-16, or something else, and how does sed detect the encoding?
Usually Unicode is specified with the \uXXXX escape. Try this Japanese developer's build: http://sky.geocities.jp/hp_gabo200x/room_tool.html
– Mikhail – 2012-11-01T02:03:28.517