Use `less` pager on file with non-standard encoding

3

1

I often use the less pager to view logfiles. Usually I use less -F to follow the progress of the log à la tail.

However, some logfiles use national characters in a non-standard encoding (Latin-1, while the system uses UTF-8). Obviously, these will not be displayed correctly.

How can I view such files with less?

The only solutions I found:

  • Correct the encoding of the file (recode or iconv). This does not work while the file is still being written, so it does not let me use less -F. Plus it destroys the logfile's original timestamp, which is bad from an auditing perspective.
  • Use a pipe (recode latin1... |less). Works for files in progress, but unfortunately then less -F does not appear to work (it just does not update; I believe the recode process exits once it's done).

Any solution that lets me "tail" a logfile and still shows national characters correctly?

sleske

Posted 2010-06-28T09:16:22.447

Reputation: 19 887

It looks from man less like there is a preprocessor which you could possibly set to fix your encoding. – isomorphismes – 2018-05-12T10:11:10.067

@isomorphismes: Yes, less does support calling a preprocessor. However, as far as I can tell, the preprocessor reads the input file and creates a new file for less, so this would not work for less -F. – sleske – 2018-05-12T20:12:56.397

Answers

3

Hm, apparently less cannot do this. The part of the less source code that implements "following" seems to be:

A_F_FOREVER:
                        /*
                         * Forward forever, ignoring EOF.
                         */
                        if (ch_getflags() & CH_HELPFILE)
                                break;
                        cmd_exec();
                        jump_forw();
                        ignore_eoi = 1;
                        while (!sigs)
                        {
                                make_display();
                                forward(1, 0, 0);
                        }
                        ignore_eoi = 0;

As far as my (limited) knowledge of C goes, this means that if "follow" is activated, less will:

  1. seek to the end of input
  2. read and update the display in a loop, until Ctrl-C is pressed

If the input is a pipe, step 1 will not return until the pipe signals EOF. If I use tail -f xx|less, the pipe will never signal EOF, so less hangs :-(.
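The difference between a pipe and a real file can be seen at the OS level. The sketch below is not taken from less; it only illustrates, assuming a POSIX system, that a regular file can be positioned at its end directly while a pipe cannot, so a reader of a pipe has to consume data until EOF to find the end. The helper name can_seek_to_end is made up for this illustration.

```python
import os
import tempfile


def can_seek_to_end(fd: int) -> bool:
    """Return True if the OS lets us jump straight to the end of fd."""
    try:
        os.lseek(fd, 0, os.SEEK_END)
        return True
    except OSError:
        # On POSIX systems a pipe rejects lseek() (ESPIPE).
        return False


# A regular file: seeking to the end works.
with tempfile.TemporaryFile() as regular_file:
    file_seekable = can_seek_to_end(regular_file.fileno())

# A pipe: there is no "end" to seek to, only data that arrives over time.
read_fd, write_fd = os.pipe()
pipe_seekable = can_seek_to_end(read_fd)
os.close(read_fd)
os.close(write_fd)

print("regular file seekable:", file_seekable)
print("pipe seekable:", pipe_seekable)
```

This is consistent with the observation that the follow mode behaves well on a real file but not on the output of tail -f.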

I did however find a way to get what I want:

 tail -f inputfile | recode latin1.. > /tmp/tmpfile

then

less +F /tmp/tmpfile

This will work, because it lets less +F work on a real file. It's still somewhat awkward, because recode apparently only processes data in blocks of 4096 bytes, but it works...
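If the block buffering in recode is a nuisance, the same idea (write a converted copy to a real file and point less +F at it) can be sketched in a few lines of Python. This is only an illustration under some assumptions: the source really is Latin-1, the log is never rotated or truncated, and the names follow_and_transcode, latin1_to_utf8 and max_idle_polls are invented here and not part of any existing tool.

```python
import time


def latin1_to_utf8(chunk: bytes) -> bytes:
    """Re-encode a chunk of Latin-1 bytes as UTF-8.

    Latin-1 is a single-byte encoding, so a chunk may be cut at any
    byte boundary without splitting a character.
    """
    return chunk.decode("latin-1").encode("utf-8")


def follow_and_transcode(src_path, dst_path, poll_interval=0.5,
                         max_idle_polls=None):
    """Copy src_path to dst_path as UTF-8 and keep copying as src grows.

    max_idle_polls is a hypothetical knob, mainly for testing: stop after
    that many consecutive polls without new data. None means keep going
    until interrupted (e.g. with Ctrl-C).
    """
    idle = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while max_idle_polls is None or idle < max_idle_polls:
            chunk = src.read(4096)
            if chunk:
                dst.write(latin1_to_utf8(chunk))
                dst.flush()  # push the data out so a reader sees it promptly
                idle = 0
            else:
                # At EOF: wait a little, then try reading again.
                idle += 1
                time.sleep(poll_interval)
```

With hypothetical file names, one terminal would run follow_and_transcode("inputfile", "/tmp/tmpfile") and another would run less +F /tmp/tmpfile. The original logfile is only read, so its timestamp stays untouched.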

sleske

Posted 2010-06-28T09:16:22.447

Reputation: 19 887

1

It's possible that recode is buffering output in the pipe so output only comes through when the buffer, probably 4K, is full. You can try using the unbuffer script that comes with expect.

Paused until further notice.

Posted 2010-06-28T09:16:22.447

Reputation: 86 075

No, that is not the problem. The recode process simply exits after it detects EOF for the file (after all, it has no way of knowing that the file is still growing); I can confirm this using ps. So unbuffer does not help. – sleske – 2010-06-29T10:23:41.683

@sleske: Have you tried tail -f | recode ... | less -F? – Paused until further notice. – 2010-06-29T13:36:24.833

@Dennis: Actually yes, I tried it, but it didn't help either. It seems less -F just plain does not work on pipes. Even tail -f myfile | less -F does not work, though in this case both processes remain alive. – sleske – 2010-06-30T07:59:15.380

Anyway, +1 for good hints. Even if they didn't work, it's good to know that :-). – sleske – 2010-06-30T08:00:08.313

@sleske: By the way, it's less +F that follows files like tail -f (rather than less -F). After some testing, it looks like recode is doing some buffering that can't be controlled. This works, but the output is in chunks: tail -f inputfile | recode ... | less +F – Paused until further notice. – 2010-07-01T08:42:07.513

@Dennis: Interesting. Your example does not work for me: less just hangs with an empty screen until I press Ctrl-C; then it shows its prompt, but no text. – sleske – 2010-07-01T21:07:00.713

To me it seems rather that less +F waits for EOF in its input before even showing a prompt. Since that never comes, it appears to hang. Just tail -f inputfile | less works, but it still hangs once I invoke Shift-F (or Shift-G). So it seems what I want just isn't possible with less... – sleske – 2010-07-01T21:12:02.810

@sleske: Try less in that pipeline without any options: tail -f inputfile | recode ... | less. Note: if your logfile is not getting much traffic, it could take a while before the buffer is full and anything is output. – Paused until further notice. – 2010-07-01T21:40:01.143

@Dennis: Yes, I tried that, and it does work, but it's not practical. It will show output, and gives me the less prompt once the first screenful of text has been printed, but scrolling to the end of text still makes less hang until enough fresh text has arrived; and Shift-F or Shift-G still hangs less permanently. So it seems less just can't do what I'd like to do... – sleske – 2010-07-04T20:31:59.480

0

Suggested reading: The section NATIONAL CHARACTER SETS in "Linux / Unix Command: less".

harrymc

Posted 2010-06-28T09:16:22.447

Reputation: 306 093

that or 'env LC_ALL=en_US.LATIN1 less -F file' – akira – 2010-06-28T10:03:42.597

That does not solve my problem. This will cause less to accept Latin-1 characters as regular characters (meaning it does not highlight them), but they will still show up incorrectly in a terminal program that expects UTF-8 (as that's the system default). I want to actually convert the Latin-1 characters to valid UTF-8, not just have them show up as junk/box characters. – sleske – 2010-06-28T10:08:48.640
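To make the point of the comment above concrete, here is a small sketch of why a Latin-1 byte shows up as junk on a UTF-8 terminal and what an actual conversion produces. The character é is just an example, and is_valid_utf8 is a helper made up for this illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "é" in Latin-1 is the single byte 0xE9.
latin1_bytes = "é".encode("latin-1")

# Converting means decoding as Latin-1 and re-encoding as UTF-8,
# which yields the two bytes 0xC3 0xA9.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

print(latin1_bytes.hex(), is_valid_utf8(latin1_bytes))
print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
```

Telling less to treat the input as Latin-1 does not change the bytes sent to the terminal, which is why the characters still render incorrectly there.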

@sleske: I don't know of a way to convert and do less at the same time on dynamic files. One can define macros per akira's comment for the several possible encodings that you have. This is assuming that your problem is only the display and not pure conversion. – harrymc – 2010-06-28T10:55:52.643