17

I am often dealing with incredibly large log files (>3 GB). I've noticed the performance of less is terrible with these files. Often I want to jump to the middle of the file, but when I tell less to jump forward 15 million lines it takes minutes.

The problem, I imagine, is that less needs to scan the file for '\n' characters, and that takes too long.

Is there a way to make it just seek to an explicit offset? e.g. seek to byte offset 1.5 billion in the file. This operation should be orders of magnitude faster. If less does not provide such an ability, is there another tool that does?

UsAaR33
  • 1,036
  • 3
  • 11
  • 20
  • if you're skimming the file for forbidden characters, is it a fair assumption that you will purge the aforementioned characters after finding them? If so, may I offer `perl -pi -e 's/\n//g;' ` – Mike Pennington Jul 26 '12 at 00:43
  • Sorry, skim was the wrong word. Should have used scan. less by design scans for newline (\n). This scanning takes a very long time on large files. – UsAaR33 Jul 26 '12 at 06:42

3 Answers

24

You can stop less from counting lines with less -n.

To jump to a specific place, say 50% in: less -n +50p /some/log. This was instant for me on a 1.5 GB log file.

Edit: For a specific byte offset: less -n +500000000P ./blah.log
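Since P takes a byte offset, jumping to an arbitrary fraction of the file is just arithmetic. A minimal sketch (the ./blah.log path is the answer's example; `wc -c` gives the size in bytes):

```shell
# Sketch: compute the midpoint byte offset, then build the less
# command that jumps straight there without counting newlines.
log=./blah.log                  # example log file from the answer
size=$(wc -c < "$log")          # total file size in bytes
half=$((size / 2))              # byte offset of the midpoint
printf 'less -n +%sP %s\n' "$half" "$log"   # the command to run interactively
```

The printf just shows the command you'd run; in practice you'd invoke less directly with that offset.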

Sekenre
  • 2,913
  • 1
  • 18
  • 17
  • 1
    Line counting was never the issue; I could just use esc/ctrl-c for that. But this is the actual answer; P jumps to a specific byte offset! – UsAaR33 Jul 26 '12 at 19:51
5

Less, being a pager, is inherently line-oriented. When you start it up on a large file, it'll say "counting line numbers" and you can hit ESC to stop that, but otherwise, it does lines. It's what it does.

If you want to jump straight into the middle of file and skip the beginning, you can always just seek past the beginning; I'd do something like tail -c +15000000 /some/log | less.
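The same idea with the offset derived from the file size instead of hard-coded (a sketch; /some/log is the answer's placeholder path):

```shell
# Sketch: tail -c +N emits the file starting at byte N (1-based),
# so less never has to scan the skipped portion for newlines.
log=/some/log                    # placeholder path from the answer
size=$(wc -c < "$log")           # total file size in bytes
tail -c "+$((size / 2))" "$log" | less
```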

womble
  • 95,029
  • 29
  • 173
  • 228
0

less seems to have a small overhead from the locale settings

If you're using ASCII only characters, you can speed it up a bit by using:

LC_ALL=C less big-log-file.log

In my case, the throughput increased from ~30 MiB/s to ~50 MiB/s (the rate is CPU-bound).
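Note that prefixing the assignment sets LC_ALL for that one command only; a quick sketch to confirm the variable reaches the child process without changing your shell's own locale:

```shell
# Sketch: a per-command assignment is visible only to that command.
LC_ALL=C sh -c 'echo "$LC_ALL"'   # prints: C
echo "${LC_ALL:-unset}"           # your shell's own LC_ALL is unchanged
```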

Romuald Brunet
  • 171
  • 1
  • 4