
I'm constantly dealing with a fairly large volume of logs (growing at roughly 1 GB a day), and I manage them the old way: dumping the logs to a central server before they rotate, then archiving them to tape.

Now, because these logs can be requested by the authorities, at some point I have to read them, find what they need, and send back the relevant parts [I'm located in Italy]. Dealing with this has become quite difficult as the volume of generated logs keeps growing, my tape storage grows with it, and keeping track of everything is not as easy as it was a few years ago.

I've already tried Graylog2, and it seems to be a very nice piece of software; the only obstacle in my way is that there is no easy method to export the logs to other storage and import them back when needed (maybe I've misunderstood how it works).

Can someone share the process they use to manage this volume of logs, or a solution for easily exporting the logs and importing them back when needed?

Thanks in advance

Martino Dino

1 Answer


Personally, we use Graylog2 - our logs are probably nowhere near as large as yours, but it's a great way to store and manage logs.

What I'd do in your situation: note that Graylog2 uses ElasticSearch for log storage. Set Graylog2 to keep data live and searchable for as long as your servers can handle it (you can configure it to remove content older than X days).

For archival purposes, at every set interval (e.g. daily), run a script you've written that exports that interval's data to cold storage. This should be as easy as a simple JSON query to ElasticSearch.
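For example, here's a rough Python sketch of what such an export script could look like (untested; it assumes a local ElasticSearch node, Graylog2's default `graylog2_*` index pattern, a `timestamp` field on each message, and an ES version recent enough to accept the scroll ID as a JSON body - older versions pass it differently, so adjust to your setup):

    import json
    import requests  # third-party HTTP client: pip install requests

    ES = "http://localhost:9200"  # assumption: a local ElasticSearch node
    INDEX = "graylog2_*"          # assumption: Graylog2's default index pattern

    def export_range(start, end, outfile):
        """Page through one interval's messages with the scroll API and
        write them out, one JSON document per line."""
        query = {"query": {"range": {"timestamp": {"gte": start, "lt": end}}}}
        resp = requests.post("%s/%s/_search" % (ES, INDEX),
                             params={"scroll": "5m", "size": 1000},
                             json=query).json()
        with open(outfile, "w") as out:
            while resp["hits"]["hits"]:
                for hit in resp["hits"]["hits"]:
                    out.write(json.dumps(hit["_source"]) + "\n")
                # fetch the next batch; the scroll ID keeps our place server-side
                resp = requests.post(ES + "/_search/scroll",
                                     json={"scroll": "5m",
                                           "scroll_id": resp["_scroll_id"]}).json()

    export_range("2013-01-29 00:00:00", "2013-01-30 00:00:00",
                 "logs-2013-01-29.json")

Run it from cron once a day and ship the resulting file off to tape or wherever your cold storage lives.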

I'm not sure why you'd need to re-import it. It's all text, so you can search it as needed with standard tools (e.g. grep), or you can write your own ElasticSearch importer.
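If you do end up needing to re-import, a rough sketch using the bulk API could look like the following (again untested; it assumes the one-JSON-document-per-line export format above, and note that older ES versions also expect a `_type` in each action line):

    import json
    import requests

    ES = "http://localhost:9200"  # assumption: a local ElasticSearch node

    def import_file(infile, index, batch_size=1000):
        """Re-index an exported file into ElasticSearch via the bulk API."""
        actions = []
        with open(infile) as f:
            for line in f:
                # each document needs an action line followed by its source line
                actions.append(json.dumps({"index": {"_index": index}}))
                actions.append(line.rstrip("\n"))
                if len(actions) >= 2 * batch_size:
                    _flush(actions)
                    actions = []
        if actions:
            _flush(actions)

    def _flush(actions):
        body = "\n".join(actions) + "\n"  # bulk bodies must end with a newline
        requests.post(ES + "/_bulk", data=body,
                      headers={"Content-Type": "application/x-ndjson"}
                      ).raise_for_status()

    import_file("logs-2013-01-29.json", "restored-2013-01-29")

That restores a day's worth of logs into a throwaway index (the name here is just an example) that you can search and then delete once the authorities have what they need.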

BJT
  • Because searching through 50 GB of logs can be "difficult" in most situations. Sometimes the authorities make requests that make it difficult for us to locate and extract the info they need. We really like the ElasticSearch approach, but keeping that amount of logs in warm storage makes it slow and eats all the resources building the indexes (it kills I/O and CPU)... – Martino Dino Jan 30 '13 at 04:17
  • After looking through the docs, I found that writing my own importer is the only reasonable solution; thank you for pointing out that possibility. – Martino Dino Jan 31 '13 at 22:40
  • You're very welcome! If you like it, please mark this as your solution. Thanks! – BJT Feb 17 '13 at 22:24