
I wrote a simple tool to upload logs to HDFS, and I noticed some curious behavior.

If I run the tool in the foreground and stop it with Ctrl-C, some data ends up in HDFS.

If I run the tool in the background and kill the process with `kill -KILL pid`, the data that has been processed is lost and an empty file is left in HDFS.


My tool tries to sync frequently, by invoking SequenceFile.Writer.syncFs() every 1000 lines.

I just can't figure out why the data is lost. If my tool runs all day and then the machine crashes suddenly, will the whole day's data be lost?


The tool collects logs from different servers and uploads them to HDFS, aggregating all the logs into a single file per day.
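For reference, the write loop follows this append-then-sync pattern. This is a minimal sketch using plain java.io as a stand-in for the HDFS writer (no Hadoop dependency); the class name, line counts, and sync interval here are illustrative, with `flush()` standing in for `SequenceFile.Writer.syncFs()`:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class PeriodicFlush {
    // Write n lines to 'out', flushing the client-side buffer every
    // 'interval' lines (analogous to calling syncFs() every 1000 lines).
    // Returns the number of lines that can be read back afterwards.
    public static long writeWithPeriodicFlush(Path out, int n, int interval)
            throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            for (int i = 1; i <= n; i++) {
                w.write("line " + i);
                w.newLine();
                if (i % interval == 0) {
                    w.flush(); // stand-in for SequenceFile.Writer.syncFs()
                }
            }
        }
        try (Stream<String> lines = Files.lines(out)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("logs", ".txt");
        System.out.println(writeWithPeriodicFlush(p, 5000, 1000)); // 5000
        Files.deleteIfExists(p);
    }
}
```

The open question is whether the HDFS-side equivalent of that `flush()` actually persists the data on the datanodes, or only pushes it out of the client buffer.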

Evans Y.

1 Answer


You're really doing two fairly different tests there. Ctrl-C delivers SIGINT to your program, but `kill -KILL` sends SIGKILL. I would expect different results between them; for instance, POSIX states:

   The signals SIGKILL and SIGSTOP cannot be caught or ignored.

You could run strace to see the effect of your syncFs() call. Does it actually result in a call to sync(), msync(), fsync(), fdatasync(), etc.? Also, consider a different approach: can you close the file during periods of inactivity?
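The SIGINT/SIGKILL difference can be seen without Hadoop at all: on SIGINT the JVM runs its shutdown hooks (so a writer can be closed), while on SIGKILL the process is terminated before any user code runs. A small demonstration, with the hook body and messages purely illustrative:

```java
public class ShutdownHookDemo {
    // Registers a hook that the JVM runs on normal exit and on SIGINT
    // (Ctrl-C), but NOT on SIGKILL: the kernel terminates the process
    // before the JVM can react. Returns the hook thread.
    public static Thread registerCleanup() {
        Thread hook = new Thread(() ->
                System.out.println("shutdown hook: closing writer"));
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerCleanup();
        System.out.println("appending records...");
        // Ctrl-C here: the hook runs and the writer could be closed cleanly.
        // kill -KILL here: the process dies instantly; buffered data is lost.
    }
}
```

This matches what you observed: data survives Ctrl-C because cleanup code still gets a chance to run, and an abrupt kill is the honest simulation of a machine crash.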

Brian Cain
  • What Brian Cain said makes sense. When you hit ^C, the HDFS client knows the system wants it to stop, so it has time to do some cleanup before voluntarily terminating its process. SIGKILL, by contrast, comes without mercy and kills the process right away. I don't think the `strace` test will help much, because we are talking about a cluster-based file system: the namenode and datanodes must agree that they have the data, and `syncFs` on the client should force that agreement. Closing the file will certainly help, as it implicitly causes a sync on the cluster. –  Sep 25 '12 at 04:35
  • Actually I'm using `kill -KILL pid` to mock the crash case, just to see what would happen if the SequenceFile.Writer cannot be closed properly. It turns out the data is lost without an explicit call to SequenceFile.Writer.close(). Does that mean I have to roll the file (on HDFS) regularly to guard against data loss? –  Sep 25 '12 at 05:39
  • Are the 1000 lines enough to fill an HDFS block? –  Sep 25 '12 at 13:10
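Rolling the file, as the second comment suggests, bounds the damage: closing the old writer forces a real sync, so an unclosed file can only lose the current window. A hypothetical naming helper for hourly rolling (the `logs-` prefix, hourly window, and `.seq` suffix are assumptions, not from the original tool):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogRoller {
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyyMMdd-HH");

    // Name of the output file for the roll window containing 't'.
    // When the hour changes, the writer for the old name is closed
    // (forcing a sync) and a new file is started, so a crash can only
    // lose data appended since the last roll.
    public static String fileFor(LocalDateTime t) {
        return "logs-" + t.format(HOURLY) + ".seq";
    }
}
```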