13

I am still new to Hadoop, and this time I was trying to process a 106 GB file. I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I had to wait a long time with no clue about the copy's current progress.

Is there any way to show the current file copying status with this command?

Thank you guys in advance for your help!

Bang Dao
  • 233
  • 2
  • 6

4 Answers

15

copyFromLocal does not have the ability to display copy progress. As a workaround, you could open another shell and run $ watch hadoop fs -ls <filenameyouarecopying>. This will display the file and its size once every 2.0 seconds.
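As a sketch of that approach (the destination path here is a hypothetical example; adjust it to wherever you are copying):

```shell
# In a second shell, re-run the HDFS listing every 2 seconds (watch's
# default interval) and highlight what changed between refreshes; the
# size column grows as the copy proceeds.
# /user/hadoop/bigfile.dat is a hypothetical destination path.
watch -d hadoop fs -ls /user/hadoop/bigfile.dat
```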

Deer Hunter
  • 1,070
  • 7
  • 17
  • 25
datarockz2
  • 176
  • 1
  • 3
4

It is also possible to track the progress of reading the local file by using the pv command and piping the file contents to hdfs dfs on stdin:

pv mylargefile.txt | hdfs dfs -put - /path/to/file/on/hdfs/mylargefile.txt

1

It doesn't look like there's a verbose option for any of the copy commands (copyFromLocal, copyToLocal, get, put). Your best bet is probably to look at the size of the file at its destination on HDFS in order to gauge its progress.
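A minimal sketch of that size-based estimate, assuming you have already read off the local size and the size reported so far on HDFS (e.g. from hadoop fs -du); the byte counts below are made-up example values:

```shell
# Progress is just the ratio of the destination size to the source size.
local_bytes=113816633344    # made-up size of the ~106 GB local file
hdfs_bytes=28454158336      # made-up size reported so far on HDFS
echo "$((100 * hdfs_bytes / local_bytes))% copied"   # prints: 25% copied
```

Integer arithmetic is enough here, since a whole-percent figure is all you need to gauge how far along the copy is.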

Travis Campbell
  • 1,456
  • 7
  • 15
1

You can use nohup with & to run the copy as a background process. nohup keeps the process running even after you log out of the server. Whenever you need to, you can check progress with hadoop fs -ls.
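A sketch of that background approach (the file and destination names are hypothetical examples):

```shell
# Start the copy in the background, immune to hangup when you log out;
# stdout/stderr go to copy.log so any errors are preserved.
nohup hadoop fs -copyFromLocal mylargefile.txt /path/on/hdfs/ > copy.log 2>&1 &

# Later (even from a new login session), check the destination size:
hadoop fs -ls /path/on/hdfs/mylargefile.txt
```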

Anan
  • 11
  • 1