Questions tagged [hdfs]

For questions regarding the Hadoop Distributed File System (HDFS), which is part of the Apache Hadoop project.

73 questions
13
votes
4 answers

In Hadoop, how to show current progress of -copyFromLocal

I am still a newbie learner of Hadoop, and this time I was trying to process a 106GB file. I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I have to wait for a long time without a clue about the current…
Bang Dao
  • 233
  • 2
  • 6
7
votes
2 answers

HBASE Space Used Started Climbing Rapidly

Update 4,215: After looking at space usage inside of hdfs, I see that .oldlogs is using a lot of space: 1485820612766 /hbase/.oldlogs So new questions: What is it? How do I clean it up? How do I keep it from growing again? What caused it to…
Kyle Brandt
  • 82,107
  • 71
  • 302
  • 444
7
votes
2 answers

Hadoop HDFS Backup & DR Strategy

We are preparing to implement our first Hadoop cluster. As such we are starting out small with a four node setup. (1 master node, and 3 worker nodes) Each node will have 6TB of storage. (6 x 1TB disks) We went with a SuperMicro 4-node chassis so…
Matt Keller
  • 221
  • 4
  • 7
6
votes
2 answers

Hadoop HDFS: set file block size from commandline?

I need to set the block-size of a file when I load it into HDFS, to some value lower than the cluster block size. For example, if HDFS is using 64mb blocks, I may want a large file to be copied in with 32mb blocks. I've done this before within a…
BigChief
  • 398
  • 1
  • 2
  • 12
5
votes
1 answer

Forward-sync to HDFS? (OR continue an incomplete hdfs upload?)

Anyone have a good suggestion for doing a forward sync to HDFS? ("forward-sync" in contrast to "bi-directional sync") Basically I have a large number of files I want to put into the HDFS. It's so large that I'll often, say, lose connectivity before…
Nate Murray
  • 973
  • 1
  • 7
  • 7
5
votes
2 answers

How to fix Hadoop HDFS cluster with missing blocks after one node was reinstalled?

I have a 5-slave Hadoop cluster (using CDH4); slaves are where DataNode and TaskNode run. Each slave has 4 partitions dedicated to HDFS storage. One of the slaves needed a reinstall and this caused one of the HDFS partitions to be lost. At this…
Dolan Antenucci
  • 329
  • 1
  • 4
  • 16
5
votes
1 answer

Ceph: Why is a greater number of "placement groups" a "bad thing"?

I have been researching distributed databases and file systems, and while I was originally mostly interested in Hadoop/HBase because I'm a Java programmer, I found this very interesting document about Ceph, which as a major plus point, is now…
monster
  • 608
  • 2
  • 10
  • 17
4
votes
1 answer

mount.nfs: mount system call failed

I am trying to mount hdfs on my local machine running Ubuntu using the following command: sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/ But I am getting this error: mount.nfs: mount system call failed Output…
Bhavya Jain
  • 141
  • 1
  • 1
  • 3
4
votes
2 answers

Upload large files with curl without RAM cache.

I'm using curl to upload large files (from 5 to 20 GB) to HOOP based on HDFS (Hadoop Cluster) as follows: curl -f --data-binary "@$file" "$HOOP_HOST$UPLOAD_PATH?user.name=$HOOP_USER&op=create" But when curl uploads a large file, it tries to fully…
Gening D.
  • 81
  • 1
  • 5
4
votes
3 answers

Is there a way to grep gzipped content in hdfs without extracting it?

I'm looking for a way to zgrep hdfs files, something like: hadoop fs -zcat hdfs://myfile.gz | grep "hi" or hadoop fs -cat hdfs://myfile.gz | zgrep "hi" Neither really works for me. Is there any way to achieve this from the command line?
Jas
  • 701
  • 4
  • 13
  • 23
4
votes
0 answers

java.lang.NullPointerException When Doing A Read in HDFS

I have had a 10-node HBase cluster up and running for the past 4 months. The cluster was set up on VMs in a corporate environment which I do not control, but everything has been working great...until today. Today, every part of the system was down. I…
JasCav
  • 233
  • 1
  • 12
4
votes
1 answer

Can't connect to HDFS in pseudo-distributed mode

I followed the instructions here for installing hadoop in pseudo-distributed mode. However, I'm having trouble connecting to HDFS. When I execute this command : ./hadoop fs -ls / I get a directory listing just like I should. However, when I execute…
sangfroid
  • 193
  • 1
  • 3
  • 10
4
votes
3 answers

What is meant by "streaming data access" in HDFS?

According to the HDFS Architecture page HDFS was designed for "streaming data access". I'm not sure what that means exactly, but would guess it means an operation like seek is either disabled or has sub-optimal performance. Would this be…
Van Gale
  • 472
  • 1
  • 5
  • 10
3
votes
0 answers

How can I launch hdfs on Mesos without DC/OS?

From my understanding, DC/OS is a freemium managed service. Because I'd rather just have a raw Mesos implementation, I'd rather not be dependent on DC/OS and so I just want to know how to implement HDFS on Mesos without it. Unfortunately Google is…
Dr.Knowitall
  • 209
  • 1
  • 10
3
votes
1 answer

Linux Network tuning to prevent tcp rcvpruned and backlogdrop?

The datanodes in my hbase cluster are triggering some tcp rcvpruned and backlog drops from time to time. It seems there are at least two angles from which to approach this: tune HBase/HDFS etc. so that these are not triggered, or tune the Linux network…
Kyle Brandt
  • 82,107
  • 71
  • 302
  • 444