Questions tagged [hdfs]

For questions regarding the Hadoop Distributed File System (HDFS), which is part of the Apache Hadoop project.

73 questions
1
vote
0 answers

Sample output of Rumen or Input to Gridmix

I want to see JobHistory logs that can be fed as input to Rumen. More specifically, I am interested in the input format for Gridmix. I tried the following two things: 1) I found these files: . What is this file exactly? Is this…
PHcoDer
  • 111
  • 2
1
vote
0 answers

Ambari cluster: when is it necessary to set block replication to 1?

We get the following in Spark logs: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\ The current…
shalom
  • 451
  • 12
  • 26
1
vote
0 answers

How to prevent arbitrary executable execution on hadoop cluster

I am involved with configuring a Hadoop cluster for complete auditability and security. I am new to the Hadoop ecosystem, but I have a decent idea of the basics. I have a few concerns for which I hope someone might be able to point me in the right…
STN
  • 111
  • 1
1
vote
0 answers

Need to set 000 permission to specific hdfs data block through commandline

I am trying to set the “000” permission on a specific block. I used the command below to find the block information: su - hdfs -c "hdfs fsck -locations -files -blocks /user/rohit/partition_filter_table/india.25.20.101.95000" Now, I want to set…
1
vote
0 answers

Is it possible to configure HDFS in federation mode and in HA mode at the same time?

I don't understand whether it is possible to configure HDFS in both modes at the same time. Does it make sense? Can somebody show a simple configuration of HDFS in both modes? (nameNode1, nameNode2, nameNodeStandby1, nameNodeStandby2)
1
vote
0 answers

Zombie process blocking port when restarting Hadoop (Secondary) Namenode

I'm having weird issues with the Hadoop Namenode and Secondary Namenode. Our HDFS cluster runs smoothly most of the time. But every now and then, either the Primary Namenode freezes (crashing the whole cluster) or the Secondary Namenode freezes and…
1
vote
0 answers

HDFS performance on Apache Spark

I have several issues related to HDFS that may have different root causes. I'm posting as much information as I can, in the hope that I can get your opinion on at least some of them. Basically the cases are: HDFS classes not found Connections with…
Bacon
  • 123
  • 7
1
vote
1 answer

Viewing NxN network traffic in a distributed system

I have a Hadoop cluster set up, and I would like to view the network usage between all nodes in this cluster, i.e. if I have N nodes, I want to see NxN network connections so that I can view all traffic between all nodes. I am running the cluster on…
jcm
  • 233
  • 3
  • 7
1
vote
0 answers

How to force HDFS to use LDAP user's UID

I have a Cloudera cluster with HDFS and Hue services and I'm trying to unify the authentication using LDAP. I have my LDAP server running thanks to 389-ds (not sure if it is the best way) and I can log into Hue with users from the LDAP server. When I…
Carlos Vega
  • 109
  • 2
  • 3
  • 10
1
vote
1 answer

Rhadoop hdfs.init() Error

I recently installed CDH5.1.0 along with R 3.1.*, and I got rmr2, rJava, and rhdfs all installed properly (along with the required packages, and with the required environment variables set). After some trouble installing rhdfs I added this to my…
user306603
  • 11
  • 2
1
vote
0 answers

When and how are initial directories created in HDFS

I have a Hadoop setup in which the configured HDFS umask is 027 instead of the default one. Some of the initially created directories have correct permissions (like tmp drwxrwxrwx) but others such as /home are not usable (drwxr-x---). As I'm…
sortega
  • 111
  • 2
1
vote
1 answer

Additional Storage Options for Hadoop HDFS Nodes

We have a small production Cloudera distribution Hadoop cluster (14 nodes, but growing). As we have expanded our usage of this cluster we have found that disk storage is our biggest blocker and requirement. RAM and CPU usage are minimal with our…
Geek42
  • 11
  • 1
1
vote
2 answers

Changed an existing Linux user's UID in LDAP, but HDFS does not seem to recognize it

I have set up a Hadoop 1.2.1 environment on CentOS 6. I also use nfs-proxy, which mounts HDFS to the local file system so that I can access the files inside HDFS locally. It worked perfectly until today, when I was asked to integrate user authentication with…
user1817188
  • 183
  • 1
  • 8
1
vote
1 answer

Data lost after Hdfs client was killed

I wrote a simple tool to upload logs to HDFS, and I found a curious phenomenon. If I run the tool in the foreground and close it with "Ctrl-C", there will be some data in HDFS. If I run the tool in the background and kill the process with "kill -KILL…
Evans Y.
  • 111
  • 3
1
vote
0 answers

Hadoop commands are taking a very long time to return

I am logged in (via SSH) to the NameNode of my Hadoop cluster; the problem I am having is that any hadoop fs command, even a simple one like hadoop fs -ls, completes quickly but takes many minutes to return control of the shell to the user. For…
ILikeFood
  • 399
  • 1
  • 5
  • 12