Questions tagged [hadoop]

Hadoop is an open-source framework that provides a distributed, replicated file system and a production-grade map-reduce system, along with complementary projects such as Hive, Pig, and HBase that help you get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support available from multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Standard components and complementary additions to Hadoop include:

  • Hadoop Distributed File System, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store

Recommended reference sources:

262 questions
2
votes
0 answers

HBase block locality index is always 0

I have an HBase (v0.94.19 with Hadoop 1.2.1) setup with one master machine and two region servers. Each region server has a 16 GB heap (6.4 GB cache, 4.0 GB memstore) and 1.6 TB (2 x 800 GB) of SSD disk space. There is only one table with single…
tilmik
  • 135
  • 1
  • 8
2
votes
2 answers

Unable to convert HDFS from non-HA to HA

Introduction. Aim: Convert HDFS from non-HA to HA. Method: According to this documentation, it should be possible to convert HDFS from non-HA to HA by implementing the following configuration: /etc/hadoop/conf/hdfs-site.xml
030
  • 5,731
  • 12
  • 61
  • 107
2
votes
1 answer

Cloudera Hadoop superuser group

I'm trying to create a group on one of my datanodes that will have superuser privileges for hdfs and associated fs commands. So far I have: Checked to see that dfs.permissions.superusergroup=supergroup (default) Created a local group on the…
CJONES
  • 317
  • 2
  • 11
2
votes
0 answers

Encountering an error when configuring secure Hadoop: org.apache.hadoop.security.AccessControlException

I am trying to configure secure Hadoop with Kerberos. I have started the KDC server, then generated and copied the related keytabs to the corresponding nodes. Kerberos works normally (using kinit), but when I try to start the namenode, I encounter a weird error. I have…
xiaoxiao
  • 21
  • 2
2
votes
4 answers

HDFS datanode startup fails when disks are full

Our HDFS cluster is only 90% full but some datanodes have some disks that are 100% full. That means when we mass reboot the entire cluster some datanodes completely fail to start with a message like this: 2013-10-26 03:58:27,295 ERROR…
mbac
  • 21
  • 1
  • 2
2
votes
2 answers

Any good method for mounting Hadoop HDFS from another system?

I want to mount the Cloudera Hadoop as a Linux file system over the LAN. As a setup, I already have the hadoop cluster running on a set of Ubuntu machines. But now I need to be able to use it as a normal file system from a Fedora system over the…
Beel
  • 149
  • 1
  • 9
2
votes
1 answer

Hadoop - What is the purpose of the /usr/sbin/ shell scripts?

I am installing Hadoop 1.1.2 on CentOS 6.4. I read all the Hadoop documentation at http://hadoop.apache.org/docs/stable/ After installing, I noticed there are many shell scripts at /usr/sbin/. But the documentation does not explain what most of…
davidjhp
  • 630
  • 2
  • 7
  • 13
2
votes
1 answer

Hadoop hdfs namenode is throwing an error

Full list of error: hb@localhost:/etc/hadoop/conf$ sudo service hadoop-hdfs-namenode start * Starting Hadoop namenode: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-localhost.out 12/09/10 14:41:09 INFO namenode.NameNode:…
Keval Domadia
  • 587
  • 5
  • 14
2
votes
2 answers

Deleting temp directory from HDFS

Is there a smart way of deleting old files from the hdfs /tmp directory? (Just to make sure, I am not talking about the unix FS /tmp)
Istvan
  • 2,562
  • 3
  • 20
  • 28
2
votes
1 answer

Raspberry Pi based Hadoop cluster

Is it at least possible to build a Hadoop cluster from Raspberry Pi-based nodes? Can such a cluster meet the hardware requirements of Hadoop? And if so, how many Raspberry Pi nodes are required to meet the requirements? I understand that a cluster from…
Dmitriy Sukharev
  • 233
  • 1
  • 4
  • 9
2
votes
2 answers

Hadoop DataNode is giving me an incompatible namespace ID

When I run the start-all.sh script from my master node, some of my DataNodes fail to start; the log file reports a Java IOException: Incompatible Namespace IDs in /tmp/$MY_USER_NAME.
ILikeFood
  • 399
  • 1
  • 5
  • 12
2
votes
1 answer

Setting up Hive with Hadoop

I am trying to set up Hive. I am using this guide: https://cwiki.apache.org/Hive/gettingstarted.html and I'm stuck at setting up the /tmp and /user/hive/warehouse dirs. First of all, it seems a little strange to me that Hive requires to change…
Ancymon
  • 121
  • 2
2
votes
1 answer

Most secure way to issue commands on an Ubuntu cluster with a sudo'ing user?

This is sort of a follow-up to an unanswered question I have regarding administration of a Cloudera cluster, but I figure generalizing the question to all of Ubuntu may help me get an answer. I want to be able to start/stop the same service…
Dolan Antenucci
  • 329
  • 1
  • 4
  • 16
2
votes
1 answer

Does changing the default HDFS replication factor from 3 affect mapper performance?

I have an HDFS/Hadoop cluster set up and am looking into tuning. I wonder if changing the default HDFS replication factor (default: 3) to something bigger will improve mapper performance, at the obvious expense of increasing the disk storage used? My…
liamf
  • 372
  • 4
  • 10
2
votes
3 answers

Hadoop moving data to another user

I have a few hundred GB in my HDFS for userA (single-node configuration). I would like to transfer all that data to userB, which will be more appropriate for the multi-node configuration I'm setting up. I tried the following without…
millebii
  • 161
  • 8