Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade MapReduce system, along with complementary additions such as Hive, Pig, and HBase that help get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support available from multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project documents a more complete list of commercial offerings.

Available complementary additions to Hadoop include:

Hadoop Distributed File System (standard)
The MapReduce architecture (standard)
Hive, which provides a SQL-like interface to the MapReduce architecture
HBase, a distributed key-value service

Recommended reference sources:

Hive Language Reference

262 questions

3 answers

Hadoop - /usr/bin/hadoop: line 320: /usr/bin/java/bin/java: Not a directory

I am installing Hadoop on CentOS 6.4. Following these instructions http://hadoop.apache.org/docs/stable/single_node_setup.html wget http://apache.osuosl.org/hadoop/common/hadoop-1.1.2/hadoop-1.1.2-1.x86_64.rpm chmod 700 hadoop-1.1.2-1.x86_64.rpm rpm…

hadoop

asked Jun 14 '13 at 23:19

davidjhp

2 answers

How does Hadoop decide what its nodes' hostnames are?

Currently the URLs generated by the jobtracker and namenode use hostnames like bubbles.local or just bubbles. These do not resolve unless the client machine lists them in its /etc/hosts file. When I run the hostname command…

centos hadoop

asked Sep 04 '12 at 17:33

Dan R

3 answers

Hadoop ecosystem web dashboard

I am trying to find a tool that shows an overview of my Hadoop ecosystem: state, health, running tasks, etc. I searched Google but did not find one. Is there a good tool for this?

hadoop

asked Aug 24 '12 at 06:26

Vojtech

1 answer

starting hadoop on mac os lion

I want to start Hadoop on my MacBook Pro, and I followed all the steps in the Apache instructions. When I use the command "bin/start-all.sh", I get this: starting namenode, logging to…

mac-osx hadoop

asked Oct 07 '11 at 22:33

AliBZ

2 answers

Is there a way to get a list of Hadoop cluster machines from one of the data nodes?

I have access to a data node in a Hadoop cluster, and I'd like to find out the identity of the name nodes for the same cluster. Is there a way to do this?

cluster hadoop discovery

asked Mar 08 '11 at 17:33

Yuval

1 answer

Best practice for administering a (hadoop) cluster

I've recently been playing with Hadoop. I have a six node cluster up and running - with HDFS, and having run a number of MapRed jobs. So far, so good. However I'm now looking to do this more systematically and with a larger number of nodes. Our base…

hadoop mapreduce

asked Mar 08 '11 at 07:23

Alex

4 answers

Interpreting exim log files after parsing

I'm parsing exim log files and, due to my processing method, lose the original order of all entries in this file. I rebuild the transactions by their transaction ID (i.e. 1OfiYX-0000Ev-7k) but still don't have a way to determine the original…

exim hadoop

asked Aug 05 '10 at 06:44

gnucom

1 answer

Hadoop slaves file necessary?

I'm working on a team trying to create a system for creating Hadoop clusters on EC2 with minimal effort on the part of the user. Ideally, we would like slave instances to only require the hostname of the master instance as user data on boot. The…

amazon-ec2 hadoop master-slave

asked Feb 21 '10 at 18:11

Tim Yates

0 answers

Yarn error: Failed to create Spark client for Spark session

I'm a bit new to this and have little experience, would appreciate your help. I'm trying to install Hive on an existing Spark installation. I mostly followed the instructions in this page with no…

hadoop

asked Jul 03 '19 at 12:23

hnagaty

0 answers

Hadoop Streaming with Python 3.5: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127

I'm trying to run my own mapper and reducer Python scripts using Hadoop Streaming on my cluster built on VMware Workstation VMs. Hadoop version - 2.7, Python - 3.5, OS - CentOS 7.2 on all the VMs. I have a separate machine which plays a role of a…

python hadoop streaming mapreduce

asked Oct 08 '16 at 05:28

alex

0 answers

AWS-Hadoop Data Analytics Implementation for Multiple JSON Files

I am new to Hadoop and AWS. I have set up a multi-node (4 t2.large instances) AWS EC2 cluster with the Cloudera Hadoop distribution. I have tested the environment with basic examples using CSV files, such as word count. Now, my main project is to analyze…

amazon-web-services hadoop json

asked Oct 05 '16 at 15:18

Rash

1 answer

Running HDFS with only 1 data node - appending fails

I'm trying to test some services that require HDFS using Docker Compose. Since the services being tested, namenode, and data node(s) will all be running on the same physical machine (dev laptop), it would be nice to reduce the memory usage by only…

docker hadoop hdfs

asked Oct 04 '16 at 17:34

Robert Fraser

1 answer

Possible to ssh into a server without using -i flag for key?

I have 3 EC2 instances and they all use the same private key. I'm setting up a hadoop cluster between these nodes and they require passwordless entry for this to work. How can I use this private key to easily ssh into the servers with keyless entry?…

ssh amazon-ec2 ssh-keys hadoop

asked Sep 26 '16 at 17:03

coderkid

2 answers

How to remove RAID option from HP DL360 Gen 9 for HDFS

I am setting up a brand new DL360 G9 Server for use in a Hadoop cluster proof-of-concept. As HDFS will be taking care of the RAID, I need to bypass this option in the G9 array controller (Smart Array P440ar). I just can't find where to do that - IF…

raid redhat hp hp-proliant hadoop

asked Oct 05 '15 at 16:06

Sketch

1 answer

Should I deploy hadoop on physical machines or virtual machines?

We will deploy a Hadoop cluster on hundreds (say 300) of physical x86 nodes. Since we do not have much production deployment experience, we would like to hear from experienced people on the simple question in the title. What are the best practices?…

virtual-machines cloud hadoop

asked Apr 23 '15 at 08:54

John Wang
