Questions tagged [mapreduce]

13 questions
4 votes · 2 answers

hadoop-config.sh in bin/ and libexec/

While setting up Hadoop, I found that the hadoop-config.sh script is present in two directories, bin/ and libexec/. Both files are identical. While looking into the scripts, I found that if hadoop-config.sh is present in libexec, then that copy gets executed.…
krackoder
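
For context, a minimal sketch of the dispatch pattern this question describes, assuming the bin/ wrapper scripts follow the common Hadoop convention of preferring the libexec/ copy when it exists (paths are illustrative, not taken from the question):

```bash
# Hypothetical excerpt of a bin/ wrapper script: prefer libexec/hadoop-config.sh,
# fall back to the copy sitting next to the wrapper in bin/.
bin=$(cd -P -- "$(dirname -- "$0")" && pwd -P)

if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
  . "$bin/../libexec/hadoop-config.sh"
else
  . "$bin/hadoop-config.sh"
fi
```
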
4 votes · 1 answer

How do I define the timeout for bootstrap actions on Amazon's Elastic MapReduce?

How do I change the timeout for bootstrap actions on Amazon's Elastic MapReduce?
user76542
3 votes · 1 answer

Best practice for administering a (hadoop) cluster

I've recently been playing with Hadoop. I have a six-node cluster up and running with HDFS and have run a number of MapReduce jobs. So far, so good. However, I'm now looking to do this more systematically and with a larger number of nodes. Our base…
Alex
2 votes · 0 answers

Hadoop Streaming with Python 3.5: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127

I'm trying to run my own mapper and reducer Python scripts using Hadoop Streaming on my cluster, built on VMware Workstation VMs. Hadoop version: 2.7, Python: 3.5, OS: CentOS 7.2 on all the VMs. I have a separate machine which plays the role of a…
alex
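
Exit code 127 is the shell's conventional "command not found", so a failure like this usually points at the interpreter named in the mapper/reducer command rather than at the scripts themselves. A hedged sketch of a streaming invocation that pins the interpreter explicitly; the jar path, HDFS paths, and script names are assumptions, not values from the question:

```bash
# Ship the scripts to the worker nodes with -files and invoke them through an
# interpreter assumed to exist on every node (here: python3 on the PATH).
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files mapper.py,reducer.py \
  -mapper "python3 mapper.py" \
  -reducer "python3 reducer.py" \
  -input /user/hadoop/input \
  -output /user/hadoop/output
```
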
1 vote · 0 answers

Sample output of Rumen or Input to Gridmix

I want to see the JobHistory logs that can be fed as input to Rumen. More specifically, I am interested in knowing the input format for Gridmix. I tried the following two things: 1) I found these files: . What is this file exactly? Is this…
PHcoDer
1 vote · 1 answer

Hadoop FileAlreadyExistsException: Output directory hdfs://:9000/input already exists

I have Hadoop set up in fully distributed mode with one master and 3 slaves. I am trying to execute a jar file named Tasks.jar, which takes arg[0] as the input directory and arg[1] as the output directory. In my Hadoop environment, I have the input files…
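
Since MapReduce refuses to write into an output directory that already exists, a common pattern is to delete (or pick a fresh) output path before each run. A sketch, assuming the main class and HDFS paths shown here rather than whatever the question actually used:

```bash
# Remove any previous output directory, then run the job with
# arg[0] = input dir and arg[1] = output dir, as the question describes.
hdfs dfs -rm -r -f /user/hadoop/output
hadoop jar Tasks.jar com.example.Tasks /user/hadoop/input /user/hadoop/output
```
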
1 vote · 2 answers

Updating group without log out or subshell

I'm trying to run Docker with Elastic MapReduce streaming but am running into a permissions issue. In my bootstrap script, I need the "hadoop" user to be part of the "docker" group (as described on the AWS Docker Basics page): sudo usermod -a…
Max
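
One common workaround is to add the user to the group and then start the relevant command under the new group with sg (or newgrp), so the membership takes effect without a fresh login. The command quoted in the question is truncated, so the full flag spelling and the user/group names below are assumptions:

```bash
# Add the hadoop user to the docker group (full form of the truncated command
# above is assumed here).
sudo usermod -a -G docker hadoop

# Run a command with the new group membership immediately, without logging
# out and back in.
sudo -u hadoop sg docker -c "docker info"
```
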
1 vote · 1 answer

MapReduce job is hung after 1 of 5 reducers completed on single-node environment

I have only one Data Node in my dev environment on EC2. I ran a heavy MR job and after 6 hours noticed that 100% of the mappers and 20% of the reducers had finished (1 of the reducers shows 100% completion, the others 0%). It looks like the job is hung between 2 reducer…
Marboni
1 vote · 0 answers

How to increase the performance on Amazon Elastic Mapreduce for job execution?

My task is: initially, I want to import the data from MS SQL Server into HDFS using Sqoop. Through Hive, I process the data and generate the result in one table. That result table from Hive is then exported back to MS SQL Server…
Bhavesh Shah
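
A hedged sketch of the first step of that pipeline (the Sqoop import from SQL Server into HDFS); the connection string, credentials, table name, and target directory are placeholders, not values from the question, and the SQL Server JDBC driver jar is assumed to be on Sqoop's classpath:

```bash
# Import one SQL Server table into HDFS. The later Hive processing and the
# export back to SQL Server would use hive queries and "sqoop export".
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
  --username sqoop_user \
  --password 'changeme' \
  --table Orders \
  --target-dir /user/hadoop/staging/orders \
  --num-mappers 4
```
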
1 vote · 3 answers

Hadoop Rolling Small files

I am running Hadoop on a project and need a suggestion. By default, Hadoop has a block size of around 64 MB. There is also a recommendation to avoid many small files. I currently have very, very small files being put into HDFS due…
Arenstar
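
One widely used way to deal with many tiny files is to roll them up periodically into a Hadoop Archive (HAR), so the NameNode tracks one archive instead of thousands of small blocks. A sketch under assumed paths and archive name (the archive job itself runs as a MapReduce job):

```bash
# Pack the directory /data/incoming into a single archive under /data/archived.
hadoop archive -archiveName small-files.har -p /data incoming /data/archived

# Files inside the archive remain listable and readable via the har:// scheme.
hdfs dfs -ls har:///data/archived/small-files.har
```
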
0 votes · 1 answer

How to view status of recent AppEngine mapreduce jobs?

We recently upgraded our App Engine application to GAE SDK 1.9, and upgraded the older MapReduce library we'd been using to the most recent version hosted on GitHub. We now find that the old MapReduce status page…
0 votes · 0 answers

Distributing Master node ssh key

For the master node to SSH into the slaves without a password, the master needs to distribute its SSH key to the slaves. Copying the key using ssh-copy-id asks for the user's password. If there are hundreds of nodes in the system, it may not be a good idea…
krackoder
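
A hedged sketch of the brute-force approach the question is trying to avoid typing by hand: looping ssh-copy-id over a host list, supplying the password once via sshpass. The host file, user, and key path are assumptions, and sshpass must be installed on the master:

```bash
# Push the master's public key to every slave listed in slaves.txt.
# The same password is assumed to be valid on every slave.
read -s -p "Slave password: " SLAVE_PASS; echo
while read -r host; do
  sshpass -p "$SLAVE_PASS" ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@${host}"
done < slaves.txt
```
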
0 votes · 1 answer

MongoDB Locking - Very, very slow to read

This is the output from db.currentOp(): > db.currentOp() { "inprog" : [ { "opid" : 2153, "active" : false, "op" : "update", "ns" : "", "query" : { "name" :…
StuR