Over the last few days I've been playing with Hadoop, trying to implement a word-length count (finding the absolute frequencies of the different word lengths). Everything is implemented following the MapReduce paradigm. The problem: when I launch the job on a pseudo-distributed cluster (single node, running locally), it runs perfectly and gives the expected results. When I run it on a real cluster, it doesn't actually start and hangs after the following line:
19/06/10 12:16:05 INFO client.RMProxy: Connecting to ResourceManager at node-master/172.16.14.201:8032
19/06/10 12:16:05 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/06/10 12:16:06 INFO input.FileInputFormat: Total input files to process : 1
19/06/10 12:16:06 INFO mapreduce.JobSubmitter: number of splits:1
19/06/10 12:16:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559917881770_0004
19/06/10 12:16:06 INFO impl.YarnClientImpl: Submitted application application_1559917881770_0004
19/06/10 12:16:06 INFO mapreduce.Job: The url to track the job: http://node-master:8088/proxy/application_1559917881770_0004/
The thing is, I can easily run the Wordcount example that's provided with Hadoop.
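To make the intent concrete, here is a plain-Java sketch of the result the job is supposed to produce (whitespace tokenization assumed, mirroring `StringTokenizer`; class and method names are made up for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordLengthCount {
    // Absolute frequency of each word length in the input text
    static Map<Integer, Integer> lengthFrequencies(String text) {
        Map<Integer, Integer> freq = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freq.merge(token.length(), 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // tokens: to(2) be(2) or(2) not(3) to(2) be(2)
        System.out.println(lengthFrequencies("to be or not to be"));
        // {2=5, 3=1}
    }
}
```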
Here's the Mapper class:
package hadoop.wordcountz.mapper;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
public class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, IntWritable, IntWritable>
{
    private final static IntWritable one = new IntWritable(1);
    private IntWritable len = new IntWritable(0);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            String temp = word.toString();
            if (temp.length() != 0)
            {
                // emit (word length, 1) for every non-empty token
                len.set(temp.length());
                context.write(len, one);
            }
        }
    }
}
Here's the Reducer class:
package hadoop.wordcountz.reducer;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;

public class Reducer extends org.apache.hadoop.mapreduce.Reducer<IntWritable, IntWritable, IntWritable, IntWritable>
{
    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        // sum the partial counts emitted for each word length
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Here's the Main class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import hadoop.wordcountz.mapper.Mapper;
import hadoop.wordcountz.reducer.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Main {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            for (int i = 0; i < args.length; i++) {
                System.out.println("Arg: " + i + " Val: " + args[i]);
            }
            System.out.println("usage: [input] [output]");
            System.exit(-1);
        }
        Job job = Job.getInstance(new Configuration());
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJarByClass(Main.class);
        job.submit();
    }
}
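As an aside, the `JobResourceUploader` warning in the log explicitly suggests implementing the `Tool` interface. This is not a verified fix for the hang, but a driver following that suggestion might look like the sketch below (mapper/reducer names taken from my classes above; `waitForCompletion(true)` is used instead of `submit()` so the client blocks and prints progress and diagnostics instead of returning immediately):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Main extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: [input] [output]");
            return -1;
        }
        // getConf() carries any -D options parsed by ToolRunner
        Job job = Job.getInstance(getConf(), "word-length count");
        job.setJarByClass(Main.class);
        job.setMapperClass(hadoop.wordcountz.mapper.Mapper.class);
        job.setReducerClass(hadoop.wordcountz.reducer.Reducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // blocks until the job finishes, printing progress to the console
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new Main(), args));
    }
}
```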
Here's the pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>Bho</groupId>
    <artifactId>Bho</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>WordCount</name>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.5</version>
        </dependency>
    </dependencies>
</project>
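I notice the pom mixes `hadoop-core` 1.2.1 (from the old 1.x line) with `hadoop-common` 2.8.5, which may itself be a problem. A dependency block matching a 2.8.5 cluster could instead look like this sketch (version assumed from my `hadoop version` output; `provided` scope assumed because the cluster supplies the Hadoop jars at runtime):

```xml
<dependencies>
    <!-- hadoop-client pulls in hadoop-common and the MapReduce client
         libraries at a single, consistent version -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.5</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```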
I'm using Eclipse 2019-03 with Maven.
I'm not the cluster administrator; I'm starting to think there may be trouble caused by the binaries I'm using or by the specific system configuration.
Anyhow, running `hadoop version` reports that I'm using v2.8.5.
To build the .jar file I do the following in Eclipse: Export -> JAR file -> select the whole project to export -> set Main as the entry point.
What am I doing wrong? Thank you.