Over the last few days I've been playing with Hadoop, trying to implement a word-length count (finding the absolute frequencies of the different word lengths). Everything is implemented following the MapReduce paradigm. The problem: when I launch the job on a pseudo-distributed cluster (single node, running locally), it runs perfectly and gives the expected results. When I run it on a real cluster, it doesn't actually start and hangs after the following line:
19/06/10 12:16:05 INFO client.RMProxy: Connecting to ResourceManager at node-master/172.16.14.201:8032
19/06/10 12:16:05 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/06/10 12:16:06 INFO input.FileInputFormat: Total input files to process : 1
19/06/10 12:16:06 INFO mapreduce.JobSubmitter: number of splits:1
19/06/10 12:16:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559917881770_0004
19/06/10 12:16:06 INFO impl.YarnClientImpl: Submitted application application_1559917881770_0004
19/06/10 12:16:06 INFO mapreduce.Job: The url to track the job: http://node-master:8088/proxy/application_1559917881770_0004/
The thing is, I can easily run the Wordcount example that's provided with Hadoop.
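To make the intent concrete, here is a plain-Java sketch of the result the job is supposed to produce (whitespace tokenization assumed, mirroring `StringTokenizer`; class and method names are made up for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordLengthCount {
    // Absolute frequency of each word length in the input text
    static Map<Integer, Integer> lengthFrequencies(String text) {
        Map<Integer, Integer> freq = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freq.merge(token.length(), 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // tokens: to(2) be(2) or(2) not(3) to(2) be(2)
        System.out.println(lengthFrequencies("to be or not to be"));
        // {2=5, 3=1}
    }
}
```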
Here's the Mapper class:
package hadoop.wordcountz.mapper;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
public class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, IntWritable, IntWritable>
{
    private final static IntWritable one = new IntWritable(1);
    private IntWritable len = new IntWritable(0);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            String temp = word.toString();
            if (temp.length() != 0)
            {
                // emit (word length, 1) for every non-empty token
                len.set(temp.length());
                context.write(len, one);
            }
        }
    }
}
Here's the Reducer class:
package hadoop.wordcountz.reducer;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;

public class Reducer extends org.apache.hadoop.mapreduce.Reducer<IntWritable, IntWritable, IntWritable, IntWritable>
{
    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        // sum the partial counts emitted for each word length
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Here's the Main class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import hadoop.wordcountz.mapper.Mapper;
import hadoop.wordcountz.reducer.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Main {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            for (int i = 0; i < args.length; i++) {
                System.out.println("Arg: " + i + " Val: " + args[i]);
            }
            System.out.println("usage: [input] [output]");
            System.exit(-1);
        }
        Job job = Job.getInstance(new Configuration());
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJarByClass(Main.class);
        job.submit();
    }
}
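As an aside, the `JobResourceUploader` warning in the log explicitly suggests implementing the `Tool` interface. This is not a verified fix for the hang, but a driver following that suggestion might look like the sketch below (mapper/reducer names taken from my classes above; `waitForCompletion(true)` is used instead of `submit()` so the client blocks and prints progress and diagnostics instead of returning immediately):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Main extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: [input] [output]");
            return -1;
        }
        // getConf() carries any -D options parsed by ToolRunner
        Job job = Job.getInstance(getConf(), "word-length count");
        job.setJarByClass(Main.class);
        job.setMapperClass(hadoop.wordcountz.mapper.Mapper.class);
        job.setReducerClass(hadoop.wordcountz.reducer.Reducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // blocks until the job finishes, printing progress to the console
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new Main(), args));
    }
}
```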
Here's the pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>Bho</groupId>
    <artifactId>Bho</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>WordCount</name>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.5</version>
        </dependency>
    </dependencies>
</project>
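I notice the pom mixes `hadoop-core` 1.2.1 (from the old 1.x line) with `hadoop-common` 2.8.5, which may itself be a problem. A dependency block matching a 2.8.5 cluster could instead look like this sketch (version assumed from my `hadoop version` output; `provided` scope assumed because the cluster supplies the Hadoop jars at runtime):

```xml
<dependencies>
    <!-- hadoop-client pulls in hadoop-common and the MapReduce client
         libraries at a single, consistent version -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.5</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```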
I'm using Eclipse 2019-03 with Maven.
I'm not the cluster administrator; I'm starting to think there may be trouble caused by the binaries I'm using or by the specific system configuration.
Anyhow, running `hadoop version` reports that I'm using v2.8.5.
To build the .jar file I do the following in Eclipse: Export -> JAR file -> select the whole project to export -> set Main as the entry point.
What am I doing wrong? Thank you.