We have root n machines, each having 10 root n memory space. The input is a set of items (i; vi), for i = 1; 2; : : : ; n, distributed in an arbitrary order among the root n machines. We want to compute the output (i; P ji vj ), for i = 1; : : : ; n. The n output pairs can also be distributed in an arbitrary order on the root n machines. Please design a MapReduce algorithm to solve this problem in constant number of rounds (the fewer the better). Please specify the followings at each round: The Map function in the format of (key; value) ! (key1; value1); (key2; value2); : : : The Reduce function in the format of (key; value1); (key; value2); : : : ! (key1; value1); (key2; value2); : : :
The MapReduce framework automatically sorts the keys generated by mappers. This means that, before starting reducers, all intermediate key-value pairs generated by mappers must be sorted by key (and not by value). Values passed to each reducer are not sorted at all; they can be in any order. What if you also want to sort a reducer’s values? MapReduce/Hadoop and Spark do not sort values for a reducer. So, for those applications (such as time series data) in which you want to sort your reducer data, the Secondary Sort design pattern enables you to do so.
First we’ll focus on the MapReduce/Hadoop solution. Let’s look at the MapReduce paradigm and then unpack the concept of the secondary sort:
(K, V1), (K, V2), ..., (K, Vn)
then all these values {V1, V2, ..., Vn} will be processed by a single reducer (for key = K), but
there will be no order (ascending or descending) between instances of Vi., Secondary Sort is a design pattern we can use to apply an order (such as “ascending sort” or “descending sort”) to the values. How do we accomplish this? Say we want to apply some order to the reducer values:
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
}
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
We have root n machines, each having 10 root n memory space. The input is a...
I need help in C++ implementing binary search tree. I have the .h file for the binary search tree class. I have 4 classic texts, and 2 different dictionaries. Classic Texts: Alice In Wonderland.txt A Tale of Two Cities.txt Pride And Prejudice.txt War and Peace.txt 2 different dictionaries: Dictionary.txt Dictionary-brit.txt The data structures from the standard template library can not be used.The main program should open the text file, read in the words, remove the punctuation and change all the...
How to write the insert, search, and remove functions for this hash table program? I'm stuck... This program is written in C++ Hash Tables Hash Table Header File Copy and paste the following code into a header file named HashTable.h Please do not alter this file in any way or you may not receive credit for this lab For this lab, you will implement each of the hash table functions whose prototypes are in HashTable.h. Write these functions in a...