We have root n machines, each having 10 root n memory space. The input is a...

Question

Question

We have root n machines, each having 10 root n memory space. The input is a...

We have root n machines, each having 10 root n memory space. The input is a set of items (i; vi), for i = 1; 2; : : : ; n, distributed in an arbitrary order among the root n machines. We want to compute the output (i; P ji vj ), for i = 1; : : : ; n. The n output pairs can also be distributed in an arbitrary order on the root n machines. Please design a MapReduce algorithm to solve this problem in constant number of rounds (the fewer the better). Please specify the followings at each round: The Map function in the format of (key; value) ! (key1; value1); (key2; value2); : : : The Reduce function in the format of (key; value1); (key; value2); : : : ! (key1; value1); (key2; value2); : : :

engineering Computer-Science

Add a comment Improve this question Transcribed image text

Answer 1

Answer #1

The MapReduce framework automatically sorts the keys generated by mappers. This means that, before starting reducers, all intermediate key-value pairs generated by mappers must be sorted by key (and not by value). Values passed to each reducer are not sorted at all; they can be in any order. What if you also want to sort a reducer’s values? MapReduce/Hadoop and Spark do not sort values for a reducer. So, for those applications (such as time series data) in which you want to sort your reducer data, the Secondary Sort design pattern enables you to do so.

First we’ll focus on the MapReduce/Hadoop solution. Let’s look at the MapReduce paradigm and then unpack the concept of the secondary sort:

< >(key₁, value₁) → list(key₂, value₂)< >(key₂, list(value₂)) → list(key₃, value₃)< >(value₂) = (V₁, V₂, ..., V_n)< >₁, V₂, ..., V_n) = (S₁, S₂, ..., S_n)< >₂) = (S₁, S₂, ..., S_n)< >₁ < S₂ < ... < S_n (ascending order), or< >₁ > S₂ > ... > S_n (descending order)
(K, V₁), (K, V₂), ..., (K, V_n)

then all these values {V₁, V₂, ..., V_n} will be processed by a single reducer (for key = K), but

there will be no order (ascending or descending) between instances of V_i., Secondary Sort is a design pattern we can use to apply an order (such as “ascending sort” or “descending sort”) to the values. How do we accomplish this? Say we want to apply some order to the reducer values:

< >₁ ≤ S₂ ≤ ... ≤ S_n< >₁ ≥ S₂ ≥ ... ≥ S_n
package org.myorg;

import java.io.IOException;

import java.util.*;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.conf.*;

import org.apache.hadoop.io.*;

import org.apache.hadoop.mapred.*;

import org.apache.hadoop.util.*;

public class WordCount {

   public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

     private final static IntWritable one = new IntWritable(1);

     private Text word = new Text();

     public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {

       String line = value.toString();

       StringTokenizer tokenizer = new StringTokenizer(line);

       while (tokenizer.hasMoreTokens()) {

         word.set(tokenizer.nextToken());

         output.collect(word, one);

       }

     }

   }

   public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

     public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {

       int sum = 0;

       while (values.hasNext()) {

         sum += values.next().get();

       }

       output.collect(key, new IntWritable(sum));

     }

   }

   public static void main(String[] args) throws Exception {

     JobConf conf = new JobConf(WordCount.class);

     conf.setJobName("wordcount");

     conf.setOutputKeyClass(Text.class);

     conf.setOutputValueClass(IntWritable.class);

     conf.setMapperClass(Map.class);

     conf.setCombinerClass(Reduce.class);

     conf.setReducerClass(Reduce.class);

     conf.setInputFormat(TextInputFormat.class);

     conf.setOutputFormat(TextOutputFormat.class);

     FileInputFormat.setInputPaths(conf, new Path(args[0]));

     FileOutputFormat.setOutputPath(conf, new Path(args[1]));

     JobClient.runJob(conf);

   }

}

Add a comment

Answer 2

We have root n machines, each having 10 root n memory space. The input is a...

Homework Answers

Add Answer to:
We have root n machines, each having 10 root n memory space. The input is a...

Post as a guest

Earn Coins

I need help in C++ implementing binary search tree. I have the .h file for the...

How to write the insert, search, and remove functions for this hash table program? I'm stuck......

We have root n machines, each having 10 root n memory space. The input is a...

Homework Answers

Add Answer to: We have root n machines, each having 10 root n memory space. The input is a...

Post as a guest

Earn Coins

I need help in C++ implementing binary search tree. I have the .h file for the...

How to write the insert, search, and remove functions for this hash table program? I'm stuck......

Add Answer to:
We have root n machines, each having 10 root n memory space. The input is a...