For each problem, provide test data (as input files), MapReduce programs, and running results (screenshots) on Hadoop.
i) Write a program that reads an input file (one integer per line) and removes duplicate numbers.
ii) When data are transmitted from map to reduce, the <key, value> pairs are automatically sorted by key in ascending order. Write a program that reads an input file (one integer per line) and writes the numbers out in descending order.
i)
Requirement::
Suppose you have a data file containing users' basic information, such as first name, last name, designation, and city, with the fields separated by a ',' delimiter. The requirement is to remove duplicate records using MapReduce.
Components Involved::
Sample Data::

Step 1: Input Data Preparation::
Save the sample data to a file on the local file system. My file name is "sampledataForDuplicate" and the local path is "/home/NN/HadoopRepo/MapReduce/resources/duplicateValue".
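We need to copy this data file to an HDFS location, which will be the input path for the MapReduce job. For example (hadoop fs -put is one standard way to do this; the placeholders stand for your own paths):
hadoop fs -put <local path of data file> <input path at HDFS>
It copies the data file from the local file system to the HDFS location.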

Step 2: Create Maven Project::
Follow the steps below to create a Maven project:
Step 3: Resolve Dependency::
Add the dependencies to the pom.xml file and resolve them from the command line:

Step 4: Write mapper::
Once the steps above are done, write a mapper class that takes the input file, reads it, and emits its contents as key-value pairs. The mapper here is written in Java.
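The map step can be sketched in plain Java, without Hadoop types, so that it runs without a cluster. This is a minimal sketch and not the original mapper: it assumes each whole line is one record and that the record itself becomes the key with an empty value. The class name DedupMapSketch and the sample records in main are invented for illustration. In an actual Hadoop job this logic would sit inside the map method of a class extending org.apache.hadoop.mapreduce.Mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free stand-in for the mapper described above.
final class DedupMapSketch {

    // Emits one (record, "") pair per non-blank line, as a map() call per line would.
    // Assumption: the whole trimmed line is the key; the value carries no information.
    static List<Map.Entry<String, String>> map(List<String> lines) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String line : lines) {
            String record = line.trim();
            if (!record.isEmpty()) {
                out.add(new SimpleEntry<>(record, ""));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample records; the second line duplicates the first.
        List<String> lines = Arrays.asList(
                "Ann,Lee,Engineer,Austin",
                "Ann,Lee,Engineer,Austin",
                "Raj,Rao,Analyst,Pune");
        for (Map.Entry<String, String> pair : map(lines)) {
            System.out.println("<" + pair.getKey() + ", \"" + pair.getValue() + "\">");
        }
    }
}
```

Duplicates are still present after this step. Making the record the key is what lets the framework bring identical records together before the reducer sees them.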

Step 5: Write Reducer::
In this step, the reducer takes the mapper output as its input and processes it. The actual de-duplication logic is written here.
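The reduce step can be sketched the same way. This is a sketch under the assumption that the mapper emitted each record as a key: the framework groups equal keys between map and reduce, so the reducer only needs to write each key once and ignore the grouped values. The class DedupReduceSketch and its shuffle helper are hypothetical stand-ins; in Hadoop the grouping is done by the framework and the reduce logic would live in a class extending org.apache.hadoop.mapreduce.Reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free stand-in for the shuffle and the reducer.
final class DedupReduceSketch {

    // Stand-in for the shuffle: groups the (empty) values by key.
    // A TreeMap keeps keys in ascending order, mirroring the sorted shuffle.
    static Map<String, List<String>> shuffle(List<String> mappedKeys) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String key : mappedKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add("");
        }
        return grouped;
    }

    // Stand-in for reduce(): one call per distinct key, emitting that key once.
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            out.add(group.getKey()); // the grouped values are ignored on purpose
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented keys as they might arrive from the map step.
        List<String> mapped = Arrays.asList("b,2", "a,1", "b,2");
        System.out.println(reduce(shuffle(mapped)));
    }
}
```

The reducer never compares records itself. De-duplication falls out of the grouping by key.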

Step 6: Write Driver::
To execute the mapper and reducer, create a driver class that calls them.
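As a rough local analogue of a driver, the sketch below wires the steps together for the integer-per-line input from part i of the question. It is not the original driver class and uses no Hadoop API: a real driver would configure and submit an org.apache.hadoop.mapreduce.Job instead. The class name LocalDedupDriver is hypothetical, and the two arguments mirror the input and output paths passed on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical local stand-in for a driver; it does not submit a Hadoop job.
final class LocalDedupDriver {

    // Map, shuffle and reduce collapsed into one pass:
    // parse each line as an integer, keep one copy, iterate in ascending order.
    static List<String> run(List<String> lines) {
        TreeSet<Integer> unique = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                unique.add(Integer.parseInt(trimmed));
            }
        }
        List<String> out = new ArrayList<>();
        for (Integer n : unique) {
            out.add(n.toString());
        }
        return out;
    }

    // args[0] = input file, args[1] = output file (both local paths in this sketch).
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LocalDedupDriver <input file> <output file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), run(lines));
    }
}
```

The integers are parsed before sorting so that 9 comes before 10. The same concern applies in Hadoop: Text keys are compared as bytes, which puts "10" before "9", while IntWritable keys compare numerically.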


Step 7: Package Preparation::
In this step, we will create a package (.jar) of the project. Run the following command::
mvn package

It creates a .jar file under the target directory. Copy this jar to a local path of your choice. In my case the jar file is available at "/home/NN/HadoopRepo/MapReduce".
Step 8: Execution::
Command for execution::
hadoop jar <path of jar> <driver class with package name> <input data path of HDFS> <output path at HDFS>
Step 9: Validate Output::
Check the output at the HDFS output path

Check the output
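The walkthrough above covers part i only. For part ii, the idea is to reverse the order in which keys are sorted between map and reduce. The sketch below shows that idea in plain Java, with a map sorted by a reversed comparator standing in for the shuffle. It is an illustration under stated assumptions, not a Hadoop program: the class name DescendingSortSketch is hypothetical, and duplicates are kept because part ii does not ask for their removal. In a Hadoop job, one way to get the same effect is to register a custom key comparator through Job.setSortComparatorClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of sorting keys in descending order.
final class DescendingSortSketch {

    static List<Integer> sortDescending(List<String> lines) {
        // Stand-in for a shuffle with a reversed key comparator:
        // key = the number, value = how many times it occurred.
        Map<Integer, Integer> counts = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                counts.merge(Integer.parseInt(trimmed), 1, Integer::sum);
            }
        }
        // Stand-in for reduce(): write each key once per occurrence.
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                out.add(entry.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample input, one integer per line.
        System.out.println(sortDescending(Arrays.asList("3", "10", "3", "7")));
    }
}
```

An alternative that needs no custom comparator is to emit the negated number as the key in the mapper and negate it again in the reducer, since ascending order on negated keys is descending order on the originals.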
