The first thing that comes to mind while writing a MapReduce program is the types you are going to use in the code for the Mapper and Reducer classes. There are a few points to keep in mind when writing and understanding a MapReduce program. Here is a recap of the data types used in MapReduce (in case you missed the MapReduce Introduction post).
Broadly, the data types used in MapReduce are as follows.
- LongWritable - Corresponds to Java Long
- Text - Corresponds to Java String
- IntWritable - Corresponds to Java Integer
- NullWritable - Corresponds to null values
With that quick overview done, we can move on to the key point, which is how data types flow through MapReduce. MapReduce has a simple model of data processing: inputs and outputs for the map and reduce functions are key-value pairs.
- The map and reduce functions in MapReduce have the following general form:
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
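- K1 - Input key of the map function
- V1 - Input value of the map function
- K2 - Output key of the map function (and input key of the reduce function)
- V2 - Output value of the map function (the reduce function receives a list of these for each key)
- K3, V3 - Output key and value of the reduce function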
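To make the general form above concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. It is only an illustration of the type flow, not how Hadoop itself is implemented. The class name `MapReduceShape`, the word-count logic, and the sample input are all assumptions made for this example; plain `Long`, `String`, and `Integer` stand in for `LongWritable`, `Text`, and `IntWritable`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hadoop.
class MapReduceShape {

    // map: (K1, V1) -> list(K2, V2)
    // K1 = Long (line offset), V1 = String (line), K2 = String (word), V2 = Integer (1).
    static List<Map.Entry<String, Integer>> map(Long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return out;
    }

    // reduce: (K2, list(V2)) -> list(K3, V3)
    // K3 = String (word), V3 = Integer (total count).
    static List<Map.Entry<String, Integer>> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(new AbstractMap.SimpleEntry<String, Integer>(word, sum));
        return out;
    }

    // Tiny in-memory driver: map every record, group the values by K2, reduce each group.
    static Map<String, Integer> run(Map<Long, String> input) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<Long, String> rec : input.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(rec.getKey(), rec.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            for (Map.Entry<String, Integer> kv : reduce(group.getKey(), group.getValue())) {
                result.put(kv.getKey(), kv.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "hello world");   // made-up sample records
        input.put(12L, "hello hadoop");
        System.out.println(run(input)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

Notice that the map output types (K2, V2) must match the reduce input types, while K1/V1 and K3/V3 are free to differ from them.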
Default MapReduce Job: No Mapper, No Reducer
Ever tried to run a MapReduce program without setting a mapper or a reducer? Here is the minimal MapReduce program.
Run it over a small data set and check the output. Here is the small data set I used and the final result. You can use a larger data set.
Look at the result file we get after running the above code on the given data. It has an extra column of numbers. What happened is that the newly added column contains the key for every line. The number is the byte offset of the line within the file, i.e. how far the beginning of that line is from the beginning of the file (for plain ASCII text this is the same as the number of characters). For the first line it is 0, of course. Similarly, the number for the second line is how many characters away it starts from the beginning of the first line. Count the characters and it will be 16, and so on.
This offset is taken as a key and emitted in the result.
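The offset keys described above can be imitated in plain Java without Hadoop. The sketch below is only an approximation of what the default job does (read each line with its starting byte offset as the key, pass it through unchanged, and write key and value separated by a tab). The class name `LineOffsets` and the sample data are made up for this example and are not the data set used in this post; the sample was chosen so that the second line also starts at offset 16.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class LineOffsets {

    // Returns "offset<TAB>line" for each line, where offset is the byte
    // position at which the line starts, counted from the start of the input.
    // Assumes lines are terminated by a single '\n' byte.
    static List<String> withOffsets(String content) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            out.add(offset + "\t" + line);
            // Advance past the bytes of this line plus its one-byte '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: each of the first two lines is 15 characters plus a newline.
        String content = "1,apple,red,100\n2,banana,yellow\n3,kiwi\n";
        for (String row : withOffsets(content)) {
            System.out.println(row);
        }
        // Prints the keys 0, 16 and 32 in front of the three lines.
    }
}
```

The first line always gets key 0, and every later key is the previous key plus the length of the previous line including its newline, which is why counting the characters of the first line gives you the second key.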