Understanding Sys.output from mapper & partitioner
Below are my simple mapper and partitioner classes, the input file, and the output displayed on the console (at the end of this message).

My question is about the keys printed in the console output (the "key = ..." lines). Here are the first few lines of that output:

...

13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10    10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20    200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200

Q1: I am confused about how and where the values key = 0, key = 6, and so on are calculated or obtained.

Q2: After the first two lines of output for each record, it prints the output from the partitioner class:
       Printing Result in Partitioner = 0
Is this because the mapper and the partitioner are running in parallel?

I would really appreciate it if someone could take a quick look and shed some light on this.

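**** Mapper Class ***
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// IntPair is my own key class (not shown in this message).
public class SecondarySortMapper extends Mapper<LongWritable, Text, IntPair, IntWritable> {

    private String[] tokens = null;
    // Reused output value; despite the name, it is set to tokens[1] for every record.
    private IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // key is supplied by the input format; value is one line of the input file.
        System.out.println("key = " + key.toString() + " value = " + value.toString());

        if (value != null) {
            tokens = value.toString().split("\\s+");
            System.out.println("token[0] = " + tokens[0] + " token[1] = " + tokens[1]);
            ONE.set(Integer.parseInt(tokens[1]));
            IntPair ip = new IntPair(Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]));
            context.write(ip, ONE);
            System.out.println("IntPair in Mapper = " + ip.toString());
        }
    }
}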
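
A note on Q2: the order of the println lines (token line, then partitioner line, then "IntPair in Mapper") is what one would expect if the partitioner is called from inside context.write(...) on the mapper's own thread, rather than in parallel with it. The sketch below is only a simplified model of that call order. SketchPartitioner and SketchContext are hypothetical stand-ins invented for illustration, not Hadoop classes, and a single partition is assumed.

```java
import java.util.ArrayList;
import java.util.List;

class CallOrderSketch {
    // Hypothetical stand-in for a partitioner; not Hadoop's Partitioner class.
    interface SketchPartitioner {
        int getPartition(String key, int numOfPartitions);
    }

    // Hypothetical stand-in for the map-side output path.
    static class SketchContext {
        private final SketchPartitioner partitioner;
        private final int numOfPartitions;
        private final List<String> log;

        SketchContext(SketchPartitioner partitioner, int numOfPartitions, List<String> log) {
            this.partitioner = partitioner;
            this.numOfPartitions = numOfPartitions;
            this.log = log;
        }

        // The partitioner runs synchronously, before write() returns to the caller.
        void write(String key) {
            int partition = partitioner.getPartition(key, numOfPartitions);
            log.add("partitioner -> " + partition);
        }
    }

    // Plays the role of map(): log, write, log again, for two records.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SketchContext context = new SketchContext(
                (key, n) -> (key.hashCode() & Integer.MAX_VALUE) % n, 1, log);
        for (String record : new String[] {"10-10", "20-200"}) {
            log.add("map start " + record);
            context.write(record);
            log.add("map end " + record);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

In this model the "partitioner" line always falls between "map start" and "map end" for the same record, which matches the interleaving seen in the console output.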

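**** Partitioner class ***

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortPartitioner extends Partitioner<IntPair, IntWritable> {

    @Override
    public int getPartition(IntPair key, IntWritable value, int numOfPartitions) {
        // Partition on the first field only; mask the sign bit so a negative
        // hashCode() cannot produce a negative partition number.
        int result = (key.getFirst().hashCode() & Integer.MAX_VALUE) % numOfPartitions;
        System.out.println("Printing Result in Partitioner = " + result);
        return result;
    }
}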
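
To see why the partitioner might print 0 for every record, the modulo arithmetic can be tried outside Hadoop. The sketch below is illustrative only: it assumes the hash code of the first field equals its integer value (true for java.lang.Integer and for Hadoop's IntWritable, but IntPair is not shown here, so this is an assumption), and the class and method names are made up. The sign bit is masked as a precaution against negative hash codes.

```java
class PartitionSketch {
    // Same arithmetic as the partitioner: hash of the first field modulo the partition count.
    static int partitionFor(int first, int numOfPartitions) {
        return (Integer.hashCode(first) & Integer.MAX_VALUE) % numOfPartitions;
    }

    public static void main(String[] args) {
        int[] firstFields = {10, 20, 30, 40, 50, 60}; // first column of the input file
        for (int numOfPartitions = 1; numOfPartitions <= 3; numOfPartitions++) {
            StringBuilder line = new StringBuilder("partitions=" + numOfPartitions + ":");
            for (int first : firstFields) {
                line.append(' ').append(partitionFor(first, numOfPartitions));
            }
            System.out.println(line);
        }
    }
}
```

With a single partition every key maps to 0, and because all first fields in this input are even, two partitions also give 0 for every record. Only with three partitions do the results vary, so an all-zero result does not by itself show how many partitions the job used.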
*** input file ***

10    10
20    200
30    2500
40    400
50    500
60    1
10    10
30    2500
50    500
10    100
20    2000
30    25000
40    4000
50    5000
60    10
10    100
30    25000
50    5000

********** Here is the output in the console ****
...

13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10    10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20    200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200
key = 13 value = 30    2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 21 value = 40    400
token[0] = 40 token[1] = 400
Printing Result in Partitioner = 0
IntPair in Mapper = 40-400
key = 28 value = 50    500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 35 value = 60    1
token[0] = 60 token[1] = 1
Printing Result in Partitioner = 0
IntPair in Mapper = 60-1
key = 40 value = 10    10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 46 value = 30    2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 54 value = 50    500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 61 value = 10    100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 68 value = 20    2000
token[0] = 20 token[1] = 2000
Printing Result in Partitioner = 0
IntPair in Mapper = 20-2000
key = 76 value = 30    25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 85 value = 40    4000
token[0] = 40 token[1] = 4000
Printing Result in Partitioner = 0
IntPair in Mapper = 40-4000
key = 93 value = 50    5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000
key = 101 value = 60    10
token[0] = 60 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 60-10
key = 107 value = 10    100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 114 value = 30    25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 123 value = 50    5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000
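
A note on Q1: the key values printed above (0, 6, 13, 21, ...) match the byte offset at which each input line starts, which is what the LongWritable key holds when a job reads text with TextInputFormat. The job setup is not shown in this message, so the use of TextInputFormat is an assumption, as are a single tab between the two fields and a one-byte '\n' line ending. Under those assumptions the offsets can be reproduced with a short sketch (the class and method names are invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineOffsetSketch {
    // Returns the byte offset at which each line starts, assuming '\n' line endings.
    static List<Long> lineStartOffsets(String content) {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        for (String line : content.split("\n")) {
            offsets.add(position);
            // Advance past the line's bytes plus one byte for the '\n' terminator.
            position += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // First four lines of the input file, with a tab assumed between fields.
        String firstLines = "10\t10\n20\t200\n30\t2500\n40\t400\n";
        System.out.println(lineStartOffsets(firstLines)); // prints [0, 6, 13, 21]
    }
}
```

For example, the first line "10<tab>10" plus its newline is 6 bytes long, so the second line starts at offset 6; the second line is 7 bytes, so the third starts at 13.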

Thanks
Sai
Jens Scheidtmann 2013-03-29, 15:56