HDFS >> mail # user >> Understanding Sys.output from mapper & partitioner


Understanding Sys.output from mapper & partitioner
Below are my simple mapper and partitioner classes, the input file, and (at the end of this message) the output displayed on the console.

My question is about the keys printed in the console output (the "key = ..." lines).

Here are the first few lines of the console output:

...

13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10    10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20    200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200

Q1: I am confused about how and where it calculates or gets these key values (key = 0, key = 6, and so on).

Q2: After the first two lines of output, it prints the output from the partitioner class:
       Printing Result in Partitioner = 0
Is this because the mapper and the partitioner run in parallel?

I would really appreciate it if someone could take a quick look and shed some light on this.

**** Mapper Class ***
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SecondarySortMapper extends Mapper<LongWritable, Text, IntPair, IntWritable> {

    // Reused for every record; despite its name, it carries the second field of each line.
    private final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        System.out.println("key = " + key.toString() + " value = " + value.toString());

        // Split the line on whitespace: tokens[0] is the first field, tokens[1] the second.
        String[] tokens = value.toString().split("\\s+");
        System.out.println("token[0] = " + tokens[0] + " token[1] = " + tokens[1]);

        ONE.set(Integer.parseInt(tokens[1]));
        IntPair ip = new IntPair(Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]));
        context.write(ip, ONE);
        System.out.println("IntPair in Mapper = " + ip.toString());
    }
}

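**** Partitioner class ***

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortPartitioner extends Partitioner<IntPair, IntWritable> {

    @Override
    public int getPartition(IntPair key, IntWritable value, int numOfPartitions) {
        // Partition on the first field only, so all pairs sharing it reach the same reducer.
        // Mask the sign bit so a negative hash code cannot yield a negative partition index.
        int result = (key.getFirst().hashCode() & Integer.MAX_VALUE) % numOfPartitions;
        System.out.println("Printing Result in Partitioner = " + result);
        return result;
    }
}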
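
On Q2 and the constant "Printing Result in Partitioner = 0": the partitioner is most likely invoked from inside context.write(...), on the same thread as the mapper, which would explain why its line appears between the two mapper prints rather than in parallel with them. The result is always 0 whenever numOfPartitions divides every first field; the simplest such case is a job running with a single reduce task, since any value modulo 1 is 0. The following is a minimal sketch of that arithmetic in plain Java, without Hadoop. The class name PartitionSketch and the method partitionFor are hypothetical, the partition counts 1 and 3 are assumptions chosen for illustration, and the sign-bit mask follows what Hadoop's HashPartitioner does.

```java
public class PartitionSketch {

    // Same arithmetic as getPartition: hash of the first field modulo the
    // number of partitions, with the sign bit masked to keep the index >= 0.
    static int partitionFor(int first, int numOfPartitions) {
        return (Integer.valueOf(first).hashCode() & Integer.MAX_VALUE) % numOfPartitions;
    }

    public static void main(String[] args) {
        int[] firstFields = {10, 20, 30, 40, 50, 60}; // first column of the input file
        int[] partitionCounts = {1, 3};               // assumed values, for illustration only

        for (int n : partitionCounts) {
            for (int first : firstFields) {
                System.out.println("numOfPartitions = " + n
                        + " first = " + first
                        + " partition = " + partitionFor(first, n));
            }
        }
    }
}
```

With one partition every key maps to 0; with three partitions the same keys spread over 0, 1, and 2.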
*** input file ***

10    10
20    200
30    2500
40    400
50    500
60    1
10    10
30    2500
50    500
10    100
20    2000
30    25000
40    4000
50    5000
60    10
10    100
30    25000
50    5000

********** Here is the output in the console ****
...

13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10    10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20    200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200
key = 13 value = 30    2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 21 value = 40    400
token[0] = 40 token[1] = 400
Printing Result in Partitioner = 0
IntPair in Mapper = 40-400
key = 28 value = 50    500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 35 value = 60    1
token[0] = 60 token[1] = 1
Printing Result in Partitioner = 0
IntPair in Mapper = 60-1
key = 40 value = 10    10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 46 value = 30    2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 54 value = 50    500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 61 value = 10    100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 68 value = 20    2000
token[0] = 20 token[1] = 2000
Printing Result in Partitioner = 0
IntPair in Mapper = 20-2000
key = 76 value = 30    25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 85 value = 40    4000
token[0] = 40 token[1] = 4000
Printing Result in Partitioner = 0
IntPair in Mapper = 40-4000
key = 93 value = 50    5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000
key = 101 value = 60    10
token[0] = 60 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 60-10
key = 107 value = 10    100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 114 value = 30    25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 123 value = 50    5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000

Thanks
Sai
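
On Q1: assuming the job uses the default TextInputFormat, the LongWritable key passed to map is the byte offset at which each line starts in the input file, not a line number. The keys in the console output (0, 6, 13, 21, ...) are consistent with that if the two fields are separated by a single tab and each line ends with a single newline; neither is confirmed by the listing, so both are assumptions here. A minimal sketch in plain Java, with the hypothetical names LineOffsets and startOffsets:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineOffsets {

    // Returns the byte offset at which each line starts, assuming every line
    // is followed by a single one-byte newline ('\n').
    static List<Long> startOffsets(List<String> lines) {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        for (String line : lines) {
            offsets.add(position);
            position += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return offsets;
    }

    public static void main(String[] args) {
        // First six records of the input file; the tab separator is an assumption.
        List<String> lines = Arrays.asList(
                "10\t10", "20\t200", "30\t2500", "40\t400", "50\t500", "60\t1");
        List<Long> offsets = startOffsets(lines);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println("key = " + offsets.get(i) + " value = " + lines.get(i));
        }
    }
}
```

For these six records the computed offsets are 0, 6, 13, 21, 28, and 35, matching the first six keys in the console output.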