MapReduce >> mail # user >> Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

sam liu 2013-10-16, 02:02
sam liu 2013-10-18, 03:12
Arun C Murthy 2013-10-18, 21:03
Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?
Sam, my guess is that the jar file you think is running is not the one actually being used. There is probably an unmodified jar (without your changes) on the task classpath that is being picked up before your modified jar.
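
One way to check this guess is to have the code report which jar a class was actually loaded from. The following is only a minimal sketch using the standard `Class.getProtectionDomain()` and `CodeSource.getLocation()` calls; the class name `JarLocator` and the helper `locationOf` are made up for illustration and are not part of Hadoop.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop): reports where a class was loaded from.
public class JarLocator {

    /** Returns the jar or directory a class was loaded from, or a note if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no CodeSource.
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown source)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Here the helper inspects itself; in a real check you would pass
        // the class under suspicion instead.
        System.out.println(JarLocator.class.getName()
                + " loaded from " + locationOf(JarLocator.class));
    }
}
```

Assuming the same few lines were placed in a method that is known to run inside the task and pointed at `TotalOrderPartitioner.class`, the reported location should show whether the modified jar or an older copy is in use. Keep in mind that anything a task prints goes to that task's own log, not to the console of the client that submitted the job.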

On Thursday, October 17, 2013 10:13 PM, sam liu <[EMAIL PROTECTED]> wrote:
This is really weird and it confuses me. Can anyone help with this question?

2013/10/16 sam liu <[EMAIL PROTECTED]>

>Hi Experts,
>In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'. However, it seems Yarn did not execute the methods of TeraSort#TotalOrderPartitioner at all. I ran the following tests to verify this:
>Test 1: Add code to the methods readPartitions() and setConf() in TeraSort#TotalOrderPartitioner to print some words and write some words to a file.
>Expected Result: Some words should be printed and written to a file
>Actual Result: Nothing was printed or written to a file
>Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner, keeping only the necessary methods with empty bodies
>Expected Result: The TeraSort job fails with an exception, since the specified Partitioner is not actually implemented
>Actual Result: The TeraSort job completed successfully without any exception
>These tests confused me a lot, because it seems Yarn never uses the specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>Can anyone help explain why?
>Thanks very much!