Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?
sam liu 2013-10-20, 13:26
Furthermore, I ran another test: I renamed TeraSort#TotalOrderPartitioner to
TeraSort#MyOwnTotalOrderPartitioner to avoid conflicts with other classes of
the same name on the Hadoop classpath. Also, in TeraSort.java, I modified
seems the MyOwnTotalOrderPartitioner was not invoked during executing
BTW, in TeraSort#TotalOrderPartitioner#readPartitions(), there is a
statement 'DataInputStream reader = fs.open(p);', and I know 'p' is the
path of '_partition.lst'. But I am unclear on two details:
- Where is 'p' located? Is it on HDFS or on the Linux file system? What is
its absolute path?
- Which part or phase of Hadoop MapReduce copies the _partition.lst file to
the path 'p'? I am very confused about this part.
Thanks very much!
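
For readers following the thread, the lookup that a total-order partitioner
performs can be sketched without Hadoop. The following is a minimal,
hypothetical illustration and not the TeraSort code: the class
SplitPointSketch, the fixed 10-byte key width, and the helpers key() and
encode() are all assumptions made for this example. It reads sorted split
points from a DataInputStream (loosely analogous to the readPartitions() call
discussed above) and maps a key to a partition by binary search.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Hadoop-free sketch of a total-order partition lookup. */
public class SplitPointSketch {
    // Assumption for this example only: every key is exactly 10 bytes wide.
    static final int KEY_LENGTH = 10;

    /** Pads (with zero bytes) or truncates an ASCII string to KEY_LENGTH bytes. */
    static byte[] key(String s) {
        return Arrays.copyOf(s.getBytes(StandardCharsets.US_ASCII), KEY_LENGTH);
    }

    /** Concatenates fixed-width keys, standing in for a partition file's bytes. */
    static byte[] encode(String... keys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String k : keys) {
            byte[] b = key(k);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    /** Reads consecutive fixed-width keys until the stream is exhausted. */
    static List<byte[]> readSplitPoints(DataInputStream in) {
        List<byte[]> splits = new ArrayList<>();
        try {
            while (in.available() > 0) {
                byte[] k = new byte[KEY_LENGTH];
                in.readFully(k);
                splits.add(k);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return splits;
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Returns the number of split points that are <= key (binary search). */
    static int getPartition(byte[] key, List<byte[]> sortedSplits) {
        int lo = 0;
        int hi = sortedSplits.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(sortedSplits.get(mid), key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Two split points define three partitions.
        List<byte[]> splits = readSplitPoints(
            new DataInputStream(new ByteArrayInputStream(encode("g", "p"))));
        for (String s : new String[] {"a", "m", "z"}) {
            System.out.println(s + " -> partition " + getPartition(key(s), splits));
        }
    }
}
```

With N split points there are N + 1 partitions, and in this sketch a key equal
to a split point falls into the partition above it. The real partition file
format and comparison rules used by TeraSort may differ.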
2013/10/20 sam liu <[EMAIL PROTECTED]>
> After I took the following actions, the job still passed, and it seems that
> none of the TotalOrderPartitioner classes were invoked at all:
> - Modified libexec/hadoop-config.sh to put
> hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop classpath,
> which should ensure that TeraSort#TotalOrderPartitioner is invoked first
> - Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
> replaced with the newly generated
> 2013/10/19 Arun C Murthy <[EMAIL PROTECTED]>
>> Apologies for the late response.
>> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not
>> Did you fiddle with the right TotalOrderPartitioner
>> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>> On Oct 17, 2013, at 8:12 PM, sam liu <[EMAIL PROTECTED]> wrote:
>> This is really weird and confusing to me. Can anyone help with this question?
>> 2013/10/16 sam liu <[EMAIL PROTECTED]>
>>> Hi Experts,
>>> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as
>>> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>>> However, it seems YARN did not execute the methods of
>>> TeraSort#TotalOrderPartitioner at all. I ran some tests to verify this:
>>> Test 1: Add code to the methods readPartitions() and setConf() in
>>> TeraSort#TotalOrderPartitioner to print some words and write them to a file
>>> Expected Result: Some words should be printed and written to a file
>>> Actual Result: Nothing was printed or written to a file at all
>>> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
>>> leaving only the necessary methods, with empty bodies
>>> Expected Result: The TeraSort job should fail with an exception, as the
>>> specified Partitioner is not implemented at all
>>> Actual Result: The TeraSort job completed successfully without any exception
>>> These tests confused me a lot, because it seems YARN never uses the specified
>>> partitioner, TeraSort#TotalOrderPartitioner, during job execution.
>>> Can anyone help explain why?
>>> Thanks very much!
>> Arun C. Murthy
>> Hortonworks Inc.