Hadoop >> mail # dev >> Question about Hadoop-8192 and rackToBlocks ordering


Re: Question about Hadoop-8192 and rackToBlocks ordering
Thanks for the reply, Robert.
However, I believe the main design issue is this:
if one rack (listed in the rackToBlocks HashMap) contains all the
blocks (stored in the blockToNode HashMap), then regardless of the order, the
split operation terminates once that rack has been processed. That means the
remaining racks (listed in the rackToBlocks HashMap) never get processed. For
more details, see the file CombineFileInputFormat.java, method getMoreSplits(),
the while loop starting at line 344.
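
To illustrate the point above, here is a minimal sketch, not the actual Hadoop code. It assumes a simplified model in which each rack produces at most one split from whatever blocks are still unassigned, and the loop stops once no blocks remain. The class RackOrderSketch and its methods are hypothetical names made up for this example, and LinkedHashMap is used only so the iteration order can be controlled explicitly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of rack-by-rack split creation.
// It is NOT CombineFileInputFormat.getMoreSplits(); it only mimics the
// "stop as soon as every block has been assigned" behaviour described above.
public class RackOrderSketch {

    // Walks the racks in map iteration order and counts the splits created.
    static int countSplits(Map<String, List<String>> rackToBlocks, Set<String> allBlocks) {
        Set<String> unassigned = new HashSet<String>(allBlocks);
        int splits = 0;
        for (Map.Entry<String, List<String>> rack : rackToBlocks.entrySet()) {
            if (unassigned.isEmpty()) {
                break; // every block already belongs to a split; later racks are skipped
            }
            boolean tookAny = false;
            for (String block : rack.getValue()) {
                if (unassigned.remove(block)) {
                    tookAny = true; // block was still free, so it joins this rack's split
                }
            }
            if (tookAny) {
                splits++;
            }
        }
        return splits;
    }

    // Builds two racks: /r1 holds every block, /r2 holds only b2.
    static int splitsFor(boolean fullRackFirst) {
        Map<String, List<String>> rackToBlocks = new LinkedHashMap<String, List<String>>();
        if (fullRackFirst) {
            rackToBlocks.put("/r1", Arrays.asList("b1", "b2"));
            rackToBlocks.put("/r2", Arrays.asList("b2"));
        } else {
            rackToBlocks.put("/r2", Arrays.asList("b2"));
            rackToBlocks.put("/r1", Arrays.asList("b1", "b2"));
        }
        Set<String> allBlocks = new HashSet<String>(Arrays.asList("b1", "b2"));
        return countSplits(rackToBlocks, allBlocks);
    }

    public static void main(String[] args) {
        System.out.println("full rack first:  " + splitsFor(true));
        System.out.println("full rack second: " + splitsFor(false));
    }
}
```

In this toy model, visiting the rack that holds every block first yields a single split, while visiting it second yields two, so the result depends on the map's iteration order.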

Best Regards
Amir Sanjar

Linux System Management Architect and Lead
IBM Senior Software Engineer
Phone# 512-286-8393
Fax#      512-838-8858

From: Robert Evans <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: 03/22/2012 11:57 AM
Subject: Re: Question about Hadoop-8192 and rackToBlocks ordering

If it really is the ordering of the hash map, I would say no, it should not,
and the code should be updated.  If ordering matters, we need to use a map
that guarantees a given order, and a hash map does not.
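
As a minimal sketch of what "a map that guarantees a given order" could look like, the standard java.util.LinkedHashMap iterates in insertion order and java.util.TreeMap iterates in sorted key order, whereas java.util.HashMap makes no ordering promise at all. The class OrderedMapSketch and the rack names are made up for illustration; whether either map is the right fix for CombineFileInputFormat is an assumption, not something established in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example: the same three keys inserted into two ordered maps.
public class OrderedMapSketch {

    // Inserts the rack names in a fixed, deliberately unsorted order.
    static void fill(Map<String, Integer> map) {
        map.put("/r2", 2);
        map.put("/r1", 1);
        map.put("/r3", 3);
    }

    // LinkedHashMap: iteration order is the insertion order on every JDK.
    static List<String> insertionOrderKeys() {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        fill(map);
        return new ArrayList<String>(map.keySet());
    }

    // TreeMap: iteration order is the natural (sorted) order of the keys.
    static List<String> sortedKeys() {
        Map<String, Integer> map = new TreeMap<String, Integer>();
        fill(map);
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println("LinkedHashMap: " + insertionOrderKeys());
        System.out.println("TreeMap:       " + sortedKeys());
    }
}
```

Either choice makes the iteration order part of the map's contract, so it no longer varies between JDK implementations.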

--Bobby Evans

On 3/22/12 7:24 AM, "Kumar Ravi" <[EMAIL PROTECTED]> wrote:

Hello,

We have been looking at IBM JDK JUnit failures on Hadoop-1.0.1
independently and have run into the same failures as reported in this JIRA.
I have a question based on what I have observed, described below.

We started debugging the problems in the test case
org.apache.hadoop.mapred.lib.TestCombineFileInputFormat.
The test case fails because the number of splits returned from
CombineFileInputFormat.getSplits() is 1 when using the IBM JDK, whereas the
expected return value is 2.

So far, we have found that the number of splits differs because the IBM JDK
creates the elements of the rackToBlocks HashMap in the reverse of the order
in which the Sun JDK creates them.

The question I have at this point is: should the number of splits that get
created in a Hadoop cluster depend strictly on the order in which the
rackToBlocks HashMap gets populated? Is this working as designed?

Regards,
Kumar