HBase, mail # user - Poor data locality of MR job


Re: Poor data locality of MR job
Bryan Keller 2012-08-02, 17:15
I presplit the table. The regionservers have gone down on occasion but have been up for a while (weeks). How could that result in having no regions on one node?
On Aug 1, 2012, at 11:39 PM, Adrien Mogenet <[EMAIL PROTECTED]> wrote:

> Did you pre-split your table, or did you let the balancer assign regions
> to regionservers for you?
>
> Did your regionserver(s) fail?
>
> On Thu, Aug 2, 2012 at 8:31 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>
>> I have an 8 node cluster and a table that is pretty well balanced with on
>> average 36 regions/node. When I run a mapreduce job on the cluster against
>> this table, the data locality of the mappers is poor, e.g. 100 rack-local
>> mappers and only 188 data-local mappers. I would expect nearly all of the
>> mappers to be data-local. DNS appears to be fine, i.e. the hostname in the
>> splits is the same as the hostnames in the task attempts.
>>
>> The performance of the rack local mappers is poor and causes overall scan
>> performance to suffer.
>>
>> The table isn't new and from what I understand, HDFS replication will
>> eventually keep region data blocks local to the regionserver. Are there
>> other reasons for data locality to be poor and any way to fix it?
>>
>>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
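
To put the numbers from the original question in perspective, here is a minimal sketch that computes the data-local share of map tasks from the two job counters quoted above (188 data-local, 100 rack-local). The second function estimates what fraction of a region's bytes has a replica on its regionserver. The function names and the block-list input format are assumptions made for illustration; they are not part of HBase or Hadoop.

```python
def mapper_locality(data_local, rack_local, other=0):
    """Fraction of map tasks that ran on a node holding their input data."""
    total = data_local + rack_local + other
    if total == 0:
        raise ValueError("no map tasks counted")
    return data_local / total


def region_locality(blocks, server):
    """Fraction of a region's bytes with a replica on `server`.

    `blocks` is a hypothetical list of (size_in_bytes, replica_hosts)
    tuples, e.g. gathered from HDFS block location reports.
    """
    total = sum(size for size, _ in blocks)
    if total == 0:
        return 0.0
    local = sum(size for size, hosts in blocks if server in hosts)
    return local / total


# Counters from the job described in the thread.
share = mapper_locality(188, 100)
print(f"data-local mappers: {share:.1%}")  # prints "data-local mappers: 65.3%"
```

With roughly 65% of mappers data-local, about one mapper in three reads its region over the network, which is consistent with the slow rack-local scans described in the question.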