HBase user mailing list: Poor data locality of MR job


Re: Poor data locality of MR job
I presplit the table. The regionservers have gone down on occasion but have been up for a while (weeks). How could that result in having no regions on one node?
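
A quick way to check whether one node really ended up with no regions is to count the table's regions per regionserver. Below is a minimal sketch, assuming an HBase 0.92/0.94-era client API (HTable.getRegionLocations()); the class name and command-line usage are only illustrative.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;

// Minimal sketch: prints how many regions of a table each regionserver hosts.
// Assumes an HBase 0.92/0.94-era client API (HTable.getRegionLocations()).
public class RegionDistribution {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[0]);   // args[0] = table name (illustrative usage)
    Map<String, Integer> regionsPerHost = new HashMap<String, Integer>();
    for (Map.Entry<HRegionInfo, ServerName> entry : table.getRegionLocations().entrySet()) {
      String host = entry.getValue().getHostname();
      Integer count = regionsPerHost.get(host);
      regionsPerHost.put(host, count == null ? 1 : count + 1);
    }
    for (Map.Entry<String, Integer> entry : regionsPerHost.entrySet()) {
      System.out.println(entry.getKey() + ": " + entry.getValue() + " regions");
    }
    table.close();
  }
}
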
On Aug 1, 2012, at 11:39 PM, Adrien Mogenet <[EMAIL PROTECTED]> wrote:

> Did you pre-split your table or did you let the balancer assign regions to
> regionservers for you?
>
> Did your regionserver(s) fail?
>
> On Thu, Aug 2, 2012 at 8:31 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>
>> I have an 8-node cluster and a table that is pretty well balanced, with on
>> average 36 regions per node. When I run a MapReduce job on the cluster against
>> this table, the data locality of the mappers is poor, e.g. 100 rack-local
>> mappers and only 188 data-local mappers. I would expect nearly all of the
>> mappers to be data-local. DNS appears to be fine, i.e. the hostnames in the
>> splits are the same as the hostnames in the task attempts.
>>
>> The performance of the rack-local mappers is poor and causes overall scan
>> performance to suffer.
>>
>> The table isn't new, and from what I understand, HDFS replication will
>> eventually keep region data blocks local to the regionserver. Are there
>> other reasons for data locality to be poor, and is there any way to fix it?
>>
>>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
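
On the original question quoted above (whether the region data blocks really end up local to the hosting regionservers): one way to check is to list the HDFS block locations of the table's files and compare those hosts with where each region is served. Below is a minimal sketch, assuming the pre-0.96 layout where table data lives directly under /hbase/<table>; the class name and usage are only illustrative.

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Minimal sketch: prints the datanode hosts holding each HDFS block under a table's
// directory, so they can be compared with the regionserver hosting each region.
// Assumes the pre-0.96 layout where table data lives directly under /hbase/<table>.
public class TableBlockHosts {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    walk(fs, new Path("/hbase", args[0]));   // args[0] = table name (illustrative usage)
  }

  private static void walk(FileSystem fs, Path dir) throws IOException {
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDir()) {
        walk(fs, status.getPath());
      } else {
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          System.out.println(status.getPath() + " -> " + Arrays.toString(block.getHosts()));
        }
      }
    }
  }
}

If a region's block hosts don't include the server reported for that region (e.g. by the region-count sketch earlier in the thread), mappers for that region can be at best rack-local.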