HBase >> mail # user >> M/R: Data-local vs Rack-local


Re: M/R: Data-local vs Rack-local
When running MapReduce jobs against HBase, a task is only considered data-local if it is scheduled on the region server that serves the region it reads from. You have three replicas of the data at the HDFS level, but not at the HBase level: each region is served by a single region server at a time.
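
As a rough back-of-the-envelope sketch of why this matters for the numbers in the question (48 maps, 4 nodes, replication factor 3): the host names and both helper functions below are hypothetical, and the uniform-random task placement is a simplifying assumption, since the real scheduler tries to prefer local slots.

```python
from fractions import Fraction


def classify(task_host, region_server_host):
    """Label a map task the way the job counters would, assuming a single rack.

    Only the host serving the region counts as local; hosts that merely
    hold an HDFS replica of the underlying files do not.
    """
    return "data-local" if task_host == region_server_host else "rack-local"


def expected_data_local(num_maps, num_nodes, hosts_counted_as_local):
    """Expected data-local maps if each task lands on a uniformly random node.

    This is an illustrative assumption, not how the scheduler actually works.
    """
    return num_maps * Fraction(hosts_counted_as_local, num_nodes)


# HBase view: only the one serving region server counts as local.
hbase_view = expected_data_local(48, 4, 1)
# HDFS view: any of the 3 replica holders would count as local.
hdfs_view = expected_data_local(48, 4, 3)

print(classify("node2", "node2"))  # hypothetical host names
print(classify("node3", "node2"))
print(float(hbase_view), float(hdfs_view))
```

Under these assumptions, purely random placement gives about 12 data-local maps with one eligible host per region, versus about 36 if any HDFS replica holder counted. The reported 19 of 48 sits far closer to the first figure.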

-Joey

On May 17, 2011, at 5:44, Felix Sprick <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We have a setup with 4 regionservers and a replication factor of 3. We are
> running MapReduce tasks using HBase as data source and sink. When running
> MapReduce tasks over data stored on the 4 nodes we noticed that in the
> statistics of a successfully completed job, the majority of the maps are
> "rack-local" and not "data-local". In this particular case we had 48 maps
> where 19 of them were data-local and 29 rack-local. I would have expected to
> have the majority of them "data-local" as the data should be available on 3
> out of 4 nodes due to the replication. Is this a configuration issue or am I
> just thinking about this the wrong way?
>
> thanks,
> Felix