Felix Sprick 2011-05-17, 12:44
When running MapReduce jobs against HBase, a task counts as data-local only if it is scheduled on the region server that serves the region it reads from. You have three replicas of the data at the HDFS level, but only one serving copy of each region at the HBase level.
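A toy model makes the arithmetic behind this visible. It is a simplified sketch, not Hadoop's actual scheduler. It assumes that a split over a plain HDFS file lists every replica holder as a preferred host, while a split over an HBase region lists only the hosting region server, as described above. The `Split` class, the `classify` function and the node names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    # Hosts reported as preferred locations for this split (simplified model).
    hosts: List[str]


def classify(split: Split, node: str, rack_of: Dict[str, str]) -> str:
    """Label a map task running on `node`, mimicking the job statistics' categories."""
    if node in split.hosts:
        return "data-local"
    if any(rack_of[h] == rack_of[node] for h in split.hosts):
        return "rack-local"
    return "off-rack"


# Hypothetical cluster: four nodes, all in one rack.
nodes = ["n1", "n2", "n3", "n4"]
rack_of = {n: "rack1" for n in nodes}

# HDFS file split: all three replica holders count as local (replication = 3).
hdfs_split = Split(hosts=["n1", "n2", "n3"])
# HBase region split: only the region server hosting the region counts.
region_split = Split(hosts=["n1"])

for name, split in [("HDFS block", hdfs_split), ("HBase region", region_split)]:
    local = sum(classify(split, n, rack_of) == "data-local" for n in nodes)
    print(f"{name}: data-local on {local} of {len(nodes)} nodes")
```

Running it prints `HDFS block: data-local on 3 of 4 nodes` followed by `HBase region: data-local on 1 of 4 nodes`. Any task that lands on one of the other three nodes in the same rack is reported as rack-local, even though HDFS may hold a replica of the underlying files there.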
On May 17, 2011, at 5:44, Felix Sprick <[EMAIL PROTECTED]> wrote:
> We have a setup with 4 region servers and a replication factor of 3. We are
> running MapReduce tasks using HBase as data source and sink. When running
> MapReduce tasks over data stored on the 4 nodes, we noticed that in the
> statistics of a successfully completed job, the majority of the maps are
> "rack-local" and not "data-local". In this particular case we had 48 maps,
> of which 19 were data-local and 29 rack-local. I would have expected the
> majority of them to be "data-local", as the data should be available on 3
> out of 4 nodes due to the replication. Is this a configuration issue, or am I
> just thinking about it the wrong way?