HBase >> mail # user >> Reading in parallel from table's regions in MapReduce


Re: Reading in parallel from table's regions in MapReduce
Hi Ioakim:

Sorry, your hypothesis doesn't make sense. I suggest reading
"Learning HBase Internals" by Lars Hofhansl at
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
to understand how HBase locking works.

Regarding the issue you are facing, are you sure you configured the job
properly (i.e. requesting the jobtracker to run more than one mapper at a
time)? If you are testing on a single machine, you probably also need to
configure the number of tasktrackers per node to see more than one mapper
executing on that machine.
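
To illustrate why mappers can look sequential without any table lock, here
is a minimal sketch in plain Java. It does not use HBase or Hadoop. It
models map slots as a fixed thread pool and one map task per region, which
is an assumption made for illustration only. The class name MapSlotDemo and
the method peakConcurrency are hypothetical. In Hadoop 1.x the number of
concurrent map tasks per tasktracker is, as far as I know, governed by
mapred.tasktracker.map.tasks.maximum, so check that value on your node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: a thread pool stands in for the available map slots,
// and each submitted task stands in for a mapper scanning one region.
public class MapSlotDemo {

    // Runs `regions` simulated mappers on `mapSlots` slots and returns the
    // highest number of mappers that were running at the same moment.
    static int peakConcurrency(int mapSlots, int regions) throws Exception {
        ExecutorService slots = Executors.newFixedThreadPool(mapSlots);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> tasks = new ArrayList<>();

        for (int i = 0; i < regions; i++) {
            tasks.add(slots.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(100); // stand-in for scanning a region
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }));
        }

        for (Future<?> task : tasks) {
            task.get(); // wait for every simulated mapper to finish
        }
        slots.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 slot : peak = " + peakConcurrency(1, 4));
        System.out.println("4 slots: peak = " + peakConcurrency(4, 4));
    }
}
```

With a single slot the peak can never exceed one, so the tasks finish one
after another even though nothing in the "table" is locked. With more slots
the same tasks overlap.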

my $0.02

Jerry

On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I would be grateful if someone could shed some light on the following:
>
> Each M/R map task is reading data from a separate region of a table.
> From the jobtracker's GUI, at the map completion graph, I notice that
> although the mappers read different data, they read it sequentially,
> as if the table had a lock that permits only one mapper at a time to
> read from any region.
>
> Does this "lock" hypothesis make sense? Is there any way I could avoid
> this useless delay?
>
> Thanks in advance and regards,
> Ioakim
>