Re: Reading in parallel from table's regions in MapReduce
Hi Ioakim:

Sorry, your hypothesis doesn't make sense. I suggest you read
"Learning HBase Internals" by Lars Hofhansl at
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
to understand how HBase locking works.

Regarding the issue you are facing, are you sure the job is configured
properly (i.e. that the jobtracker is allowed to run more than one mapper
at a time)? If you are testing on a single machine, you probably also need
to raise the number of map slots on its tasktracker to see more than one
mapper running at once on that machine.
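
To make the slot arithmetic concrete, here is a rough sketch (not taken from Hadoop or HBase code). It assumes the default behaviour where a table scan job gets one map task per region, and that every tasktracker offers the same number of map slots (in Hadoop 1.x that is the mapred.tasktracker.map.tasks.maximum setting). The function name map_waves and the example numbers are made up for illustration.

```python
import math

def map_waves(num_regions, num_tasktrackers, map_slots_per_tracker):
    """Estimate (mappers running at once, number of waves) for a table scan.

    Assumptions (hypothetical model, not a Hadoop API):
    - one map task per region
    - every tasktracker has the same number of free map slots
    """
    if min(num_regions, num_tasktrackers, map_slots_per_tracker) < 1:
        raise ValueError("all arguments must be >= 1")
    total_slots = num_tasktrackers * map_slots_per_tracker
    # Mappers can only run in parallel up to the number of free slots.
    concurrent = min(num_regions, total_slots)
    # Remaining map tasks wait for a slot, so the job proceeds in waves.
    waves = math.ceil(num_regions / total_slots)
    return concurrent, waves

# One machine with a single map slot: 8 regions are read one after another.
print(map_waves(8, 1, 1))  # (1, 8)
# Same machine with 4 map slots: 4 regions are read at a time.
print(map_waves(8, 1, 4))  # (4, 2)
```

With a single slot the completion graph would look exactly like a "lock": the regions are read strictly in sequence, although nothing in HBase is blocking the reads.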

my $0.02

Jerry

On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I would be grateful if someone could shed some light on the following:
>
> Each M/R map task reads data from a separate region of a table.
> In the jobtracker's GUI, on the map completion graph, I notice that
> although the mappers read different data, they read it sequentially,
> as if the table had a lock that allowed only one mapper at a time to
> read from its regions.
>
> Does this "lock" hypothesis make sense? Is there any way I could avoid
> this useless delay?
>
> Thanks in advance and regards,
> Ioakim
>