HBase user mailing list

Re: Reading in parallel from table's regions in MapReduce

Hi there-

Yes, an MR job gets one input split for each region of its source table.

There is a blurb on that in the RefGuide...

http://hbase.apache.org/book.html#splitter
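To picture the "one split per region" rule, here is a minimal sketch in plain Java with no HBase dependency. It is a simplified, hypothetical model: the `Region` and `Split` types and the `getSplits` method are illustrative stand-ins, not the HBase API. The real `TableInputFormatBase` also trims splits to the Scan's start/stop rows and handles other details not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how a table input format derives
// one input split per region. Not the actual HBase classes.
public class RegionSplits {

    // A region covers row keys in [startKey, endKey); "" means unbounded.
    record Region(String startKey, String endKey, String server) {}

    // One split per region. The split records the region's server so a
    // scheduler could try to run the map task on (or near) that host.
    record Split(String startRow, String stopRow, String location) {}

    static List<Split> getSplits(List<Region> regions) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            // Each region maps to exactly one split, and each split to one map task.
            splits.add(new Split(r.startKey(), r.endKey(), r.server()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A hypothetical table with three regions on three servers.
        List<Region> regions = List.of(
                new Region("", "g", "rs1"),
                new Region("g", "p", "rs2"),
                new Region("p", "", "rs3"));
        for (Split s : getSplits(regions)) {
            System.out.println(s);
        }
    }
}
```

In this model, a table with N regions yields N splits, and therefore N map tasks.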

On 9/4/12 11:17 AM, "Ioakim Perros" <[EMAIL PROTECTED]> wrote:

>Hello,
>
>I would be grateful if someone could shed some light on the following:
>
>Each M/R map task reads data from a separate region of a table.
>In the JobTracker's GUI, on the map completion graph, I notice that
>although the mappers read different data, they read it
>sequentially, as if the table had a lock that lets only one mapper at a
>time read from any of its regions.
>
>Does this "lock" hypothesis make sense? Is there any way I could avoid
>this unnecessary delay?
>
>Thanks in advance and regards,
>Ioakim
>