HBase >> mail # user >> Reading in parallel from table's regions in MapReduce


Re: Reading in parallel from table's regions in MapReduce

Hi there-

Yes, there is one input split for each region of the source table of an MR
job.

There is a blurb on that in the RefGuide...

http://hbase.apache.org/book.html#splitter
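
To make the "one split per region" idea concrete, here is a small standalone sketch in plain Java. It is not HBase code: the KeyRange class and the splitsFor method are hypothetical names made up for illustration, and row keys are modelled as plain strings with "" meaning "unbounded". It only models the general idea that each region overlapping the scan range yields one split, clipped to the scan's start and stop rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionSplitSketch {

    /** Hypothetical half-open key range [start, end); "" means unbounded on that side. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns one split per region that overlaps the scan range, clipped to that range. */
    static List<KeyRange> splitsFor(List<KeyRange> regions, KeyRange scan) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            // The scan must begin before the region ends ...
            boolean startsBeforeRegionEnd =
                    region.end.isEmpty() || scan.start.compareTo(region.end) < 0;
            // ... and must end after the region begins.
            boolean endsAfterRegionStart =
                    scan.end.isEmpty() || scan.end.compareTo(region.start) > 0;
            if (!startsBeforeRegionEnd || !endsAfterRegionStart) {
                continue; // no overlap, so no split (and no map task) for this region
            }
            // Split start is the later of the two start keys ("" sorts first).
            String start = scan.start.compareTo(region.start) > 0 ? scan.start : region.start;
            // Split end is the earlier of the two end keys, treating "" as unbounded.
            String end;
            if (region.end.isEmpty()) {
                end = scan.end;
            } else if (scan.end.isEmpty()) {
                end = region.end;
            } else {
                end = scan.end.compareTo(region.end) < 0 ? scan.end : region.end;
            }
            splits.add(new KeyRange(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three made-up regions covering the whole key space.
        List<KeyRange> regions = Arrays.asList(
                new KeyRange("", "g"), new KeyRange("g", "p"), new KeyRange("p", ""));
        System.out.println(splitsFor(regions, new KeyRange("", "")));   // full-table scan
        System.out.println(splitsFor(regions, new KeyRange("h", "k"))); // scan inside one region
    }
}
```

In this model a full-table scan over three regions gives three splits, so three map tasks, while a scan restricted to keys inside a single region gives only one.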

On 9/4/12 11:17 AM, "Ioakim Perros" <[EMAIL PROTECTED]> wrote:

>Hello,
>
>I would be grateful if someone could shed some light on the following:
>
>Each M/R map task reads data from a separate region of a table.
>In the map completion graph of the JobTracker GUI, I notice that
>although the mappers read different data, they read it
>sequentially, as if the table had a lock that permits only one mapper
>at a time to read data from any region.
>
>Does this "lock" hypothesis make sense? Is there any way I could avoid
>this useless delay?
>
>Thanks in advance and regards,
>Ioakim
>