HBase, mail # user - Regarding HBase Hadoop multiple scan objects issue


Xu, Leon 2013-01-18, 22:43
Re: Regarding HBase Hadoop multiple scan objects issue
Doug Meil 2013-01-18, 23:48

Hi there-

You probably want to review this section of the RefGuide:
http://hbase.apache.org/book.html#mapreduce

re:  "it's inefficient to have one scan object to scan everything."
It is.  But in the MapReduce case, there is a Map-task for each input
split (see the RefGuide for details), and therefore a Scanner instance per
Map-task.
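
To illustrate the point about one Scanner per Map-task, here is a minimal
sketch (plain Java, no HBase dependency) of how a single job-level scan range
could be clipped against region boundaries to give one sub-range per input
split. The class and method names (PerSplitScanSketch, KeyRange, splitsFor)
are hypothetical, and row keys are modeled as Strings compared
lexicographically rather than as byte arrays. It is an approximation of the
idea, not the actual TableInputFormat code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the HBase API.
class PerSplitScanSketch {
    /** Half-open row-key range [start, stop); an empty string means "unbounded". */
    static final class KeyRange {
        final String start, stop;
        KeyRange(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + "," + stop + ")"; }
    }

    /** Later of two start keys; "" sorts before every other key, so it loses. */
    static String maxStart(String a, String b) { return a.compareTo(b) >= 0 ? a : b; }

    /** Earlier of two stop keys; "" (unbounded) counts as the largest stop. */
    static String minStop(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** One sub-range per region that overlaps the scan; each would feed one map task. */
    static List<KeyRange> splitsFor(KeyRange scan, List<KeyRange> regions) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            String start = maxStart(scan.start, region.start);
            String stop = minStop(scan.stop, region.stop);
            // Keep the region only if the clipped range is non-empty.
            if (stop.isEmpty() || start.compareTo(stop) < 0) {
                splits.add(new KeyRange(start, stop));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three assumed regions covering the whole key space.
        List<KeyRange> regions = Arrays.asList(
            new KeyRange("", "g"), new KeyRange("g", "n"), new KeyRange("n", ""));
        // A scan over [c, k) touches only the first two regions.
        System.out.println(splitsFor(new KeyRange("c", "k"), regions));
    }
}
```

With these assumed regions, the third region is skipped entirely, so a scan
restricted by start and stop row does not read the whole table.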

On 1/18/13 5:43 PM, "Xu, Leon" <[EMAIL PROTECTED]> wrote:

>Hi HBase users,
>
>I am currently trying to set up a denormalization map-reduce job for my
>HBase table.
>Since our table contains a large volume of data, it's inefficient to have
>one scan object to scan everything. We only need to process those
>records that have changed. I am planning to have multiple scan objects,
>each of which specifies a row range, given that we keep track of
>which rows have changed.
>Therefore I am trying to set up the map-reduce job with multiple scan
>objects. Is this possible?
>I have seen some posts online suggesting extending the InputFormat class
>and overriding getSplits. Is this the most efficient way?
>
>Using a filter does not seem very efficient in my case because it
>basically still scans the whole table, right? It just filters out certain
>records.
>
>Can you point me in the right direction?
>
>
>Thanks
>Leon
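
The plan in the quoted message (one scan range per group of changed rows) can
be sketched as follows. This is a minimal, HBase-free illustration, assuming
the changed rows are already tracked as bounded half-open key ranges; the
names ChangedRangeSketch, KeyRange and coalesce are hypothetical, and row keys
are modeled as Strings compared lexicographically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the HBase API.
class ChangedRangeSketch {
    /** Bounded half-open row-key range [start, stop). */
    static final class KeyRange {
        final String start, stop;
        KeyRange(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + "," + stop + ")"; }
    }

    /** Merges overlapping or touching changed ranges so no row is scanned twice. */
    static List<KeyRange> coalesce(List<KeyRange> changed) {
        List<KeyRange> sorted = new ArrayList<>(changed);
        sorted.sort(Comparator.comparing((KeyRange r) -> r.start));
        List<KeyRange> merged = new ArrayList<>();
        for (KeyRange r : sorted) {
            if (!merged.isEmpty()) {
                KeyRange last = merged.get(merged.size() - 1);
                if (r.start.compareTo(last.stop) <= 0) {
                    // Overlaps or touches the previous range: extend it.
                    String stop = last.stop.compareTo(r.stop) >= 0 ? last.stop : r.stop;
                    merged.set(merged.size() - 1, new KeyRange(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up row keys, purely for demonstration.
        List<KeyRange> changed = Arrays.asList(
            new KeyRange("row500", "row600"),
            new KeyRange("row100", "row200"),
            new KeyRange("row150", "row300"));
        System.out.println(coalesce(changed));
    }
}
```

Each merged range would then correspond to one scan with that start and stop
row. How those scans are handed to the job (a custom getSplits, or a
multi-scan input format if the HBase version in use provides one) is outside
this sketch.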
Ted Yu 2013-01-19, 17:20