

Re: Regarding HBase Hadoop multiple scan objects issue
Have you subscribed to the user mailing list?
Please do not mix emails meant for user@ with subscription requests.

Some email systems regard messages from amazon.com as unverifiable and
put them in the Spam folder.

What HBase version are you using?

bq.  it's inefficient to have one scan object to scan everything

Have you looked at the following javadoc in Scan.java?

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.
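
On the multiple-scan idea itself, here is a minimal sketch in plain Java (no HBase dependency) of how a tracked list of changed row keys might be grouped into row ranges, one range per scan. The names ChangedRowRanges, RowRange, justAfter, toRanges and keysPerRange are hypothetical and only for illustration. It assumes the keys are already sorted in HBase byte order and relies on the stop row of a scan being exclusive, which is why a zero byte is appended to the last key of each group.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChangedRowRanges {

    /** Hypothetical holder for one scan's bounds: start inclusive, stop exclusive. */
    static final class RowRange {
        final byte[] startRow;
        final byte[] stopRow;

        RowRange(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /** Smallest key sorting strictly after {@code key}: the key plus a trailing 0x00 byte. */
    static byte[] justAfter(byte[] key) {
        // Arrays.copyOf pads the extra position with a zero byte.
        return Arrays.copyOf(key, key.length + 1);
    }

    /**
     * Splits the changed keys (assumed sorted in byte order) into consecutive
     * groups of at most keysPerRange keys and returns one range per group.
     */
    static List<RowRange> toRanges(List<byte[]> sortedChangedKeys, int keysPerRange) {
        if (keysPerRange < 1) {
            throw new IllegalArgumentException("keysPerRange must be >= 1");
        }
        List<RowRange> ranges = new ArrayList<>();
        for (int i = 0; i < sortedChangedKeys.size(); i += keysPerRange) {
            int last = Math.min(i + keysPerRange, sortedChangedKeys.size()) - 1;
            ranges.add(new RowRange(sortedChangedKeys.get(i),
                                    justAfter(sortedChangedKeys.get(last))));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<byte[]> changed = new ArrayList<>();
        for (String k : new String[] {"row-0003", "row-0004", "row-0900"}) {
            changed.add(k.getBytes(StandardCharsets.UTF_8));
        }
        // Each printed range could supply the start and stop row of one Scan.
        for (RowRange r : toRanges(changed, 2)) {
            System.out.println(Arrays.toString(r.startRow) + " .. " + Arrays.toString(r.stopRow));
        }
    }
}
```

A larger keysPerRange gives fewer scans but reads more unchanged rows that sit between the changed keys of a group; a value of 1 reads only the changed rows at the cost of one range per row. The time range mentioned in the javadoc above could be applied to each scan in addition to the row bounds.
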
Cheers

On Fri, Jan 18, 2013 at 2:43 PM, Xu, Leon <[EMAIL PROTECTED]> wrote:

> Hi HBase users,
>
> I am currently trying to set up a denormalization map-reduce job for my
> HBase Table.
> Since our table contains a large volume of data, it's inefficient to have
> one scan object to scan everything. We only need to process those
> records that have changed. I am planning to have multiple scan objects,
> each of which specifies a row range, given that we keep track of
> which rows have been changed.
> Therefore I am trying to set up the map-reduce job with multiple scan
> objects. Is this possible?
> I have seen some posts online suggesting extending the InputFormat class
> and changing getSplits. Is this the most efficient way?
>
> Using a filter does not seem very efficient in my case because it
> basically still scans the whole table, right? It just filters out certain
> records.
>
> Can you point me in the right direction?
>
>
> Thanks
> Leon
>