HBase >> mail # user >> Regarding HBase Hadoop multiple scan objects issue


Xu, Leon 2013-01-18, 22:43
Doug Meil 2013-01-18, 23:48
Re: Regarding HBase Hadoop multiple scan objects issue
Have you subscribed to the user mailing list?
Please do not mix up the user@ address and the subscription address.

Some email systems regard messages from amazon.com as unverifiable and
put them in the Spam folder.

What HBase version are you using?

bq.  it's inefficient to have one scan object to scan everything

Have you looked at the following javadoc in Scan.java?

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.
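
As a rough illustration of how a time range could drive an incremental job, the sketch below computes the half-open window [minStamp, maxStamp) that would be handed to Scan.setTimeRange, whose upper bound is exclusive. TimeWindow, windowSince, and lastRunMillis are hypothetical names chosen for this example, not HBase API, and the HBase call itself appears only in a comment so that the block runs on a plain JDK. It also assumes that every change is written with a cell timestamp at or after the previous run.

```java
class ScanWindow {
    /** Half-open interval [min, max) in epoch milliseconds. */
    static final class TimeWindow {
        final long min;
        final long max;

        TimeWindow(long min, long max) {
            if (min < 0 || max < min) {
                throw new IllegalArgumentException("invalid window: " + min + ".." + max);
            }
            this.min = min;
            this.max = max;
        }

        /** True if a cell timestamp falls inside the window (max is exclusive). */
        boolean contains(long timestamp) {
            return timestamp >= min && timestamp < max;
        }
    }

    /** Window covering everything written since the previous run (hypothetical helper). */
    static TimeWindow windowSince(long lastRunMillis, long nowMillis) {
        return new TimeWindow(lastRunMillis, nowMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        TimeWindow w = windowSince(now - 3_600_000L, now); // last hour, as an example
        // With HBase on the classpath, the window would be applied as:
        //   scan.setTimeRange(w.min, w.max);
        System.out.println("scan cells with timestamp in [" + w.min + ", " + w.max + ")");
    }
}
```

Because the upper bound is exclusive, using the same value as the end of one run and the start of the next should neither skip nor double-count a cell.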

Cheers

On Fri, Jan 18, 2013 at 2:43 PM, Xu, Leon <[EMAIL PROTECTED]> wrote:

> Hi HBase users,
>
> I am currently trying to set up a denormalization map-reduce job for my
> HBase table.
> Since our table contains a large volume of data, it's inefficient to have
> one scan object to scan everything. We only need to process those
> records that have changed. I am planning to have multiple scan objects,
> each of which specifies a range, given that we keep track of
> which rows have changed.
> Therefore I am trying to set up the map-reduce job with multiple scan
> objects. Is this possible?
> I have seen some posts online suggesting extending the InputFormat class
> and changing getSplits. Is this the most efficient way?
>
> Using a filter does not seem very efficient in my case because it
> basically still scans the whole table, right? It just filters out certain
> records.
>
> Can you point me in the right direction?
>
>
> Thanks
> Leon
>
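
To make the "one scan per changed range" idea from the question concrete, here is a minimal sketch that merges overlapping or touching changed ranges so that each merged range could back a single scan. Range and merge are hypothetical names, not HBase API. The ranges are half-open [start, stop), matching the start-row-inclusive, stop-row-exclusive convention of Scan, and keys are assumed to be ASCII strings so that String ordering agrees with HBase's byte ordering. How the resulting scans are handed to the map-reduce job depends on the HBase version and is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RowRanges {
    /** Half-open row-key range [start, stop). */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            if (start.compareTo(stop) >= 0) {
                throw new IllegalArgumentException("empty range: " + start + ".." + stop);
            }
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /** Merges overlapping or touching ranges; the result is sorted by start key. */
    static List<Range> merge(List<Range> changed) {
        List<Range> sorted = new ArrayList<>(changed);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.start.compareTo(last.stop) <= 0) {
                    // Overlaps or touches the previous range: extend it if needed.
                    String stop = r.stop.compareTo(last.stop) > 0 ? r.stop : last.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> changed = new ArrayList<>();
        changed.add(new Range("row100", "row200"));
        changed.add(new Range("row150", "row300"));
        changed.add(new Range("row700", "row800"));
        for (Range r : merge(changed)) {
            System.out.println(r); // each merged range would become one scan
        }
    }
}
```

Merging first keeps the number of scans (and therefore input splits) from growing with every small, overlapping change record.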