Re: Regarding HBase Hadoop multiple scan objects issue
Ted Yu 2013-01-19, 17:20
Have you subscribed to the user mailing list?
Please do not mix email to user@ with subscription requests. Some email systems regard messages from amazon.com as unverifiable and put them in the spam folder.

What HBase version are you using?

bq. it's inefficient to have one scan object to scan everything

Have you looked at the following javadoc in Scan.java?

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.

Cheers

On Fri, Jan 18, 2013 at 2:43 PM, Xu, Leon <[EMAIL PROTECTED]> wrote:

> Hi HBase users,
>
> I am currently trying to set up a denormalization map-reduce job for my
> HBase table.
> Since our table contains a large volume of data, it's inefficient to have
> one scan object scan everything. We only need to process those
> records that have changed. I am planning to have multiple scan objects,
> each of which specifies a row range, given that we keep track of
> which rows have changed.
> Therefore I am trying to set up the map-reduce job with multiple scan
> objects. Is this possible?
> I have seen some posts online suggesting extending the InputFormat class
> and changing getSplits. Is this the most efficient way?
>
> Using a filter does not seem very efficient in my case because it
> basically still scans the whole table, right? It just filters out certain
> records.
>
> Can you point me in the right direction?
>
>
> Thanks
> Leon
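
One way to act on the "multiple scan objects" idea from the question is to first coalesce the tracked changed row keys into a small number of half-open [start, stop) ranges, one per scan. The sketch below only illustrates that step and makes several assumptions: the class ChangedRowRanges, the method toRanges, the maxKeysPerRange parameter and the sample row keys are all hypothetical; row keys are assumed to be plain ASCII strings, so that Java's string ordering agrees with HBase's byte ordering; and no HBase classes are used. Each resulting range could then be applied to a Scan through setStartRow and setStopRow, possibly combined with the setTimeRange call quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class ChangedRowRanges {

    /** Half-open row-key range [startRow, stopRow); the stop row is exclusive, as in an HBase Scan. */
    public static final class Range {
        public final String startRow;
        public final String stopRow;

        Range(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Sorts and de-duplicates the changed row keys, then groups every
     * maxKeysPerRange consecutive keys into one range.
     * Assumption: keys are ASCII strings (hypothetical key format).
     */
    public static List<Range> toRanges(Iterable<String> changedRows, int maxKeysPerRange) {
        if (maxKeysPerRange < 1) {
            throw new IllegalArgumentException("maxKeysPerRange must be >= 1");
        }
        TreeSet<String> sorted = new TreeSet<>();
        for (String row : changedRows) {
            sorted.add(row);
        }

        List<Range> ranges = new ArrayList<>();
        String first = null;
        String last = null;
        int count = 0;
        for (String row : sorted) {
            if (count == 0) {
                first = row;
            }
            last = row;
            count++;
            if (count == maxKeysPerRange) {
                // Appending a zero character yields the smallest key greater
                // than 'last', so the exclusive stop row still covers 'last'.
                ranges.add(new Range(first, last + '\u0000'));
                count = 0;
            }
        }
        if (count > 0) {
            ranges.add(new Range(first, last + '\u0000'));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical changed row keys, e.g. collected by the application.
        List<String> changed = Arrays.asList("user-0042", "user-0007", "user-0043", "user-0910");
        for (Range r : toRanges(changed, 2)) {
            // Show the zero character visibly instead of printing it raw.
            System.out.println("[" + r.startRow + ", " + r.stopRow.replace("\u0000", "\\0") + ")");
        }
    }
}
```

Note the trade-off in this sketch: a range also covers any unchanged rows that sort between its first and last key, so a smaller maxKeysPerRange reads fewer extra rows at the cost of more scan objects.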