HBase >> mail # user >> Filtering/Collection columns during Major Compaction

Re: Filtering/Collection columns during Major Compaction
Thanks! This is exactly what I need. I am looking at the code in
compactStore() in Store.java, but I am trying to understand why, for the
real compaction, smallestReadPoint needs to be passed. I thought the read
point was a memstore-only thing. Also, preCompactScannerOpen() does not
have a way of passing this value.
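
For reference, here is a rough sketch of the wrapping-scanner idea from the quoted reply below, as I understand it. It deliberately does not use the HBase classes: Cell, RowScanner, ListRowScanner and ColumnLimitingScanner are hypothetical stand-ins I made up for illustration, and I am assuming that each next() call hands back the cells of one row, already sorted by qualifier. A real version would wrap the scanner that the coprocessor hook is given instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a key/value cell: row key, column qualifier, value. */
final class Cell {
    final String row;
    final String qualifier;
    final String value;

    Cell(String row, String qualifier, String value) {
        this.row = row;
        this.qualifier = qualifier;
        this.value = value;
    }
}

/** Hypothetical scanner contract: append the next row's cells to out, return true if more rows remain. */
interface RowScanner {
    boolean next(List<Cell> out);
}

/** Wraps another scanner and keeps only the first maxColumns distinct qualifiers of each row. */
final class ColumnLimitingScanner implements RowScanner {
    private final RowScanner delegate;
    private final int maxColumns;

    ColumnLimitingScanner(RowScanner delegate, int maxColumns) {
        this.delegate = delegate;
        this.maxColumns = maxColumns;
    }

    @Override
    public boolean next(List<Cell> out) {
        List<Cell> row = new ArrayList<>();
        boolean more = delegate.next(row);
        int distinct = 0;      // distinct qualifiers seen so far in this row
        String last = null;    // qualifier of the previous cell
        for (Cell c : row) {
            // Assumes cells are sorted, so versions of one column are adjacent.
            if (!c.qualifier.equals(last)) {
                distinct++;
                last = c.qualifier;
            }
            if (distinct > maxColumns) {
                break;         // everything after this point sorts later; drop it
            }
            out.add(c);
        }
        return more;
    }
}

/** Toy in-memory scanner that returns one pre-built row per call. */
final class ListRowScanner implements RowScanner {
    private final Iterator<List<Cell>> rows;

    ListRowScanner(List<List<Cell>> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean next(List<Cell> out) {
        if (rows.hasNext()) {
            out.addAll(rows.next());
        }
        return rows.hasNext();
    }
}

final class ColumnLimitDemo {
    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(
                new Cell("r1", "a", "1"), new Cell("r1", "b", "2"), new Cell("r1", "c", "3"));
        RowScanner scanner = new ColumnLimitingScanner(new ListRowScanner(Arrays.asList(row)), 2);
        List<Cell> kept = new ArrayList<>();
        scanner.next(kept);
        for (Cell c : kept) {
            System.out.println(c.row + "/" + c.qualifier);
        }
    }
}
```

The counter is reset on every next() call, so the limit applies per row rather than to the whole scan, which is what we want for a compaction that walks the entire region.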


On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan wrote:

> Hi Varun,
> If you are using version 0.94, there are coprocessor hooks that are
> invoked before and after compaction selection.
> preCompactScannerOpen() lets you create your own scanner, which actually
> performs the next() operation.
> If you wrap your own scanner and implement its next(), you can control
> which KVs are kept. So basically you can say which columns
> to include and which to exclude.
> Does this help you, Varun?
> Regards
> Ram
> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > My understanding of major compaction is that it rewrites one store file,
> > merging the memstore and the store files on disk, and cleans out delete
> > tombstones, the puts prior to them, and excess versions. We want
> > to limit the number of columns per row in HBase. Also, we want to limit
> > them in lexicographically sorted order, which means we take the, say,
> > 100 smallest columns (in the lexicographical sense) and keep only them,
> > discarding the rest.
> >
> > One way to do this would be to clean out columns in a daily MapReduce job.
> > Another way is to clean them out during the major compaction, which can
> > be run daily too. I see from the code that a major compaction essentially
> > invokes a Scan over the region, so if the Scan is invoked with the
> > appropriate filter (say ColumnCountGetFilter), would that do the trick?
> >
> > Thanks
> > Varun
> >