RE: Filtering/Collection columns during Major Compaction
Hi Varun

>but I am trying to understand why smallestReadPoint needs to be passed
> for the real compaction - I thought the read point was a memstore-only
> thing

No, this is needed not only for the memstore. While a scan is in progress, the memstore can get flushed... That is why the MVCC ts is also written to the HFile.
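
For illustration, the relevant bit when the compaction writes KVs out looks roughly like this (sketched from memory of the 0.94 code; the class and method names here are mine, not the actual HBase source):

    import org.apache.hadoop.hbase.KeyValue;

    // Roughly mirrors what Store.compactStore() does with smallestReadPoint
    // in 0.94. Illustrative names only - not the exact HBase source.
    public class MvccSketch {
        static void maybeClearMvcc(KeyValue kv, long smallestReadPoint) {
            // If every open scanner's read point is already past this edit,
            // its MVCC write number carries no information any more, so it
            // can be zeroed before the KV goes into the new HFile.
            if (kv.getMemstoreTS() <= smallestReadPoint) {
                kv.setMemstoreTS(0);
            }
            // Otherwise the ts must survive into the HFile: the memstore can
            // be flushed mid-scan, and a scanner opened before this edit must
            // still be able to skip it when reading from disk.
        }
    }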
Hope the reply from Ram helped you do what you want. If you face any issues, please let us know. We have already done this using the CP hooks. Thanks to Lars H for these new hooks :) Very useful...
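
Since you asked, here is a rough sketch of what such a column-limiting observer can look like (written against the 0.94-era API; the class name and MAX_COLS are illustrative, and the exact set of InternalScanner next() overloads varies between minor releases, so treat this as an outline rather than a drop-in):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;
    import org.apache.hadoop.hbase.regionserver.Store;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ColumnLimitingObserver extends BaseRegionObserver {
        private static final int MAX_COLS = 100; // the limit from your example

        @Override
        public InternalScanner preCompact(
                ObserverContext<RegionCoprocessorEnvironment> e,
                Store store, InternalScanner scanner) {
            // Wrap the scanner the compaction would normally drain.
            return new ColumnLimitingScanner(scanner, MAX_COLS);
        }

        private static class ColumnLimitingScanner implements InternalScanner {
            private final InternalScanner delegate;
            private final int maxCols;
            private byte[] currentRow; // row whose columns we are counting
            private int kept;          // KVs kept so far in currentRow

            ColumnLimitingScanner(InternalScanner delegate, int maxCols) {
                this.delegate = delegate;
                this.maxCols = maxCols;
            }

            public boolean next(List<KeyValue> results) throws IOException {
                boolean more = delegate.next(results);
                trim(results);
                return more;
            }

            public boolean next(List<KeyValue> results, int limit)
                    throws IOException {
                boolean more = delegate.next(results, limit);
                trim(results);
                return more;
            }

            // A row can span multiple next() calls during compaction, so the
            // per-row count is carried across calls. Note this counts KVs, so
            // multiple versions of one qualifier count separately; a stricter
            // implementation would track distinct qualifiers instead.
            private void trim(List<KeyValue> kvs) {
                Iterator<KeyValue> it = kvs.iterator();
                while (it.hasNext()) {
                    KeyValue kv = it.next();
                    if (currentRow == null
                            || !Bytes.equals(currentRow, kv.getRow())) {
                        currentRow = kv.getRow();
                        kept = 0;
                    }
                    if (kept >= maxCols) {
                        it.remove(); // past the smallest maxCols columns
                    } else {
                        kept++;
                    }
                }
            }

            public void close() throws IOException {
                delegate.close();
            }
        }
    }

One thing to watch: preCompact fires for minor compactions as well. IIRC preCompactScannerOpen receives the scan type, so that hook is the place to check if you only want this on major compactions.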

-Anoop-
________________________________________
From: Varun Sharma [[EMAIL PROTECTED]]
Sent: Monday, December 10, 2012 8:59 PM
To: [EMAIL PROTECTED]
Subject: Re: Filtering/Collection columns during Major Compaction

Okay - I looked through it more thoroughly - I should be able to extract these
from the region observer.

Thanks !

On Mon, Dec 10, 2012 at 6:59 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Thanks! This is exactly what I need. I am looking at the code in
> compactStore() under Store.java, but I am trying to understand why
> smallestReadPoint needs to be passed for the real compaction - I thought
> the read point was a memstore-only thing. Also, preCompactScannerOpen does
> not have a way of passing this value.
>
> Varun
>
>
> On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Varun
>>
>> If you are using the 0.94 version, there are coprocessor hooks that get
>> invoked before and after compaction selection.
>> preCompactScannerOpen() lets you create your own scanner, which actually
>> does the next() operation.
>> Now if you wrap your own scanner and implement your next(), you can play
>> with the KVs as you need. So basically you can say which columns to
>> include and which to exclude.
>> Does this help you Varun?
>>
>> Regards
>> Ram
>>
>> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Hi,
>> >
>> > My understanding of major compaction is that it rewrites one store
>> > file, doing a merge of the memstore and the store files on disk,
>> > cleaning out delete tombstones (and the puts prior to them) and
>> > cleaning out excess versions. We want to limit the number of columns
>> > per row in HBase. Also, we want to limit them in lexicographically
>> > sorted order - which means we take, say, the 100 smallest columns (in
>> > the lexicographical sense) and keep only those while discarding the
>> > rest.
>> >
>> > One way to do this would be to clean out the columns in a daily
>> > MapReduce job. Another way is to clean them out during the major
>> > compaction, which can be run daily too. I see from the code that a
>> > major compaction essentially invokes a Scan over the region - so if
>> > the Scan is invoked with the appropriate filter (say
>> > ColumnCountGetFilter), would that do the trick?
>> >
>> > Thanks
>> > Varun
>> >
>>
>
>