HBase, mail # user - Filtering/Collection columns during Major Compaction


Thread:
  Varun Sharma 2012-12-10, 13:58
  lars hofhansl 2012-12-11, 05:06
  Varun Sharma 2012-12-11, 05:09
  ramkrishna vasudevan 2012-12-10, 14:08
  Varun Sharma 2012-12-10, 14:59
  Varun Sharma 2012-12-10, 15:29

Re: Filtering/Collection columns during Major Compaction
lars hofhansl 2012-12-11, 05:09
In your case you probably just want to filter on top of the provided scanner with preCompact (rather than actually replacing the scanner, which preCompactScannerOpen does). 
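
A minimal sketch of that wrapping approach, assuming the 0.94 RegionObserver and InternalScanner APIs (the class name and the 100-column limit are hypothetical, and the metric-taking next() overloads below are the extra signatures 0.94's InternalScanner declares):

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: keeps only the 100 lexicographically smallest
// columns of each row during compactions by pruning the scanner's output.
public class ColumnLimitCompactionObserver extends BaseRegionObserver {
  private static final int MAX_COLS = 100;  // hypothetical per-row limit

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, InternalScanner scanner) {
    // Filter on top of the provided scanner rather than replacing it.
    return new ColumnLimitingScanner(scanner);
  }

  private static class ColumnLimitingScanner implements InternalScanner {
    private final InternalScanner delegate;
    private byte[] row = null;        // row of the last cell seen
    private byte[] qualifier = null;  // qualifier of the last cell seen
    private int cols = 0;             // distinct columns seen in this row

    ColumnLimitingScanner(InternalScanner delegate) {
      this.delegate = delegate;
    }

    public boolean next(List<KeyValue> results) throws IOException {
      return prune(delegate.next(results), results);
    }

    public boolean next(List<KeyValue> results, String metric) throws IOException {
      return prune(delegate.next(results, metric), results);
    }

    public boolean next(List<KeyValue> results, int limit) throws IOException {
      return prune(delegate.next(results, limit), results);
    }

    public boolean next(List<KeyValue> results, int limit, String metric)
        throws IOException {
      return prune(delegate.next(results, limit, metric), results);
    }

    // Cells arrive sorted by row then column, and one row may span several
    // next() batches, so the per-row state is carried across calls.
    private boolean prune(boolean hasMore, List<KeyValue> results) {
      Iterator<KeyValue> it = results.iterator();
      while (it.hasNext()) {
        KeyValue kv = it.next();
        if (row == null || !Bytes.equals(row, kv.getRow())) {
          row = kv.getRow();        // new row: reset the column counter
          qualifier = null;
          cols = 0;
        }
        if (qualifier == null || !Bytes.equals(qualifier, kv.getQualifier())) {
          qualifier = kv.getQualifier();
          cols++;                   // count distinct qualifiers, not versions
        }
        if (cols > MAX_COLS) {
          it.remove();              // discard everything past the limit
        }
      }
      return hasMore;
    }

    public void close() throws IOException {
      delegate.close();
    }
  }
}

Wiring it up is the usual coprocessor setup, e.g. listing the class in hbase.coprocessor.region.classes in hbase-site.xml. This is an untested sketch, so treat it as a starting point rather than a drop-in.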

(And sorry I only saw this reply after I sent my own reply to your initial question.)

________________________________
 From: Varun Sharma <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, December 10, 2012 7:29 AM
Subject: Re: Filtering/Collection columns during Major Compaction
 
Okay - I looked more thoroughly again - I should be able to extract these
from the region observer.

Thanks !

On Mon, Dec 10, 2012 at 6:59 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Thanks! This is exactly what I need. I am looking at the code in
> compactStore() under Store.java, but I am trying to understand why
> smallestReadPoint needs to be passed for the real compaction - I thought
> the read point was a memstore-only thing. Also, preCompactScannerOpen does
> not have a way of passing this value.
>
> Varun
>
>
> On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Varun
>>
>> If you are using the 0.94 version, there is a coprocessor hook that gets
>> invoked before and after compaction selection. preCompactScannerOpen()
>> lets you create your own scanner, which is what actually serves the
>> next() operation. If you wrap your own scanner and implement next()
>> yourself, you can control which KVs survive - so you can decide which
>> columns to include and which to exclude (see the sketch below the quoted
>> thread).
>> Does this help you, Varun?
>>
>> Regards
>> Ram
>>
>> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Hi,
>> >
>> > My understanding of major compaction is that it merges the memstore and
>> > the on-disk store files into a single store file, cleaning out delete
>> > tombstones, the puts prior to them, and excess versions. We want to
>> > limit the number of columns per row in HBase, and we want to limit them
>> > in lexicographically sorted order - that is, keep only the, say, 100
>> > smallest columns (in the lexicographic sense) and discard the rest.
>> >
>> > One way to do this would be to clean out the columns in a daily
>> > mapreduce job. Another way is to clean them out during the major
>> > compaction, which can also be run daily. I see from the code that a
>> > major compaction essentially invokes a Scan over the region - so if the
>> > Scan is invoked with an appropriate filter (say ColumnCountGetFilter),
>> > would that do the trick?
>> >
>> > Thanks
>> > Varun
>> >
>>
>
>
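
For the scanner-replacing route Ram describes above, a sketch of a preCompactScannerOpen() override, modeled on the pattern HBase's own 0.94 scan-policy tests use; it also shows where smallestReadPoint comes from, since the hook has no parameter for it. ColumnLimitingScanner is the hypothetical wrapper sketched earlier, and the StoreScanner constructor arguments are assumptions against 0.94:

// Added to the same observer class as above; additionally needs Scan,
// KeyValueScanner, ScanType, and StoreScanner imported from the 0.94
// client and regionserver packages.
@Override
public InternalScanner preCompactScannerOpen(
    ObserverContext<RegionCoprocessorEnvironment> c, Store store,
    List<? extends KeyValueScanner> scanners, ScanType scanType,
    long earliestPutTs, InternalScanner s) throws IOException {
  Scan scan = new Scan();
  scan.setMaxVersions(store.getFamily().getMaxVersions());
  // smallestReadPoint is not a hook parameter; it is read off the region.
  InternalScanner base = new StoreScanner(store, store.getScanInfo(), scan,
      scanners, scanType, store.getHRegion().getSmallestReadPoint(),
      earliestPutTs);
  return new ColumnLimitingScanner(base);
}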

Thread (continued):
  Varun Sharma 2012-12-11, 07:04
  lars hofhansl 2012-12-11, 07:19
  Varun Sharma 2012-12-12, 00:51
  lars hofhansl 2012-12-12, 01:58
  Anoop Sam John 2012-12-11, 04:10