HBase, mail # user - Filtering/Collection columns during Major Compaction


Re: Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-11, 05:09
So, I actually wrote something that uses the preCompactScannerOpen hook and
initializes a StoreScanner in exactly the same way as we do for a major
compaction, except that I add the filter I need (ColumnPaginationFilter) to
this scanner. I guess that should accomplish the same thing.
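
For anyone reading this in the archive, here is a minimal standalone sketch of
the wrapping idea, assuming the compaction scanner hands back cells sorted by
row and then by column qualifier. It deliberately uses no HBase classes: Cell,
LimitingScanner and keepFirstColumns are hypothetical names made up for
illustration, and delete markers and version cleanup are ignored. In a real
coprocessor the same logic would sit in a wrapper around the InternalScanner
that RegionObserver.preCompact receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ColumnLimitSketch {

    /** Hypothetical stand-in for an HBase KeyValue (not the real class). */
    static final class Cell {
        final String row;
        final String qualifier;
        final String value;

        Cell(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Wraps a sorted cell iterator and keeps only the first `limit` columns of each row. */
    static final class LimitingScanner implements Iterator<Cell> {
        private final Iterator<Cell> delegate;
        private final int limit;
        private String currentRow;
        private String lastQualifier;
        private int columnsSeen;
        private Cell next;

        LimitingScanner(Iterator<Cell> delegate, int limit) {
            this.delegate = delegate;
            this.limit = limit;
            advance();
        }

        /** Moves to the next cell that falls within the per-row column limit. */
        private void advance() {
            next = null;
            while (delegate.hasNext()) {
                Cell c = delegate.next();
                if (!c.row.equals(currentRow)) {
                    // New row: reset the per-row column counter.
                    currentRow = c.row;
                    lastQualifier = null;
                    columnsSeen = 0;
                }
                if (!c.qualifier.equals(lastQualifier)) {
                    // New column in this row; extra versions of a column are not recounted.
                    lastQualifier = c.qualifier;
                    columnsSeen++;
                }
                if (columnsSeen <= limit) {
                    next = c;
                    return;
                }
                // Otherwise the cell is beyond the limit and is dropped.
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Cell next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            Cell result = next;
            advance();
            return result;
        }
    }

    /** Convenience helper: runs the wrapper over a sorted list and collects the survivors. */
    static List<Cell> keepFirstColumns(List<Cell> sortedCells, int limit) {
        List<Cell> kept = new ArrayList<>();
        Iterator<Cell> scanner = new LimitingScanner(sortedCells.iterator(), limit);
        while (scanner.hasNext()) {
            kept.add(scanner.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("r1", "a", "1"),
                new Cell("r1", "b", "2"),
                new Cell("r1", "b", "2-old"), // older version of column b
                new Cell("r1", "c", "3"),
                new Cell("r2", "x", "9"));
        for (Cell c : keepFirstColumns(cells, 2)) {
            System.out.println(c.row + "/" + c.qualifier + "=" + c.value);
        }
    }
}
```

Counting distinct qualifiers rather than cells means every version of a kept
column survives, which leaves version cleanup to the compaction itself.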

On Mon, Dec 10, 2012 at 9:06 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> You can replace (or post-filter) the scanner used for the compaction using
> coprocessors.
> Take a look at RegionObserver.preCompact, which is passed a scanner that
> will iterate over all KVs that should make it into the new store file.
> You can wrap this scanner and then do any filtering you'd like.
>
>
>
> ________________________________
>  From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, December 10, 2012 5:58 AM
> Subject: Filtering/Collection columns during Major Compaction
>
> Hi,
>
> My understanding of major compaction is that it writes a single new store
> file by merging the memstore and the store files on disk, cleaning out
> delete tombstones, the puts prior to them, and excess versions. We want
> to limit the number of columns per row in HBase. We also want to limit
> them in lexicographically sorted order, which means we take the, say,
> 100 smallest columns (in the lexicographical sense) and keep only those,
> discarding the rest.
>
> One way to do this would be to clean out columns in a daily MapReduce job.
> Another way is to clean them out during the major compaction, which can
> be run daily too. I see from the code that a major compaction essentially
> invokes a Scan over the region, so if the Scan is invoked with the
> appropriate filter (say ColumnCountGetFilter), would that do the trick?
>
> Thanks
> Varun
>