HBase user mailing list: Filtering/Collection columns during Major Compaction


Earlier in this thread:
  Varun Sharma 2012-12-10, 13:58
  lars hofhansl 2012-12-11, 05:06

Re: Filtering/Collection columns during Major Compaction
So, I actually wrote something that uses preCompactScannerOpen and
initializes a StoreScanner in exactly the same way as we do for a major
compaction, except that I add the filter I need (ColumnPaginationFilter)
to that scanner - I guess that should accomplish the same thing.
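For reference, a minimal sketch of what that hook could look like, assuming
the 0.94-era coprocessor API (the preCompactScannerOpen and StoreScanner
constructor signatures have changed across HBase versions); the observer
class name and the 100-column limit are illustrative, not from the thread:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.regionserver.StoreScanner;

// Hypothetical observer: rebuilds the compaction scanner the way a major
// compaction would, but with a ColumnPaginationFilter attached so only the
// 100 lexicographically smallest columns of each row survive.
public class TopColumnsCompactionObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS_PER_ROW = 100; // assumed limit

  @Override
  public InternalScanner preCompactScannerOpen(
      ObserverContext<RegionCoprocessorEnvironment> c, Store store,
      List<? extends KeyValueScanner> scanners, ScanType scanType,
      long earliestPutTs, InternalScanner s) throws IOException {
    Scan scan = new Scan();
    // Mirror what a normal compaction scanner does with versions.
    scan.setMaxVersions(store.getFamily().getMaxVersions());
    // The extra step: keep only the first 100 columns of each row.
    scan.setFilter(new ColumnPaginationFilter(MAX_COLUMNS_PER_ROW, 0));
    return new StoreScanner(store, store.getScanInfo(), scan, scanners,
        scanType, store.getHRegion().getSmallestReadPoint(), earliestPutTs);
  }
}

Such an observer would still have to be registered (for example on the table
descriptor, or via the hbase.coprocessor.region.classes property) before
compactions would pick it up.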

On Mon, Dec 10, 2012 at 9:06 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> You can replace (or post filter) the scanner used for the compaction using
> coprocessors.
> Take a look at RegionObserver.preCompact, which is passed a scanner that
> will iterate over all KVs that should make it into the new store file.
> You can wrap this scanner and then do any filtering you'd like.
>
>
>
> ________________________________
>  From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, December 10, 2012 5:58 AM
> Subject: Filtering/Collection columns during Major Compaction
>
> Hi,
>
> My understanding of major compaction is that it rewrites everything into a
> single store file - merging the memstore and the store files on disk - and
> cleans out delete tombstones (and the puts preceding them) as well as
> excess versions. We want to limit the number of columns per row in hbase,
> and we want to limit them in lexicographically sorted order - that is, we
> keep only the, say, 100 smallest columns (in the lexicographic sense) and
> discard the rest.
>
> One way to do this would be to clean out columns in a daily mapreduce job.
> Another would be to clean them out during the major compaction, which can
> be run daily too. I see from the code that a major compaction essentially
> invokes a Scan over the region - so if that Scan were invoked with an
> appropriate filter (say, ColumnCountGetFilter), would that do the trick?
>
> Thanks
> Varun
>
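To make the wrap-the-scanner suggestion from the quoted message concrete,
here is a minimal sketch of a post-filtering wrapper returned from
RegionObserver.preCompact, again assuming the 0.94-era API (the exact set
of InternalScanner next() overloads varies by version); class names and the
100-column limit are illustrative. Note that it counts KeyValues rather
than distinct qualifiers, so multiple versions of one column each consume
part of the per-row budget:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: wraps the compaction scanner and post-filters
// what it emits, instead of rebuilding the scanner from scratch.
public class WrappingCompactionObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS_PER_ROW = 100; // assumed limit

  @Override
  public InternalScanner preCompact(
      ObserverContext<RegionCoprocessorEnvironment> e, Store store,
      InternalScanner scanner) {
    return new TopNColumnsScanner(scanner, MAX_COLUMNS_PER_ROW);
  }

  // Passes through at most maxPerRow KeyValues per row, drops the rest.
  private static class TopNColumnsScanner implements InternalScanner {
    private final InternalScanner delegate;
    private final int maxPerRow;
    private byte[] currentRow;
    private int emitted;

    TopNColumnsScanner(InternalScanner delegate, int maxPerRow) {
      this.delegate = delegate;
      this.maxPerRow = maxPerRow;
    }

    @Override
    public boolean next(List<KeyValue> results) throws IOException {
      List<KeyValue> raw = new ArrayList<KeyValue>();
      boolean more = delegate.next(raw);
      for (KeyValue kv : raw) {
        if (currentRow == null || !Bytes.equals(currentRow, kv.getRow())) {
          currentRow = kv.getRow(); // new row: reset the per-row budget
          emitted = 0;
        }
        if (emitted < maxPerRow) {
          results.add(kv);
          emitted++;
        }
      }
      return more;
    }

    // 0.94 declares further next() overloads (batch limit, metric name);
    // for this sketch they all just defer to next(results).
    public boolean next(List<KeyValue> results, int limit) throws IOException {
      return next(results);
    }

    public boolean next(List<KeyValue> results, String metric)
        throws IOException {
      return next(results);
    }

    public boolean next(List<KeyValue> results, int limit, String metric)
        throws IOException {
      return next(results);
    }

    @Override
    public void close() throws IOException {
      delegate.close();
    }
  }
}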
Other messages in this thread:
  ramkrishna vasudevan 2012-12-10, 14:08
  Varun Sharma 2012-12-10, 14:59
  Varun Sharma 2012-12-10, 15:29
  lars hofhansl 2012-12-11, 05:09
  Varun Sharma 2012-12-11, 07:04
  lars hofhansl 2012-12-11, 07:19
  Varun Sharma 2012-12-12, 00:51
  lars hofhansl 2012-12-12, 01:58
  Anoop Sam John 2012-12-11, 04:10