-
Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-10, 13:58
Hi,

My understanding of major compaction is that it rewrites one store file, merging the memstore and the store files on disk, and cleans out delete tombstones, the puts that precede them, and excess versions. We want to limit the number of columns per row in HBase. We also want to limit them in lexicographically sorted order: we take, say, the 100 smallest columns (in the lexicographic sense), keep only those, and discard the rest.

One way to do this would be to clean out columns in a daily MapReduce job. Another way is to clean them out during the major compaction, which can also be run daily. I see from the code that a major compaction essentially invokes a Scan over the region, so if the Scan is invoked with the appropriate filter (say ColumnCountGetFilter), would that do the trick?

Thanks
Varun
-
Re: Filtering/Collection columns during Major Compaction
lars hofhansl 2012-12-11, 05:06
You can replace (or post-filter) the scanner used for the compaction using coprocessors.
Take a look at RegionObserver.preCompact, which is passed a scanner that will iterate over all KVs that should make it into the new store file. You can wrap this scanner and then do any filtering you'd like.
-
Re: Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-11, 05:09
So, I actually wrote something that uses preCompactScannerOpen and initializes a StoreScanner in exactly the same way as we do for a major compaction, except that I add the filter I need (ColumnPaginationFilter) to this scanner. I guess that should accomplish the same thing.
-
Re: Filtering/Collection columns during Major Compaction
ramkrishna vasudevan 2012-12-10, 14:08
Hi Varun,

If you are using version 0.94, there is a coprocessor hook that is invoked before and after compaction selection. preCompactScannerOpen() lets you create your own scanner, which is what actually performs the next() operation. If you wrap your own scanner and implement your own next(), you can work with the KVs you need, so you can decide which columns to include and which to exclude.
Does this help you, Varun?

Regards
Ram
-
Re: Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-10, 14:59
Thanks! This is exactly what I need. I am looking at the code in compactStore() in Store.java, and I am trying to understand why smallestReadPoint needs to be passed for the real compaction. I thought the read point was a memstore-only thing. Also, preCompactScannerOpen does not have a way of passing this value.

Varun
-
Re: Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-10, 15:29
Okay, I looked again more thoroughly. I should be able to extract these from the region observer.

Thanks!
-
Re: Filtering/Collection columns during Major Compaction
lars hofhansl 2012-12-11, 05:09
In your case you probably just want to filter on top of the provided scanner with preCompact (rather than actually replacing the scanner, which is what preCompactScannerOpen does).
(And sorry, I only saw this reply after I had sent my own reply to your initial question.)
-
Re: Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-11, 07:04
Hi Lars,

In my case, I just want to use ColumnPaginationFilter rather than implementing my own filter logic. Is there an easy way to apply this filter on top of an existing scanner? Do I do something like

    RegionScannerImpl scanner = new RegionScannerImpl(scan_with_my_filter, original_compaction_scanner)

Thanks
Varun
-
Re: Filtering/Collection columns during Major Compaction
lars hofhansl 2012-12-11, 07:19
Filters do not work for compactions. We only support them for user scans.
(Some of them might incidentally work, but that is entirely untested and unsupported.)

Your best bet is to use the preCompact hook and return a wrapper scanner like so:

    public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
        Store store, final InternalScanner scanner) {
      return new InternalScanner() {
        public boolean next(List<KeyValue> results) throws IOException {
          return next(results, -1);
        }
        public boolean next(List<KeyValue> results, String metric)
            throws IOException {
          return next(results, -1, metric);
        }
        public boolean next(List<KeyValue> results, int limit)
            throws IOException {
          return next(results, limit, null);
        }
        public boolean next(List<KeyValue> results, int limit, String metric)
            throws IOException {
          // call next on the passed scanner
          boolean more = scanner.next(results, limit, metric);
          // do your filtering here (remove unwanted KVs from results)
          return more;
        }

        public void close() throws IOException {
          scanner.close();
        }
      };
    }

-- Lars
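The "do your filtering here" step is left open in the thread. As a minimal sketch of what it might look like for the original requirement (keep only the lexicographically smallest columns of a row), the Java below trims one row's cells to the first N distinct qualifiers. It does not use the HBase API: `Cell`, `ColumnLimiter` and `keepFirstColumns` are hypothetical names invented for illustration, and the code assumes the cells of a row arrive already sorted by qualifier, with multiple versions of the same column adjacent to each other.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an HBase KeyValue: only a row key and a qualifier. */
final class Cell {
    final byte[] row;
    final byte[] qualifier;

    Cell(String row, String qualifier) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical helper, not part of HBase. */
final class ColumnLimiter {
    /**
     * Returns the cells that belong to the first maxColumns distinct qualifiers.
     * Assumes all cells are from one row and are already sorted by qualifier.
     */
    static List<Cell> keepFirstColumns(List<Cell> sortedRow, int maxColumns) {
        List<Cell> kept = new ArrayList<>();
        byte[] previous = null;
        int distinct = 0;
        for (Cell cell : sortedRow) {
            // A change of qualifier means we have moved on to the next column.
            if (previous == null || !Arrays.equals(previous, cell.qualifier)) {
                distinct++;
                previous = cell.qualifier;
            }
            if (distinct > maxColumns) {
                break; // every remaining cell sorts later, so it is dropped too
            }
            kept.add(cell);
        }
        return kept;
    }
}
```

Counting distinct qualifiers rather than cells means that several versions of one column count as a single column toward the limit; whether that is the desired behavior depends on the table's version settings.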
-
Re: Filtering/Collection columns during Major Compaction
Varun Sharma 2012-12-12, 00:51
Hi Lars,

Thanks for the detailed tip. We will go down that path. The javadoc for InternalScanner.next() says it grabs the next row's values. Are these rows in the HBase sense, or rows in the HFile? I suspect it is the latter.

Thanks!
-
Re: Filtering/Collection columns during Major Compaction
lars hofhansl 2012-12-12, 01:58
In this case, on each iteration you should get all KeyValues (KVs) for all columns in this column family for a single row, i.e. each KV should have the same row key.

-- Lars
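To illustrate how a wrapper can rely on that row-at-a-time behavior, here is a rough, self-contained sketch of a delegating scanner that applies a per-row column limit. It deliberately avoids the real HBase types: `Kv`, `BatchScanner`, `ColumnLimitingScanner` and `ListScanner` are hypothetical names, the interface is a simplified stand-in for InternalScanner with a single next() method, and row keys and qualifiers are Strings here rather than byte arrays. As a defensive assumption, the counters are kept across calls so the limit still holds if one row were ever delivered in more than one batch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a KeyValue; real HBase keys are byte arrays. */
final class Kv {
    final String row;
    final String qualifier;
    Kv(String row, String qualifier) { this.row = row; this.qualifier = qualifier; }
}

/** Hypothetical, simplified scanner contract: one batch of cells per call. */
interface BatchScanner {
    /** Appends the next batch to results; returns true if more batches remain. */
    boolean next(List<Kv> results);
}

/** Wraps another scanner and keeps only the first maxColumns columns of each row. */
final class ColumnLimitingScanner implements BatchScanner {
    private final BatchScanner delegate;
    private final int maxColumns;
    private String currentRow;    // row the two counters below refer to
    private String lastQualifier; // last column seen in currentRow
    private int columnsSeen;      // distinct columns seen so far in currentRow

    ColumnLimitingScanner(BatchScanner delegate, int maxColumns) {
        this.delegate = delegate;
        this.maxColumns = maxColumns;
    }

    @Override
    public boolean next(List<Kv> results) {
        List<Kv> batch = new ArrayList<>();
        boolean more = delegate.next(batch);
        for (Kv kv : batch) {
            if (!kv.row.equals(currentRow)) {          // new row: reset the counters
                currentRow = kv.row;
                lastQualifier = null;
                columnsSeen = 0;
            }
            if (!kv.qualifier.equals(lastQualifier)) { // new column within the row
                lastQualifier = kv.qualifier;
                columnsSeen++;
            }
            if (columnsSeen <= maxColumns) {
                results.add(kv);
            }
        }
        return more; // pass the delegate's answer through unchanged
    }
}

/** Helper for trying the wrapper out: serves pre-built batches in order. */
final class ListScanner implements BatchScanner {
    private final Iterator<List<Kv>> batches;
    ListScanner(List<List<Kv>> batches) { this.batches = batches.iterator(); }

    @Override
    public boolean next(List<Kv> results) {
        if (batches.hasNext()) {
            results.addAll(batches.next());
        }
        return batches.hasNext();
    }
}
```

The wrapper returns whatever the delegate returned, so dropping cells (even a whole batch) never ends the scan early.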
-
RE: Filtering/Collection columns during Major Compaction
Anoop Sam John 2012-12-11, 04:10
Hi Varun,

> but I am trying to understand why, for the real compaction - smallestReadPoint needs to be passed - I thought the read point was a memstore only thing

No, it is needed for more than the memstore. The memstore can get flushed in the middle of a scan, which is why the MVCC timestamp is also written to the HFile.

I hope Ram's reply helped you do what you want. If you face any issues, please let us know. We have already done this using the CP hooks. Thanks to Lars H for these new hooks :) Very useful...

-Anoop-