Varun Sharma 2013-02-18, 09:57
Anoop Sam John 2013-02-18, 10:49
Viral Bajaria 2013-02-18, 10:49
Nicolas Liochon 2013-02-18, 10:56
ramkrishna vasudevan 2013-02-18, 11:07
Michael Segel 2013-02-18, 12:52
lars hofhansl 2013-02-19, 01:48
Varun Sharma 2013-02-19, 06:45
lars hofhansl 2013-02-19, 08:02
Nicolas Liochon 2013-02-19, 08:37
Varun Sharma 2013-02-19, 15:52
Nicolas Liochon 2013-02-19, 17:28
Varun Sharma 2013-02-19, 18:19
lars hofhansl 2013-02-19, 18:27
Nicolas Liochon 2013-02-19, 18:42
As well, an advantage of going only to the servers needed is the famous
MTTR: there is less chance of going to a dead server or to a region that
On Tue, Feb 19, 2013 at 7:42 PM, Nicolas Liochon <[EMAIL PROTECTED]> wrote:
> Interesting: in the client, we group the multiget by location.
> So we could have the filter as HBase core code, and then we could use it
> in the client for the multiget: compared to my initial proposal, we don't
> have to change anything in the server code and we reuse the filtering
> framework. The filter can also be used independently.
> Is there any issue with this? The reseek seems to be quite smart in the
> way it handles the bloom filters, I don't know if it behaves differently in
> this case vs. a simple get.
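The client-side "group by location" step can be pictured roughly as follows. This is a plain-Java sketch, not the HBase client code: `GroupByRegionSketch`, `groupByRegion`, and the map from region start key to region name are hypothetical stand-ins, and row keys are simplified to strings. Each group's rows are kept sorted, since a hint-based filter needs them in order.

```java
import java.util.Collection;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class GroupByRegionSketch {
    /**
     * Groups row keys by the region that would hold them.
     * regionByStartKey maps each region's start key to a region name (hypothetical layout).
     */
    static Map<String, SortedSet<String>> groupByRegion(
            NavigableMap<String, String> regionByStartKey, Collection<String> rows) {
        Map<String, SortedSet<String>> grouped = new TreeMap<>();
        for (String row : rows) {
            // The owning region is the one with the greatest start key <= row.
            Map.Entry<String, String> region = regionByStartKey.floorEntry(row);
            if (region == null) {
                continue; // no region in this toy map covers the row
            }
            grouped.computeIfAbsent(region.getValue(), k -> new TreeSet<>()).add(row);
        }
        return grouped;
    }
}
```

Each resulting group could then be sent as one request carrying the sorted row list for that region.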
> On Tue, Feb 19, 2013 at 7:27 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> I was thinking along the same lines. Doing a skip scan via filter
>> hinting. The problem is as you say that the Filter is instantiated
>> everywhere and it might be of significant size (have to maintain all row
>> keys you are looking for).
>> RegionScanner now has a reseek method, so it is possible to do this via a
>> coprocessor. They are also loaded per region (but at least not for each
>> store), and one can use the shared coproc state I added to alleviate the
>> memory concern.
>> Thinking about this in terms of multiple scans is interesting. One could
>> identify clusters of close row keys in the Gets and issue a Scan for each
>> -- Lars
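One way to picture the clustering idea is the sketch below. It assumes numeric row keys and a hypothetical `maxGap` threshold for what counts as "close"; real HBase row keys are byte arrays, so the distance measure here is purely illustrative, as are the names `RowKeyClusters` and `cluster`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class RowKeyClusters {
    /**
     * Splits sorted keys into [first, last] ranges such that neighboring keys
     * inside one range differ by at most maxGap. Each range could back one scan.
     */
    static List<long[]> cluster(SortedSet<Long> keys, long maxGap) {
        List<long[]> ranges = new ArrayList<>();
        long[] open = null; // the range currently being extended
        for (long key : keys) {
            if (open != null && key - open[1] <= maxGap) {
                open[1] = key; // close enough: extend the current range
            } else {
                open = new long[] {key, key}; // too far: start a new range
                ranges.add(open);
            }
        }
        return ranges;
    }
}
```

Choosing the threshold is the heuristic part: too small and every Get becomes its own scan, too large and the scan reads many rows nobody asked for.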
>> From: Nicolas Liochon <[EMAIL PROTECTED]>
>> To: user <[EMAIL PROTECTED]>
>> Sent: Tuesday, February 19, 2013 9:28 AM
>> Subject: Re: Optimizing Multi Gets in hbase
>> Imho, the easiest thing to do would be to write a filter.
>> You need to order the rows, then you can use hints to navigate to the next
>> row (SEEK_NEXT_USING_HINT).
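The hint-based navigation can be simulated in plain Java as below. This is not the HBase Filter API: the sorted set stands in for the rows of a region, `ceiling` stands in for a reseek to the hinted row, and the names `SkipScanSketch` and `skipScan` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;

final class SkipScanSketch {
    /** Returns the wanted rows that exist in stored, jumping ahead instead of visiting every row. */
    static List<String> skipScan(NavigableSet<String> stored, SortedSet<String> wanted) {
        List<String> found = new ArrayList<>();
        Iterator<String> targets = wanted.iterator();
        if (stored.isEmpty() || !targets.hasNext()) {
            return found;
        }
        String target = targets.next();
        String current = stored.first();
        while (current != null) {
            int cmp = current.compareTo(target);
            if (cmp < 0) {
                // Current row is before the next wanted row: "seek using hint".
                current = stored.ceiling(target);
            } else {
                if (cmp == 0) {
                    found.add(current); // wanted row exists: include it
                }
                // Target is either found or absent; move on to the next wanted row.
                if (!targets.hasNext()) {
                    break;
                }
                target = targets.next();
            }
        }
        return found;
    }
}
```

The wanted rows must be sorted for this to work, which is why the rows need to be ordered first.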
>> The main drawback I see is that the filter will be invoked on all region
>> servers, including the ones that don't need it. But this would also mean
>> you have a very specific query pattern (which could be the case, I just
>> don't know), and you can still use the startRow / stopRow of the scan, and
>> create multiple scans if necessary. I'm also interested in Lars' opinion on
>> On Tue, Feb 19, 2013 at 4:52 PM, Varun Sharma <[EMAIL PROTECTED]>
>> > I have another question, if I am running a scan wrapped around multiple
>> > rows in the same region, in the following way:
>> > Scan scan = new Scan(getWithMultipleRowsInSameRegion);
>> > Now, how does execution occur? Is it just a sequential scan across the
>> > entire region or does it seek to hfile blocks containing the actual
>> > What I truly mean is, let's say the multi get is on the following rows:
>> > Row1 : HFileBlock1
>> > Row2 : HFileBlock20
>> > Row3 : Does not exist
>> > Row4 : HFileBlock25
>> > Row5 : HFileBlock100
>> > The efficient way to do this would be to determine the correct blocks from
>> > the index and then search within the blocks for, say, Row1. Then, seek to
>> > HFileBlock20 and look for Row2. Eliminate Row3, and then keep on
>> > seeking to and searching within HFileBlocks as needed.
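The lookup described above can be modeled in plain Java as below. This is only a toy model, not HFile code: the map from each block's first row key to the block's rows stands in for the block index, and `BlockIndexSketch` and `multiGet` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.SortedSet;

final class BlockIndexSketch {
    /**
     * For each wanted row, finds the single candidate block through the index
     * and searches only inside that block. Rows that do not exist are dropped.
     */
    static List<String> multiGet(
            NavigableMap<String, NavigableSet<String>> blocksByFirstKey, SortedSet<String> wanted) {
        List<String> found = new ArrayList<>();
        for (String row : wanted) {
            // The candidate block is the one with the greatest first key <= row.
            Map.Entry<String, NavigableSet<String>> block = blocksByFirstKey.floorEntry(row);
            if (block != null && block.getValue().contains(row)) {
                found.add(row);
            }
        }
        return found;
    }
}
```

A missing row such as Row3 still costs one index lookup and one block search in this model; avoiding even that is where bloom filters would come in.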
>> > I am wondering if a scan wrapped around a Get with multiple rows would do
>> > the same?
>> > Thanks
>> > Varun
>> > On Tue, Feb 19, 2013 at 12:37 AM, Nicolas Liochon <[EMAIL PROTECTED]>
>> > wrote:
>> > > Looking at the code, it seems possible to do this server side within
>> > > multi invocation: we could group the get by region, and do a single
>> > > We could also add some heuristics if necessary...
>> > >
>> > >
>> > >
>> > > On Tue, Feb 19, 2013 at 9:02 AM, lars hofhansl <[EMAIL PROTECTED]>
>> > >
>> > > > I should qualify that statement, actually.
>> > > >
>> > > > I was comparing scanning 1m KVs to getting 1m KVs when all KVs are
>> > > > returned.