We use the index blocks to find the right block (we have 64k blocks by default). Once we found the block we a linear search for the KV we're looking for.
In your example, you'd find the first block that contains a KV for row1c and then seek into that block until you found your KV. We cannot do binary search, since all KVs can have different sizes, so we have to decode their length one by one in order to find the next. In my test, this linear search has not shown as a performance concern.
If you scan along very wide rows and but you select only a few column we typically skipscan directly to the next row after the last interesting column was hit.
(And now looking at the code I see that we can probably optimize wide rows with seek hints - we know the next column, so we can seek directly to that one, rather than going considering every column).
----- Original Message -----
From: Varun Sharma <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; Michael Stack <[EMAIL PROTECTED]>
Sent: Thursday, May 16, 2013 3:26 PM
Subject: Re: Question about HFile seeking
Referring to your comment above again
"If you doing a prefix scan w/ row1c, we should be starting the scan at
row1c, not row1 (or more correctly at the row that starts the block we
believe has a row1c row in it...)"
I am trying to understand how you could seek right across to the block
containing "row1c" using the HFile Index. If the index is just built on
HFile keys and there is no demarcation b/w rows and col(s), you would hit
the block for "row1,col1". After that you would either need a way to skip
right across to "row1c" after you find that this is not the row you are
looking for or you will have to simply keep scanning and discarding
sequentially until you get "row1c". If you have to keep scanning and
discarding, then that is probably suboptimal. But if there is a way to skip
right across from "row1,col1" to "row1c", then thats great, though I wonder
how that would be implemented.
On Thu, May 16, 2013 at 2:55 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> Sorry I may have misunderstood what you meant.
> When you look for "row1c" in the HFile index - is it going to also match
> for "row1,col1" or only match "row1c". It all depends how the index is
> organized, if its only on HFile keys, it could also match row1,col1 unless
> we use some demarcator b/w row1 and col1 in our HFile keys. So I am just
> wondering if we will totally skip touch row1,col1 in this case and jump
> straight to row1c or not. The other option is that we would actually hit
> row1,col1 since the prefix matches row1c when looking at the HFile key and
> then, we look at the length of the row to grab the real portion from the
> concatenated HFile key and discard all row1 entries.
> Does that make my query clearer ?
> On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>> Nothing, I am just curious...
>> So, we will do a bunch of wasteful scanning - that's lets say row1 has
>> col1 - col100000 - basically 100K columns, we will scan all those key
>> values even though we are going to discard them, is that correct ?
>> On Thu, May 16, 2013 at 2:30 PM, Stack <[EMAIL PROTECTED]> wrote:
>>> What you seeing Varun (or think you are seeing)?
>>> On Thu, May 16, 2013 at 2:30 PM, Stack <[EMAIL PROTECTED]> wrote:
>>> > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <[EMAIL PROTECTED]>
>>> >> Or do we use some kind of demarcator b/w rows and columns and
>>> >> when building the HFile keys and the indices ?
>>> > No demarcation but in KeyValue, we keep row, column family name, column
>>> > family qualifier, etc., lengths and offsets so the comparators on ly
>>> > compare pertinent bytes.
>>> > If you doing a prefix scan w/ row1c, we should be starting the scan at
>>> > row1c, not row1 (or more correctly at the row that starts the block we
>>> > believe has a row1c row in it...).