Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Question about HFile seeking


Copy link to this message
-
Re: Question about HFile seeking
Referring to your comment above again

"If you doing a prefix scan w/ row1c, we should be starting the scan at
row1c, not row1 (or more correctly at the row that starts the block we
believe has a row1c row in it...)"

I am trying to understand how you could seek right across to the block
containing "row1c" using the HFile Index. If the index is just built on
HFile keys and there is no demarcation b/w rows and col(s), you would hit
the block for "row1,col1". After that you would either need a way to skip
right across to "row1c" after you find that this is not the row you are
looking for or you will have to simply keep scanning and discarding
sequentially until you get "row1c". If you have to keep scanning and
discarding, then that is probably suboptimal. But if there is a way to skip
right across from "row1,col1" to "row1c", then thats great, though I wonder
how that would be implemented.

Varun

On Thu, May 16, 2013 at 2:55 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Sorry I may have misunderstood what you meant.
>
> When you look for "row1c" in the HFile index - is it going to also match
> for "row1,col1" or only match "row1c". It all depends how the index is
> organized, if its only on HFile keys, it could also match row1,col1 unless
> we use some demarcator b/w row1 and col1 in our HFile keys. So I am just
> wondering if we will totally skip touch row1,col1 in this case and jump
> straight to row1c or not. The other option is that we would actually hit
> row1,col1 since the prefix matches row1c when looking at the HFile key and
> then, we look at the length of the row to grab the real portion from the
> concatenated HFile key and discard all row1 entries.
>
> Does that make my query clearer ?
>
>
> On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
>> Nothing, I am just curious...
>>
>> So, we will do a bunch of wasteful scanning - that's lets say row1 has
>> col1 - col100000 - basically 100K columns, we will scan all those key
>> values even though we are going to discard them, is that correct ?
>>
>>
>> On Thu, May 16, 2013 at 2:30 PM, Stack <[EMAIL PROTECTED]> wrote:
>>
>>> What you seeing Varun (or think you are seeing)?
>>> St.Ack
>>>
>>>
>>> On Thu, May 16, 2013 at 2:30 PM, Stack <[EMAIL PROTECTED]> wrote:
>>>
>>> > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <[EMAIL PROTECTED]>
>>> wrote:
>>> >
>>> >> Or do we use some kind of demarcator b/w rows and columns and
>>> timestamps
>>> >> when building the HFile keys and the indices ?
>>> >>
>>> >
>>> > No demarcation but in KeyValue, we keep row, column family name, column
>>> > family qualifier, etc., lengths and offsets so the comparators on ly
>>> > compare pertinent bytes.
>>> >
>>> > If you doing a prefix scan w/ row1c, we should be starting the scan at
>>> > row1c, not row1 (or more correctly at the row that starts the block we
>>> > believe has a row1c row in it...).
>>> >
>>> > St.Ack
>>> >
>>>
>>
>>
>