Re: Question about HFile seeking
Thanks, Stack and Lars, for the detailed answers. This question is not
really motivated by performance problems.

So the index does know which part of the HFile key is the row and which
part is the column qualifier. That's what I needed to know. I initially
thought it saw the key as an opaque concatenation (row + column
qualifier + timestamp), in which case it would be difficult to run prefix
scans, since a prefix could bleed across the boundary between the row and
the column qualifier.
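
To make the "bleed" concern concrete, here is a minimal Python sketch. It
is not HBase code: the key layout is a simplified assumption (a 2-byte row
length, then the row, then the qualifier), and the names encode_key,
decode_key, row_has_prefix and naive_concat are hypothetical. Real HFile
keys also carry the column family, timestamp and type.

```python
import struct

def naive_concat(row: bytes, qualifier: bytes) -> bytes:
    """Opaque key: row and qualifier glued together with no boundary."""
    return row + qualifier

def encode_key(row: bytes, qualifier: bytes) -> bytes:
    """Simplified structured key: 2-byte big-endian row length, row, qualifier."""
    return struct.pack(">H", len(row)) + row + qualifier

def decode_key(key: bytes):
    """Split a structured key back into (row, qualifier) using the length prefix."""
    (row_len,) = struct.unpack_from(">H", key, 0)
    row = key[2:2 + row_len]
    qualifier = key[2 + row_len:]
    return row, qualifier

def row_has_prefix(key: bytes, prefix: bytes) -> bool:
    """Prefix test applied to the row part only, never to the qualifier."""
    row, _ = decode_key(key)
    return row.startswith(prefix)

# With an opaque key, row "row1" + qualifier "col1" looks like it matches "row1c".
opaque = naive_concat(b"row1", b"col1")
false_match = opaque.startswith(b"row1c")

# With the row boundary known, the same cell is correctly rejected.
structured = encode_key(b"row1", b"col1")
real_match = row_has_prefix(structured, b"row1c")
```

With the row boundary encoded in the key, a prefix test on "row1c" no longer
matches the cell whose row is "row1" and whose qualifier starts with "c".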

On Thu, May 16, 2013 at 11:54 PM, Michael Stack <[EMAIL PROTECTED]> wrote:

> On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>> Referring to your comment above again
>> "If you doing a prefix scan w/ row1c, we should be starting the scan at
>> row1c, not row1 (or more correctly at the row that starts the block we
>> believe has a row1c row in it...)"
>> I am trying to understand how you could seek right across to the block
>> containing "row1c" using the HFile Index. If the index is just built on
>> HFile keys and there is no demarcation b/w rows and col(s), you would hit
>> the block for "row1,col1". After that you would either need a way to skip
>> right across to "row1c" after you find that this is not the row you are
>> looking for or you will have to simply keep scanning and discarding
>> sequentially until you get "row1c". If you have to keep scanning and
>> discarding, then that is probably suboptimal. But if there is a way to skip
>> right across from "row1,col1" to "row1c", then that's great, though I wonder
>> how that would be implemented.
>
> (Ugh... meant to send the below at 5pm but see I didn't send it.
> Anyways, see the mailing list; hopefully it helps.)
> The hfile index looks like an opaque byte array but it actually has a
> strong format.  In KV we have comparators that will look at this byte array
> and exploit the format to tease apart row from column from qualifier.
> I have to run just now.  Will give you better answer this evening up on
> list.
> St.Ack
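
As a rough illustration of the seek being discussed, the sketch below
models a block index as a sorted list holding the first key of each block,
with each key already split into (row, qualifier). This is an assumption
made for illustration, not the actual HFile index format, and seek_block
is a hypothetical name. Because tuples compare row first, a binary search
lands directly on the block that may contain the start of "row1c", without
scanning and discarding the earlier "row1" blocks.

```python
import bisect

def seek_block(block_first_keys, row: bytes) -> int:
    """Return the index of the block that may hold the first cell of `row`.

    block_first_keys is a sorted list of (row, qualifier) tuples, one per
    block. The smallest possible key for `row` is (row, b""), so the
    candidate block is the last one whose first key is <= that target.
    """
    target = (row, b"")
    i = bisect.bisect_right(block_first_keys, target) - 1
    return max(i, 0)  # rows sorting before the first block map to block 0

# Hypothetical index: the first key of each of four blocks.
index = [
    (b"row1", b"col1"),
    (b"row1", b"col500"),
    (b"row1c", b"col3"),
    (b"row2", b"col1"),
]

# Block 1 starts inside "row1" but is the only block that can contain the
# first "row1c" cell, so the scan starts there rather than at block 0.
start = seek_block(index, b"row1c")
```

Inside the chosen block the reader still steps forward to the first cell
of the target row, but that cost is bounded by one block rather than by the
width of the preceding row.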