Re: Question about HFile seeking
Generally we start by seeking in all the HFiles corresponding to the
region and load the blocks that correspond to the row key specified in the

If row1 and row1c are in the same block then we may start with row1.  If
they are in different blocks then we will start with the block containing

Also, since PrefixFilter is being used here, once we have hit the first
row in the block we keep scanning until filterRowKey() says we have
arrived at a row that matches the prefix.
One more thing it does: once the prefix matches are over (this will
happen because the rows are lexicographically sorted), we ignore all other
keys greater than what PrefixFilter needs.
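
To make the scan behaviour above concrete, here is a rough sketch in plain
Python. It is a toy model only, not HBase code: build_block_index and
prefix_scan are made-up names, blocks are assumed to hold a fixed number of
rows, and the index is assumed to store just the first row key of each block.
It seeks to the block that may hold the first match, skips rows that sort
before the prefix, and stops at the first row past the prefix range.

```python
from bisect import bisect_right


def build_block_index(sorted_rows, block_size):
    """First row key of each fixed-size block (toy stand-in for a block index)."""
    return [sorted_rows[i] for i in range(0, len(sorted_rows), block_size)]


def prefix_scan(sorted_rows, block_size, prefix):
    """Return (matching_rows, rows_examined) for a prefix scan over sorted rows."""
    index = build_block_index(sorted_rows, block_size)
    # Seek: the last block whose first key is <= prefix may hold the first match.
    block = max(bisect_right(index, prefix) - 1, 0)
    matches, examined = [], 0
    for row in sorted_rows[block * block_size:]:
        examined += 1
        if row.startswith(prefix):
            matches.append(row)   # row is inside the prefix range
        elif row > prefix:
            break                 # sorted input: no later row can match
        # otherwise the row sorts before the prefix; keep scanning
    return matches, examined


rows = [b"row0", b"row1", b"row1a", b"row1b", b"row1c",
        b"row1c1", b"row1c2", b"row1d", b"row2", b"row3"]
print(prefix_scan(rows, 4, b"row1c"))
```

With a block size of 4 the second block happens to start at row1c, so the
scan touches only the three matches plus the one row that ends it. With a
block size of 3 the block starts at row1b, and that row is read and discarded
before the first match.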

On Fri, May 17, 2013 at 12:52 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Thanks Stack and Lars for the detailed answers - This question is not
> really motivated by performance problems...
> So the index indeed knows which part of the HFile key is the row and which
> part is the column qualifier. That's what I needed to know. I initially
> thought it saw the key as an opaque concatenation (row + column qualifier +
> timestamp), in which case it would be difficult to run prefix scans, since
> prefixes could potentially bleed across row and column.
> Varun
> On Thu, May 16, 2013 at 11:54 PM, Michael Stack <[EMAIL PROTECTED]>
> wrote:
> > On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <[EMAIL PROTECTED]>
> wrote:
> >
> >> Referring to your comment above again
> >>
> >> "If you doing a prefix scan w/ row1c, we should be starting the scan at
> >> row1c, not row1 (or more correctly at the row that starts the block we
> >> believe has a row1c row in it...)"
> >>
> >> I am trying to understand how you could seek right across to the block
> >> containing "row1c" using the HFile index. If the index is just built on
> >> HFile keys and there is no demarcation b/w rows and col(s), you would hit
> >> the block for "row1,col1". After that you would either need a way to skip
> >> right across to "row1c" after you find that this is not the row you are
> >> looking for, or you will have to simply keep scanning and discarding
> >> sequentially until you get "row1c". If you have to keep scanning and
> >> discarding, then that is probably suboptimal. But if there is a way to
> >> skip right across from "row1,col1" to "row1c", then that's great, though
> >> I wonder how that would be implemented.
> >>
> > (ugh... meant to send the below at 5pm but see I didn't send it...
> > anyways... see mailing list... hopefully helps)
> >
> >
> > The hfile index looks like an opaque byte array but it actually has a
> > strong format.  In KV we have comparators that will look at this byte array
> > and exploit the format to tease apart row from column from qualifier.
> >
> > I have to run just now.  Will give you better answer this evening up on
> > list.
> >
> > St.Ack
> >
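
For anyone following along, the "strong format" point quoted above can be
illustrated with a small plain-Python sketch. It assumes the usual KeyValue
key layout (2-byte row length, row, 1-byte family length, family, qualifier,
8-byte timestamp, 1-byte type); encode_key, decode_key and sort_key are
illustrative names, not HBase APIs, and the type code 4 for a Put is an
assumption used only as filler. The sketch shows why a length-prefixed key
lets a comparator order by row first, where a naive row+qualifier
concatenation would interleave rows.

```python
import struct


def encode_key(row, family, qualifier, timestamp, key_type=4):
    """Pack a key using the assumed layout; key_type=4 is a placeholder value."""
    return (struct.pack(">H", len(row)) + row
            + struct.pack(">B", len(family)) + family
            + qualifier
            + struct.pack(">qB", timestamp, key_type))


def decode_key(key):
    """Split a packed key back into (row, family, qualifier, timestamp, type)."""
    (row_len,) = struct.unpack_from(">H", key, 0)
    row = key[2:2 + row_len]
    fam_off = 2 + row_len
    fam_len = key[fam_off]
    family = key[fam_off + 1:fam_off + 1 + fam_len]
    # The qualifier is whatever sits between the family and the 9 trailing bytes.
    qualifier = key[fam_off + 1 + fam_len:len(key) - 9]
    timestamp, key_type = struct.unpack_from(">qB", key, len(key) - 9)
    return row, family, qualifier, timestamp, key_type


def sort_key(key):
    """Order by row, family, qualifier, then newest timestamp first."""
    row, family, qualifier, timestamp, key_type = decode_key(key)
    return (row, family, qualifier, -timestamp, -key_type)


k_row1 = encode_key(b"row1", b"f", b"d", 1)
k_row1c = encode_key(b"row1c", b"f", b"a", 1)

# Format-aware ordering keeps every row1 cell ahead of any row1c cell.
print(sorted([k_row1c, k_row1], key=sort_key) == [k_row1, k_row1c])
# A naive concatenation sorts "row1c"+"a" ahead of "row1"+"d".
print(b"row1c" + b"a" < b"row1" + b"d")
```

Because the row is recovered as a unit before anything else is compared, a
seek for "row1c" lands on the first cell of that row regardless of which
qualifiers "row1" has.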