Accumulo >> mail # user >> Iterators and seeking the middle of a row


Re: Iterators and seeking the middle of a row
Remember that the range given to an iterator is, at some point in time,
set by the user. If a client only wants to scan between keys K1 and K2, and
both occur in the same row, then the iterator should not consider data
outside the range supplied to it. Someone can correct me if I'm wrong, but I
also believe that if a client received a key outside the original scan
range, that was treated as a termination condition and the scan would stop.
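The javadoc quoted further down adds a wrinkle: even when the client's range covers whole rows, the tablet server can end a batch mid-row and later reseek the iterator to the rest of that row. Below is a stdlib-only simulation, not Accumulo code. `ReseekDemo`, `naiveSummary`, and the "first key=count" summary format are hypothetical, and the server is assumed to stop the batch right after the first returned entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Stdlib-only simulation, NOT Accumulo code: one row's column families, kept sorted.
class ReseekDemo {
    static final NavigableSet<String> ROW =
        new TreeSet<>(List.of("eyes", "hair", "height", "pants", "shirt", "tie"));

    // Hypothetical naive row-summarizing iterator: emits "<first key seen>=<count>"
    // for whatever part of the row lies after the seek point.
    // startExclusive == null means "seek to the whole row".
    static String naiveSummary(String startExclusive) {
        NavigableSet<String> visible =
            startExclusive == null ? ROW : ROW.tailSet(startExclusive, false);
        return visible.isEmpty() ? null : visible.first() + "=" + visible.size();
    }

    // What the client receives if the server stops the batch right after the
    // first returned entry and later reseeks to the range just past it.
    static List<String> clientView() {
        List<String> received = new ArrayList<>();
        String first = naiveSummary(null);            // "eyes=6": correct
        received.add(first);
        String lastKey = first.substring(0, first.indexOf('='));
        String second = naiveSummary(lastKey);        // reseek to (eyes, end of row)
        if (second != null) {
            received.add(second);                     // "hair=5": bogus partial summary
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(clientView()); // [eyes=6, hair=5]
    }
}
```

The second entry is exactly what the javadoc warns about: after the reseek, the iterator summarizes only the part of the row it can still see.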

Let's say I have a flat record structure for people, where the row is the
name of the person, the column family is some attribute about them, and the
column qualifier is the value for that attribute. Here's a record for Bob:

Bob eyes: blue
Bob hair: brown
Bob height: tall
Bob pants: brown
Bob shirt: white
Bob tie: blue

If you were searching for all attributes that were 'brown', you could do a
lookup using the range `new Range("Bob", "Bob")`. Your iterator would be
able to see all of Bob and return his hair and pants colors to the user.
However, you could just as easily perform your lookup with
`new Range(new Key("Bob", "height"), new Key("Bob", "z"))`*. Your iterator
would then see only a subset of Bob, starting at his height and continuing
to the end of his record, so it would return only his pants.
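To make the two lookups concrete, here is a stdlib-only Java model of Bob's row, not the Accumulo API. A `TreeMap` from column family to qualifier stands in for the sorted row, and `subMap` with both ends inclusive stands in for the second range. Treating the qualifier as the map value and ignoring timestamps is a simplification; `BobRecord` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stdlib-only model of one row (NOT the Accumulo API): column family -> column qualifier.
// TreeMap keeps families in lexicographic order, like keys within an Accumulo row.
class BobRecord {
    static NavigableMap<String, String> bob() {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("eyes", "blue");
        row.put("hair", "brown");
        row.put("height", "tall");
        row.put("pants", "brown");
        row.put("shirt", "white");
        row.put("tie", "blue");
        return row;
    }

    // What a filtering iterator could return if it only sees the given slice of the row.
    static List<String> brownAttributes(Map<String, String> visible) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : visible.entrySet()) {
            if (e.getValue().equals("brown")) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = bob();
        // Analogous to new Range("Bob", "Bob"): the whole row is visible.
        System.out.println(brownAttributes(row));                                  // [hair, pants]
        // Analogous to the ("Bob", "height") .. ("Bob", "z") range, inclusive on both ends.
        System.out.println(brownAttributes(row.subMap("height", true, "z", true))); // [pants]
    }
}
```

An iterator that makes row-level decisions would therefore reach different answers for the same row depending on which range it was handed.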

* I used "z" because it sorts lexicographically after the other attributes.
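A quick check of the footnote's reasoning, using Java's `String.compareTo` as a stand-in for Accumulo's byte-wise key ordering (the two agree for ASCII names like these). `SortCheck` and `inRange` are illustrative names, and `zip` is a hypothetical family added only to show where the trick stops working.

```java
// Plain Java string comparison standing in for Accumulo's byte-wise ordering
// (they agree for ASCII text such as these family names).
class SortCheck {
    // True if family f falls inside [start, end] under lexicographic order.
    static boolean inRange(String f, String start, String end) {
        return f.compareTo(start) >= 0 && f.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        String[] families = {"eyes", "hair", "height", "pants", "shirt", "tie", "zip"};
        for (String f : families) {
            System.out.println(f + " -> " + inRange(f, "height", "z"));
        }
    }
}
```

Because `z` is a prefix of `zip`, `zip` sorts after `z`, so a family named like that would fall outside the range.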

On Thu, Sep 13, 2012 at 1:01 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 13, 2012 at 3:50 PM, Cardon, Tejay E
> <[EMAIL PROTECTED]> wrote:
> > The javadoc for SortedKeyValueIterator.seek states:
> >
> > “Iterators that examine groups of adjacent key/value pairs (e.g. rows) to
> > determine their top key and value should be sure that they properly handle
> > a seek to a key in the middle of such a group (e.g. the middle of a row).
> > Even if the client always seeks to a range containing an entire group
> > (a,c), the tablet server could send back a batch of entries corresponding
> > to (a,b], then reseek the iterator to range (b,c) when the scan is
> > continued.”
> >
> > However, it gives no indication of what proper handling is. What should
> > an iterator that considers an entire row do in this case? Does it simply
> > ignore the row? Attempt to seek its source iterator to the full row of
> > the first range? I’m struggling to understand the best approach here.
>
> org.apache.accumulo.core.iterators.user.RowFilter does what you
> suggested. It seeks to the beginning of a row if the range starts in
> the middle of the row. Look at the javadoc for the row filter; it
> discusses the seeking behavior.
>
> >
> > In my specific case, if it matters, I’m largely looking for column
> > qualifiers that exist in all column families in a given set (an
> > intersecting iterator, sort of).
> >
> > Thanks,
> > Tejay
>
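Keith's pointer to RowFilter suggests one shape for Tejay's column-qualifier intersection: make the row decision on the whole row, even when the range starts mid-row, but emit only what lies inside the requested range. Below is a minimal stdlib-only model of that idea. It is not the Accumulo API and not RowFilter's actual code; the table layout, `common`, `seek`, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Stdlib-only model, NOT the Accumulo API. A table is row -> (family -> qualifiers).
class WholeRowIntersection {
    static final NavigableMap<String, NavigableMap<String, Set<String>>> TABLE = new TreeMap<>();
    static {
        NavigableMap<String, Set<String>> r1 = new TreeMap<>();
        r1.put("a", new TreeSet<>(List.of("x", "y")));
        r1.put("b", new TreeSet<>(List.of("x", "z")));
        r1.put("c", new TreeSet<>(List.of("x", "y")));
        TABLE.put("row1", r1);
        NavigableMap<String, Set<String>> r2 = new TreeMap<>();
        r2.put("a", new TreeSet<>(List.of("y")));
        r2.put("b", new TreeSet<>(List.of("z")));
        TABLE.put("row2", r2);
    }

    // Qualifiers present in every family of `families` within the given (part of a) row.
    static Set<String> common(Map<String, Set<String>> row, Set<String> families) {
        Set<String> result = null;
        for (String f : families) {
            Set<String> q = row.getOrDefault(f, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(q);
            } else {
                result.retainAll(q);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    // Seek to a range starting at (startRow, startFamilyExclusive), possibly mid-row.
    // wholeRowDecision=true: accept/reject each row using the entire row (RowFilter-style),
    // but emit only the families inside the requested range.
    // wholeRowDecision=false: naive version that decides on the visible part only.
    static List<String> seek(String startRow, String startFamilyExclusive,
                             Set<String> families, boolean wholeRowDecision) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<String, Set<String>>> r :
                TABLE.tailMap(startRow, true).entrySet()) {
            NavigableMap<String, Set<String>> whole = r.getValue();
            boolean midRow = r.getKey().equals(startRow) && startFamilyExclusive != null;
            Map<String, Set<String>> visible =
                midRow ? whole.tailMap(startFamilyExclusive, false) : whole;
            if (common(wholeRowDecision ? whole : visible, families).isEmpty()) {
                continue; // row rejected
            }
            for (String f : visible.keySet()) {
                out.add(r.getKey() + " " + f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> fams = new TreeSet<>(List.of("a", "b", "c"));
        // Resume in the middle of row1, just after family "a" (the (b,c) reseek case).
        System.out.println(seek("row1", "a", fams, true));  // [row1 b, row1 c]
        System.out.println(seek("row1", "a", fams, false)); // []
    }
}
```

Resuming just after family `a` of `row1`, the whole-row decision still accepts `row1`, while the naive version rejects it because family `a` is out of sight. Here the intersection serves only as a row-acceptance test; an iterator that emitted the qualifiers themselves would also need to avoid re-emitting results after a reseek.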