Accumulo >> mail # user >> Iterators returning keys out of scan range


Re: Iterators returning keys out of scan range
Hi all,

Just to echo and expand on Adam's comment, I'd suggest not trying this at
work either!

This week, I was moving some code from Accumulo 1.3.4 to Accumulo 1.4.3,
and I was tracking down a problem with an iterator we had written, which
worked in *most* cases.  When it broke, the iterator would get stuck in an
infinite loop.  It turned out that we only rarely had data big enough to
trip the reseek that Adam mentioned, and we needed to update our code to
handle that correctly.

As for a test, I am unsure whether the Mock versions do reseeking, etc.
If they do, then you'd just need to make sure there is enough data in a
row to make the Tablet/scanner reseek.  (The size cut-off that I saw the
Tablet looking for was 1 megabyte.)  If Mock doesn't reseek, this kind of
test would need to run against a running Accumulo setup.

That said, I think the same behavior could be seen in a unit test by
reseeking after *each* call to next.  At least this would test an
iterator's ability to reseek to any arbitrary position.  I imagine writing
a ReseekingIterator wouldn't be hard, and then one could add it as the last
iterator in any existing unit tests...
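
To make the idea concrete, here is a rough sketch of such a wrapper.  It
does NOT use the real Accumulo classes: KeyIterator, SortedSource,
AlwaysInclusive, and drain are all made-up stand-ins built on plain strings
and the JDK, so treat it as an illustration of the technique only.  The
wrapper replaces every next() with a seek to just past the last key
returned, which is roughly what a resumed scan does.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class ReseekDemo {
    /** Hypothetical, simplified stand-in for a sorted key iterator (not the Accumulo API). */
    interface KeyIterator {
        void seek(String start, boolean inclusive);
        boolean hasTop();
        String getTopKey();
        void next();
    }

    /** Well-behaved source over a sorted set of keys. */
    static class SortedSource implements KeyIterator {
        private final NavigableSet<String> keys;
        private Iterator<String> it;
        private String top;

        SortedSource(Collection<String> keys) { this.keys = new TreeSet<>(keys); }

        public void seek(String start, boolean inclusive) {
            it = keys.tailSet(start, inclusive).iterator();
            next();
        }
        public boolean hasTop() { return top != null; }
        public String getTopKey() { return top; }
        public void next() { top = it.hasNext() ? it.next() : null; }
    }

    /** Deliberately buggy iterator: it ignores the "exclusive" flag when seeking. */
    static class AlwaysInclusive implements KeyIterator {
        private final KeyIterator source;
        AlwaysInclusive(KeyIterator source) { this.source = source; }
        public void seek(String start, boolean inclusive) { source.seek(start, true); }
        public boolean hasTop() { return source.hasTop(); }
        public String getTopKey() { return source.getTopKey(); }
        public void next() { source.next(); }
    }

    /** Test wrapper: every next() becomes a seek to just after the last key returned. */
    static class ReseekingIterator implements KeyIterator {
        private final KeyIterator source;
        ReseekingIterator(KeyIterator source) { this.source = source; }
        public void seek(String start, boolean inclusive) { source.seek(start, inclusive); }
        public boolean hasTop() { return source.hasTop(); }
        public String getTopKey() { return source.getTopKey(); }
        public void next() { source.seek(source.getTopKey(), false); }
    }

    /** Reads up to limit keys starting at start; the limit guards against infinite loops. */
    static List<String> drain(KeyIterator it, String start, int limit) {
        List<String> out = new ArrayList<>();
        it.seek(start, true);
        while (it.hasTop() && out.size() < limit) {
            out.add(it.getTopKey());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c");
        // A correct iterator survives a reseek after every key.
        System.out.println(drain(new ReseekingIterator(new SortedSource(keys)), "a", 5));
        // The buggy one looks fine without reseeking...
        System.out.println(drain(new AlwaysInclusive(new SortedSource(keys)), "a", 5));
        // ...but keeps returning the same key once it is reseeked.
        System.out.println(drain(
            new ReseekingIterator(new AlwaysInclusive(new SortedSource(keys))), "a", 5));
    }
}
```

In this toy model the buggy iterator only misbehaves once the wrapper is
added, which matches the "works for most cases" symptom: the bug stays
hidden until something forces a reseek.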

I think there are two principles to test here.  First, all iterators should
provide a sorted "view" of their underlying input.  And second, an iterator
should be able to resume (i.e., be re-seeked) from the last key it
returned.  I say "view" since an iterator could be combining multiple rows
into something else to be returned to the client.
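
Both principles can be written down as checks.  Again, this is only a
sketch under my own assumptions: keys are plain strings, and a "scan" is
modeled as a hypothetical function from the last key the client saw (null
meaning "from the start") to the list of keys returned after it.  None of
these names come from Accumulo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Function;

class IteratorContractCheck {
    /** Principle 1: the returned keys are in strictly increasing order. */
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    /** Principle 2: resuming after any returned key yields exactly the remaining keys. */
    static boolean isResumable(Function<String, List<String>> scan) {
        List<String> full = scan.apply(null);
        for (int i = 0; i < full.size(); i++) {
            List<String> expected = full.subList(i + 1, full.size());
            if (!scan.apply(full.get(i)).equals(expected)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical scan model: seek the sorted underlying keys to just after
     * lastKey (as the client saw it), then apply transform to each key.
     */
    static Function<String, List<String>> scanOver(
            List<String> underlying, Function<String, String> transform) {
        NavigableSet<String> sorted = new TreeSet<>(underlying);
        return lastKey -> {
            NavigableSet<String> tail =
                lastKey == null ? sorted : sorted.tailSet(lastKey, false);
            List<String> out = new ArrayList<>();
            for (String k : tail) {
                out.add(transform.apply(k));
            }
            return out;
        };
    }

    public static void main(String[] args) {
        List<String> underlying = List.of("az", "by", "cx");

        // Identity "view": sorted, and resumable from every key.
        Function<String, List<String>> plain = scanOver(underlying, k -> k);
        System.out.println(isSorted(plain.apply(null)) + " " + isResumable(plain));

        // Reversing each key breaks the sort order, and resuming from a
        // transformed key skips underlying keys.
        Function<String, List<String>> reversed =
            scanOver(underlying, k -> new StringBuilder(k).reverse().toString());
        System.out.println(isSorted(reversed.apply(null)) + " " + isResumable(reversed));
    }
}
```

The reversed case fails both checks: the keys come back as za, yb, xc, and
resuming after "za" seeks past every underlying key, so nothing else is
returned.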

Jim
On Sat, May 25, 2013 at 1:09 PM, Christopher <[EMAIL PROTECTED]> wrote:

> He's talking about using iterators that transform keys (we don't have
> any built-in, IIRC), like those that extend the new
> TransformingIterator. Scanner logic is written, such that it will
> resume scanning from the last key it received. This is important for
> handling failures and splits/migrations during a scan. So, in this
> context, a "reversible transformation" simply means that when the
> client tells the tserver's iterator stack to resume a scan, the stack
> can transform what the client thinks is the starting point for the
> scan back to what it actually should have been prior to
> transformation, so it can resume from the correct place. This is
> necessary, because the client will not
> know what the data looked like prior to transformation, as it only
> sees data returned from the iterator stack.
>
> Now, the assumption here is that the key that the client *thinks* is
> the starting point is in the same tablet that the real starting point *is*.
> Otherwise, it doesn't matter if the transformation is reversible,
> because the real starting point could be on a different tablet
> entirely (due to splits). To ensure this doesn't happen, it's
> important to make sure that transforming iterators that you implement
> do not transform the RowID portion of the key... or else, if they do,
> they can send a special key back, that is understood by client code
> that can inform the client to query a different tablet server... the
> one the client needs to resume scanning from.
>
> Yes, there should be unit tests, but the unit tests would be against
> iterators that actually transform keys in this way... and I don't
> think we provide any. That'd be user code.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Sat, May 25, 2013 at 9:36 AM, David Medinets
> <[EMAIL PROTECTED]> wrote:
> > Is there a unit test exposing this behavior? And what does "reversible
> > transformation" mean?
> >
> >
> > On Wed, May 1, 2013 at 8:36 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> >>
> >> For all the rest of you on this thread, the big problem you'll run into
> >> when returning keys out of range is that the reseeking behavior will skip a
> >> bunch of underlying keys (i.e. don't try this at home). For example, say you
> >> have tablets ["A","D"], ("D","M"], and ("M","ZZZZ..."]. If you do a query on
> >> ["A","M"] and return "N" after seeing the underlying key "A", you may