Accumulo >> mail # user >> Iterators returning keys out of scan range

Re: Iterators returning keys out of scan range
He's talking about using iterators that transform keys (we don't have
any built-in, IIRC), like those that extend the new
TransformingIterator. Scanner logic is written such that it will
resume scanning from the last key it received. This is important for
handling failures and splits/migrations during a scan. So, in this
context, a "reversible transformation" simply means that when the
client tells the tserver's iterator stack to resume the scan, the
iterator can transform what the client thinks is the starting point
back to what it actually should have been prior to transformation, so
it can resume from the correct place. This is necessary because the
client will not know what the data looked like prior to
transformation; it only sees data returned from the iterator stack.
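A minimal sketch of that resume logic, in plain Java with no Accumulo dependency. `SimpleKey`, `transform`, `untransform`, `scanBatch` and `fullScan` are hypothetical names for illustration, not Accumulo's API. The hypothetical transform only rewrites the column qualifier, so it is reversible and leaves the row alone; before skipping already-delivered keys, the "server" maps the client's restart key back through `untransform`.

```java
import java.util.ArrayList;
import java.util.List;

public class ReversibleTransformSketch {

    // Hypothetical stand-in for a key (row, column family, column qualifier).
    // This is NOT Accumulo's Key class.
    record SimpleKey(String row, String cf, String cq) implements Comparable<SimpleKey> {
        public int compareTo(SimpleKey o) {
            int c = row.compareTo(o.row);
            if (c != 0) return c;
            c = cf.compareTo(o.cf);
            if (c != 0) return c;
            return cq.compareTo(o.cq);
        }
    }

    // Hypothetical transformation: prefix the qualifier with "x:".
    // The row is untouched and the mapping can be inverted.
    static SimpleKey transform(SimpleKey k) {
        return new SimpleKey(k.row(), k.cf(), "x:" + k.cq());
    }

    // Inverse: recovers the underlying key from a key the client saw.
    static SimpleKey untransform(SimpleKey k) {
        if (!k.cq().startsWith("x:")) {
            throw new IllegalArgumentException("not a transformed key: " + k);
        }
        return new SimpleKey(k.row(), k.cf(), k.cq().substring(2));
    }

    // "Server": returns up to batchSize transformed keys strictly after the
    // client's restart key (from the beginning if restart is null).
    static List<SimpleKey> scanBatch(List<SimpleKey> sortedUnderlying, SimpleKey restart, int batchSize) {
        SimpleKey from = restart == null ? null : untransform(restart); // map back before comparing
        List<SimpleKey> out = new ArrayList<>();
        for (SimpleKey k : sortedUnderlying) {
            if (from != null && k.compareTo(from) <= 0) continue; // already delivered
            out.add(transform(k));
            if (out.size() == batchSize) break;
        }
        return out;
    }

    // "Client": keeps requesting batches, resuming from the last key it received.
    static List<SimpleKey> fullScan(List<SimpleKey> sortedUnderlying, int batchSize) {
        List<SimpleKey> seen = new ArrayList<>();
        SimpleKey last = null;
        while (true) {
            List<SimpleKey> batch = scanBatch(sortedUnderlying, last, batchSize);
            if (batch.isEmpty()) return seen;
            seen.addAll(batch);
            last = batch.get(batch.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<SimpleKey> data = List.of(
            new SimpleKey("r1", "f", "a"),
            new SimpleKey("r1", "f", "b"),
            new SimpleKey("r2", "f", "a"),
            new SimpleKey("r3", "f", "c"));
        System.out.println(fullScan(data, 2).size()); // 4
    }
}
```

With a batch size of 2 over the four sample keys, the client makes two non-empty round trips and still sees every key exactly once, because each restart key is translated back before the server decides what it has already sent.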

Now, the assumption here is that the key the client *thinks* is the
starting point is in the same tablet that the real starting point *is*
in. Otherwise, it doesn't matter whether the transformation is
reversible, because the real starting point could be on a different
tablet entirely (due to splits). To ensure this doesn't happen, it's
important that the transforming iterators you implement do not
transform the RowID portion of the key. If they do, they can send back
a special key, understood by client code, that tells the client to
query a different tablet server: the one it needs to resume scanning
from.
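To see why the RowID matters, here is a hypothetical tablet lookup in Java using the split points from Adam's example quoted below. Tablets are keyed by their end row in a `TreeMap`, and a row is located with `ceilingEntry` (smallest end row at or after the row). The tablet names and the `"N"`-prefix row transform are made up for illustration, and the special-key workaround mentioned above is not modeled.

```java
import java.util.Map;
import java.util.TreeMap;

public class TabletLocationSketch {
    // End row -> tablet name. A tablet covers (previous end row, end row].
    // "ZZZZ" stands in for the "ZZZZ..." end row in the thread's example;
    // rows at or before "D" all map to the first tablet here.
    static final TreeMap<String, String> TABLETS = new TreeMap<>();
    static {
        TABLETS.put("D", "tablet-1");
        TABLETS.put("M", "tablet-2");
        TABLETS.put("ZZZZ", "tablet-3");
    }

    // Finds the tablet holding a row: the one with the smallest end row >= row.
    static String locate(String row) {
        Map.Entry<String, String> e = TABLETS.ceilingEntry(row);
        if (e == null) throw new IllegalArgumentException("row past last tablet: " + row);
        return e.getValue();
    }

    // Hypothetical row-changing transform: reversible, but it moves the row.
    static String transformRow(String row) {
        return "N" + row;
    }

    // Resuming is only safe if the row the client saw lands in the tablet
    // that actually holds the underlying key.
    static boolean resumesInSameTablet(String underlyingRow, String rowSeenByClient) {
        return locate(underlyingRow).equals(locate(rowSeenByClient));
    }

    public static void main(String[] args) {
        // Row-preserving transform: the client's restart row is the real row.
        System.out.println(resumesInSameTablet("B", "B"));               // true
        // Row-changing transform: "B" becomes "NB", which sorts into tablet-3.
        System.out.println(resumesInSameTablet("B", transformRow("B"))); // false
    }
}
```

Even though `transformRow` is trivially reversible, the client would route its restart request to the wrong tablet, which is why reversibility alone is not enough once the row changes.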

Yes, there should be unit tests, but the unit tests would be against
iterators that actually transform keys in this way... and I don't
think we provide any. That'd be user code.

Christopher L Tubbs II
On Sat, May 25, 2013 at 9:36 AM, David Medinets wrote:
> Is there a unit test exposing this behavior? And what does "reversible
> transformation" mean?
> On Wed, May 1, 2013 at 8:36 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
>> For all the rest of you on this thread, the big problem you'll run into
>> when returning keys out of range is that the reseeking behavior will skip a
>> bunch of underlying keys (i.e. don't try this at home). For example, say you
>> have tablets ["A","D"], ("D","M"], and ("M","ZZZZ..."]. If you do a query on
>> ["A","M"] and return "N" after seeing the underlying key "A", you may never
>> see keys from the ("D","M"] tablet. A good rule of thumb is to return keys
>> in the same row as the underlying keys that were used to generate them and
>> use a reversible transformation of columns within each row.
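Adam's example can be replayed with a deliberately simplified model of the client scan loop: one key per round trip, resume strictly after the last key seen, relocate the tablet from that key, and stop once a returned key sorts past the end of the range. This is not Accumulo's actual client code; the class name, tablet contents, and the iterator that rewrites "A" to "N" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

public class OutOfRangeKeySketch {
    // End row -> sorted rows stored in that tablet, using the thread's split
    // points ["A","D"], ("D","M"], ("M","ZZZZ..."]. Row contents are made up.
    static TreeMap<String, List<String>> tablets() {
        TreeMap<String, List<String>> t = new TreeMap<>();
        t.put("D", List.of("A", "B"));
        t.put("M", List.of("E", "F"));
        t.put("ZZZZ", List.of("N", "P"));
        return t;
    }

    // Simplified client scan over [queryStart, queryEnd]; `iterator` plays the
    // role of the tserver's iterator stack and may rewrite each key.
    static List<String> scan(String queryStart, String queryEnd, UnaryOperator<String> iterator) {
        TreeMap<String, List<String>> tablets = tablets();
        List<String> seen = new ArrayList<>();
        String resumeAfter = null; // null: start at queryStart, inclusive
        String tabletEnd = tablets.ceilingKey(queryStart);
        while (tabletEnd != null) {
            String next = null;
            for (String row : tablets.get(tabletEnd)) {
                boolean afterStart = resumeAfter == null
                        ? row.compareTo(queryStart) >= 0
                        : row.compareTo(resumeAfter) > 0;
                if (afterStart && row.compareTo(queryEnd) <= 0) {
                    next = row;
                    break;
                }
            }
            if (next == null) {
                // Tablet exhausted: stop if the range ends here, else go to the next tablet.
                if (tabletEnd.compareTo(queryEnd) >= 0) break;
                tabletEnd = tablets.higherKey(tabletEnd);
                continue;
            }
            String returned = iterator.apply(next);
            seen.add(returned);
            if (returned.compareTo(queryEnd) > 0) break; // client thinks the range is finished
            resumeAfter = returned;                      // resume from what the client saw
            tabletEnd = tablets.ceilingKey(returned);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scan("A", "M", row -> row));                         // [A, B, E, F]
        System.out.println(scan("A", "M", row -> row.equals("A") ? "N" : row)); // [N]
    }
}
```

With a pass-through iterator the scan over ["A","M"] returns A, B, E and F. When the iterator returns "N" after seeing "A", the client concludes it is already past "M" and stops, so "B" and everything in the ("D","M"] tablet is never read.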