Accumulo >> mail # user >> Iterators returning keys out of scan range

Re: Iterators returning keys out of scan range
Mr. VonCloud,

I suspect you're going for something like eureka synchronization. I suppose
that might work, but I wouldn't rely on that behavior persisting long-term.
It's definitely in the "undefined" set right now. I can't think of another
way you would do what I presume you want to do without modifying the
scanner clients, though.

For all the rest of you on this thread, the big problem you'll run into
when returning keys out of range is that the reseeking behavior will skip a
bunch of underlying keys (i.e. don't try this at home). For example, say
you have tablets ["A","D"], ("D","M"], and ("M","ZZZZ..."]. If you do a
query on ["A","M"] and return "N" after seeing the underlying key "A", you
may never see keys from the ("D","M"] tablet. A good rule of thumb is to
return keys in the same row as the underlying keys that were used to
generate them and use a reversible transformation of columns within each

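To make the skipping concrete, here is a minimal toy model in Python. It is not Accumulo code and uses no Accumulo API; `scan`, `transform`, and `batch_size` are hypothetical names for illustration only. It assumes a scan that, after each batch, re-seeks to just past the last key it returned, which is the behavior described above.

```python
from bisect import bisect_left, bisect_right


def scan(sorted_keys, start, end, transform, batch_size=1):
    """Toy scan over the inclusive range [start, end].

    Assumption (not Accumulo's actual implementation): after every batch the
    scan re-seeks to just past the last key it *returned*, not the last
    underlying key it read.
    """
    results = []
    # Position of the first underlying key >= start.
    pos = bisect_left(sorted_keys, start)
    while True:
        # Read up to batch_size underlying keys that are still inside the range.
        batch = [k for k in sorted_keys[pos:pos + batch_size] if k <= end]
        if not batch:
            return results
        returned = [transform(k) for k in batch]
        results.extend(returned)
        # Re-seek strictly after the last returned key. A transform that moves
        # keys backwards would make this toy loop revisit the same data forever.
        pos = bisect_right(sorted_keys, returned[-1])


keys = ["A", "B", "E", "H", "L", "P"]

# Well-behaved iterator: returned keys equal the underlying keys.
in_range = scan(keys, "A", "M", lambda k: k, batch_size=2)

# Misbehaving iterator: returns "N" after seeing the underlying key "A".
jumped = scan(keys, "A", "M", lambda k: "N" if k == "A" else k, batch_size=1)

print(in_range)
print(jumped)
```

In this model the well-behaved scan visits every underlying key up to "M", while the misbehaving one re-seeks past "N" and ends without ever reading "B", "E", "H", or "L".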

On Wed, May 1, 2013 at 8:03 PM, William Slacum <

> Sorry guys, I forgot to add some methods to the iterator to make it work.
> http://pastebin.com/pXR5veP6
> On Wed, May 1, 2013 at 8:01 PM, William Slacum <
>> I was always under the impression there was a check, presumably on the
>> client side, that would end a scan session if a key was returned that was
>> not in the original scan range.
>> Say I scanned my table for the range ["A", "B"], but I had an iterator
>> that returned only keys beginning with "C". I would expect that I wouldn't
>> see any data, and I'm reasonably certain that in some 1.3 variants this was
>> the case. However, I was able to drum up a test case that disproves this. A
>> similar test can be found here http://pastebin.com/g109eACC. It will
>> require some import magic to get running, but the gist is pretty simple. I
>> am running against Accumulo 1.4.2.
>> I'm hitting up the user list because I'd like to confirm:
>> 1) Is it expected behavior that a scan should terminate once it receives
>> a key outside of its scan range?
>> 2) If (1) is true, when did this change?
>> I'm actually incredibly glad it works the way it does for my needs,
>> however I believe we should document that doing this has several pitfalls
>> and possible remedies for those pitfalls.