Accumulo user mailing list
Re: strategies beyond intersecting iterators?
By "iterator stack" I am referring to the stack of Accumulo iterators that
runs for your scan. Resource sharing among scan sessions is implemented by
destroying a user's scan session and later recreating the iterator stack.
The new stack is then seeked to the last key returned by the entire stack.
Any state an iterator was holding in memory, such as a set of keys, is lost
at that point and has to be rebuilt every time the stack is recreated.
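
The consequence can be illustrated with a small plain-Java simulation. It does not use Accumulo's classes: `StackRebuildDemo`, `BufferingIterator`, `ScanResult` and the `batchSize` cut-off are hypothetical stand-ins, assuming the server may end the session after any batch and then rebuild the stack, seeking it past the last key it returned. What it shows is that an iterator keeping a `TreeSet` must refill it from the seek point on every rebuild.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical simulation, NOT the Accumulo API: the "server" tears down the scan's
// iterator after each batch and rebuilds it, re-seeking past the last key returned.
class StackRebuildDemo {

    // Hypothetical iterator with in-memory state: a TreeSet of every key after its seek point.
    static class BufferingIterator {
        private final TreeSet<String> buffer = new TreeSet<>();

        BufferingIterator(NavigableMap<String, String> data, String lastReturned) {
            // "Seek": start after the last key returned, or at the beginning on the first build.
            NavigableMap<String, String> view =
                lastReturned == null ? data : data.tailMap(lastReturned, false);
            buffer.addAll(view.keySet()); // the state is rebuilt from scratch on every construction
        }

        boolean hasTop() { return !buffer.isEmpty(); }
        String next() { return buffer.pollFirst(); }
    }

    static class ScanResult {
        final List<String> keys = new ArrayList<>();
        int rebuilds = 0; // how many times the iterator (and its TreeSet) was built
    }

    // Scans every key, destroying the iterator after each batch; batchSize must be positive.
    static ScanResult scan(NavigableMap<String, String> data, int batchSize) {
        ScanResult result = new ScanResult();
        String lastKey = null;
        while (true) {
            BufferingIterator it = new BufferingIterator(data, lastKey); // previous state is gone
            result.rebuilds++;
            for (int emitted = 0; emitted < batchSize && it.hasTop(); emitted++) {
                lastKey = it.next();
                result.keys.add(lastKey);
            }
            if (!it.hasTop()) return result; // source exhausted
            // Otherwise the session is destroyed here and the loop rebuilds the stack.
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        for (int i = 0; i < 10; i++) data.put(String.format("doc%02d", i), "v" + i);
        ScanResult r = scan(data, 3);
        // prints "10 keys, state built 4 times"
        System.out.println(r.keys.size() + " keys, state built " + r.rebuilds + " times");
    }
}
```

With 10 keys and batches of 3 the buffer is filled four times, reading 10 + 7 + 4 + 1 = 22 keys in total to return 10, so the cost grows quickly when the buffered range is large.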
On Jul 1, 2012 5:55 PM, "Sukant Hajra" <[EMAIL PROTECTED]> wrote:

> Excerpts from William Slacum's message of Thu Jun 28 16:04:32 -0500 2012:
> >
> > You're pretty much spot on regarding two aspects of the current
> > IntersectingIterator:
> >
> > 1- It's not really extensible (there are hooks for building doc IDs,
> > but you still need the same `partition term: docId` key structure)
> > 2- Its main strength is that it can do the merges of sorted lists of
> > doc IDs based on equality expressions (i.e., `author=="bob" and
> > day=="20120627"`)
> >
> > Fortunately, the logic isn't very complicated for re-creating the
> > merging stuff. Personally, I think it's easy enough to separate the
> > logic of joining N streams of iterator results from the actual
> > scanning. Unfortunately, this would be left up to you to do at the
> > moment :)
> >
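
One way to keep the joining logic apart from the scanning is a merge that only ever sees sorted iterators of doc IDs. The sketch below is a hypothetical `SortedIntersection.intersect` over plain `java.util` iterators, not Accumulo's IntersectingIterator; it assumes each stream is sorted ascending with no duplicates. It advances every lagging stream up to the current largest head and emits a doc ID only when all streams agree on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: AND-intersection of N sorted doc-ID streams, independent of
// how each stream is produced (term lookup, range scan, ...).
class SortedIntersection {

    // Returns doc IDs present in every stream; each stream must be sorted ascending, no duplicates.
    static List<String> intersect(List<Iterator<String>> streams) {
        List<String> out = new ArrayList<>();
        if (streams.isEmpty()) return out;
        String[] heads = new String[streams.size()];
        for (int i = 0; i < heads.length; i++) {
            if (!streams.get(i).hasNext()) return out; // one empty stream means an empty result
            heads[i] = streams.get(i).next();
        }
        while (true) {
            // The largest head is a lower bound on any doc ID that can still match.
            String max = heads[0];
            for (String h : heads) if (h.compareTo(max) > 0) max = h;
            boolean allEqual = true;
            for (int i = 0; i < heads.length; i++) {
                // Advance each lagging stream up to the candidate.
                while (heads[i].compareTo(max) < 0) {
                    if (!streams.get(i).hasNext()) return out; // a stream ran dry: no more matches
                    heads[i] = streams.get(i).next();
                }
                if (!heads[i].equals(max)) allEqual = false;
            }
            if (allEqual) {
                out.add(max);
                // Step every stream past the match.
                for (int i = 0; i < heads.length; i++) {
                    if (!streams.get(i).hasNext()) return out;
                    heads[i] = streams.get(i).next();
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up doc IDs for author=="bob" and day=="20120627".
        List<Iterator<String>> streams = Arrays.asList(
            Arrays.asList("d1", "d3", "d5", "d7").iterator(),
            Arrays.asList("d2", "d3", "d4", "d7", "d9").iterator());
        System.out.println(intersect(streams)); // prints [d3, d7]
    }
}
```

Because the merge depends only on `Iterator<String>`, any source that yields sorted doc IDs can be plugged in without touching the join.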
> > You could do range searches by consuming sets of values and sorting
> > all of the doc IDs in that range by throwing them into a TreeSet. That
> > would let you emit doc IDs in a globally sorted order for the given
> > range of terms.
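
The range search described here can be sketched with an in-memory `java.util.TreeSet`. As an assumption for illustration, the index is modeled as a `NavigableMap` from a term such as `day=20120627` to that term's doc IDs, rather than Accumulo's `partition term: docId` keys; `TermRangeSearch` and `docsInRange` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a term-range search: gather the doc IDs of every term in
// [lo, hi] into a TreeSet so they come out in one globally sorted order.
class TermRangeSearch {

    // index maps a term (e.g. "day=20120627") to the doc IDs containing it (assumed layout).
    static TreeSet<String> docsInRange(NavigableMap<String, List<String>> index, String lo, String hi) {
        TreeSet<String> docIds = new TreeSet<>(); // sorted, de-duplicated, held entirely in memory
        for (List<String> ids : index.subMap(lo, true, hi, true).values()) {
            docIds.addAll(ids);
        }
        return docIds;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> index = new TreeMap<>();
        index.put("day=20120626", Arrays.asList("d4", "d8"));
        index.put("day=20120627", Arrays.asList("d2", "d4"));
        index.put("day=20120628", Arrays.asList("d1"));
        index.put("day=20120629", Arrays.asList("d9"));
        System.out.println(docsInRange(index, "day=20120626", "day=20120628")); // prints [d1, d2, d4, d8]
    }
}
```

The resulting set's `iterator()` is itself a sorted stream of doc IDs, so it can take part in a merge alongside equality terms. Being in memory, however, it is exactly the kind of iterator state that has to be rebuilt whenever the scan's iterator stack is recreated.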
>
> I understand everything above, I think. Thanks for the prompt reply.
>
> > This can get problematic if the range ends up being very large, because
> > your iterator stack may periodically be destroyed and rebuilt.
>
> This particular statement confused me. When you said TreeSet, you're talking
> about a straightforward in-memory collection from java.util or similar,
> right? I ask because I'm confused about which "iterator stack may
> periodically be destroyed and rebuilt." It sounds like we're talking about
> some garbage collection specific to Accumulo. Am I missing something here?
>
> -Sukant