Re: Custom Iterators
Josh Elser 2012-08-23, 00:03
Err, double (triple) reply:
No, you are incorrect. The wikisearch example can handle any arbitrary
boolean expression containing NOT, AND, and OR. As always, I'll preface
it the same as Bill did: it *should* be able to handle them :).
I know that cleaning-up/reworking the Wikisearch code is in the works.
I'm just not positive about the timeframe.
As far as examples, I'd push you to the write-up Eric did after
benchmarking the wikisearch example:
He has some example queries that give the basic idea behind what's
supported (minus the NOTs).
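For readers who want a concrete picture of what combining NOT, AND, and OR means at the document-ID level, here is a minimal Java sketch. It does not use the wikisearch iterators or any Accumulo API: BooleanQuerySketch and its methods are hypothetical names, and the index is assumed to be a plain in-memory map from term to document IDs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, NOT part of the wikisearch code: evaluates AND / OR / NOT
// over sorted sets of document IDs taken from a toy in-memory inverted index.
final class BooleanQuerySketch {

    // Returns the document IDs indexed under a term, or an empty set if the term is unknown.
    static Set<String> lookup(Map<String, Set<String>> index, String term) {
        Set<String> ids = index.get(term);
        return ids == null ? Collections.<String>emptySet() : ids;
    }

    // AND: keep only the IDs present in both operands.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }

    // OR: keep the IDs present in either operand.
    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    // AND NOT: keep the IDs in a that do not appear in b.
    static Set<String> andNot(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.removeAll(b);
        return out;
    }
}
```

With this sketch, andNot(and(lookup(index, "a"), or(lookup(index, "b"), lookup(index, "c"))), lookup(index, "d")) corresponds to the expression a AND (b OR c) AND NOT d.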
On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
> Thanks for getting back to me so quickly. I explained in my lengthy
> reply to William that the comment on OrIterator.TermSource.compareTo
> indicates that implementations with more than one row per tablet need
> to compare row key first (and that is not being done in this code). It
> may be that it's not an issue and I'm simply misunderstanding
> something. As for the wikisearch example, as I understood it, it could
> only handle searches for "anded" terms. If that's not the case, then
> an example of an or search would be helpful. In any case, I'd love a
> deeper dive on the wikisearch somewhere. I get the source code and a
> high level explanation of what's happening, but I'd love a tutorial or
> something that walks through the classes and explains how each one
> contributes to the functionality. Don't consider that a request (that
> would be a lot more to ask than I'm willing to ask), but I would
> certainly find it useful if it does exist.
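To illustrate the concern about comparing the row key first, here is a minimal Java sketch of a row-then-column-qualifier ordering. RowFirstTermSource and its fields are hypothetical stand-ins, not the actual OrIterator.TermSource class, and plain strings are assumed in place of Accumulo keys.

```java
// Hypothetical stand-in for a term source in a merged OR; the field names are
// assumptions and do not mirror the real OrIterator.TermSource.
final class RowFirstTermSource implements Comparable<RowFirstTermSource> {
    final String row;          // row key of the source's current entry
    final String colQualifier; // column qualifier of the source's current entry

    RowFirstTermSource(String row, String colQualifier) {
        this.row = row;
        this.colQualifier = colQualifier;
    }

    // Order by row first, then by column qualifier, so entries from an earlier
    // row always sort ahead of entries from a later row in the same tablet.
    @Override
    public int compareTo(RowFirstTermSource other) {
        int byRow = row.compareTo(other.row);
        if (byRow != 0) {
            return byRow;
        }
        return colQualifier.compareTo(other.colQualifier);
    }

    @Override
    public String toString() {
        return row + "/" + colQualifier;
    }
}
```

If such sources are kept in a java.util.PriorityQueue, the head is the smallest (row, column qualifier) pair. Comparing the column qualifier alone would let an entry from a later row jump ahead whenever its qualifier happens to sort lower.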
> *From:*Josh Elser [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, August 22, 2012 2:53 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* EXTERNAL: Re: Custom Iterators
> What makes you say that the OrIterator cannot handle more than one row
> per tablet? Can you provide details?
> AFAIK, the OrIterator should work correctly in all cases (e.g.
> regardless of row distribution in a tablet). Any issues in the code
> that prevent it from doing so would be a bug that should be fixed.
> Also, the wikisearch example supports indexing over multiple
> attributes (and I believe indexes document metadata in addition to the
> tokenized document). Is there something unclear that could be better
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
> I'm interested in writing a custom iterator, and I've been looking
> for documentation on how to do so. Thus far, I've not been able to
> find anything beyond the java docs in SortedKeyValueIterator and a
> few other sub-classes. A few of the examples use Iterators, but
> provide no real info on how to properly implement one. Is there
> anywhere to find general guidance on the iterator stack?
> (If you're interested)
> Specifically, for those that are curious, I'm trying to implement
> something similar to the wikisearch example, but with some key
> differences. In my case, I've got a file with various attributes
> that are being indexed. So for each file there are 5 attributes, and
> each attribute has a fixed number of possible values. For example
> (totally made up):
> personID, gender, hair color, country, race, personRecord
> Row:binID; ColFam:Attribute_AttributeValue; ColQ:personID; Val:blank
> Row:binID; ColFam:"D"; ColQ:personID; Val:personRecord
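The two entry shapes above can be sketched in Java as follows. This is only an illustration of the layout described in the message: PersonIndexLayoutSketch, IndexEntry, and entriesFor are hypothetical names, no Accumulo classes are used, and how a binID is chosen is left to the caller because the message does not say.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the described layout; not Accumulo API.
final class PersonIndexLayoutSketch {

    // One entry reduced to the four fields named in the layout.
    static final class IndexEntry {
        final String row;
        final String colFam;
        final String colQual;
        final String value;

        IndexEntry(String row, String colFam, String colQual, String value) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.value = value;
        }
    }

    // Builds one index entry per attribute (ColFam = attribute_value, blank value)
    // plus one "D" entry that carries the full person record.
    static List<IndexEntry> entriesFor(String binId, String personId,
                                       Map<String, String> attributes, String personRecord) {
        List<IndexEntry> out = new ArrayList<>();
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            String colFam = attribute.getKey() + "_" + attribute.getValue();
            out.add(new IndexEntry(binId, colFam, personId, ""));
        }
        out.add(new IndexEntry(binId, "D", personId, personRecord));
        return out;
    }
}
```

Under this layout, all index entries and the record for one person share a row, so a lookup that matches attribute entries can fetch the matching "D" entry from the same row.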
> A typical query would be:
> Give me the personRecord for all people with:
> Gender: male &
> Hair color: blond or brown &
> Country: USA or England or China or Korea &
> Race: white or oriental
> The existing Iterators used in the wikisearch example are unable
> to handle the "or" clauses in each attribute.
> The OrIterator doesn't appear to handle the possibility of more than
> one row per tablet