Re: Custom Iterators
Josh Elser 2012-08-23, 00:03
Err, double (triple) reply:
No, you are incorrect. The wikisearch example can handle any arbitrary boolean expression containing NOT, AND, and OR. As always, I'll preface it the same way Bill did: it *should* be able to handle them :).

I know that cleaning up/reworking the Wikisearch code is in the works. I'm just not positive about the timeframe.

As far as examples, I'd push you to the write-up Eric did after benchmarking the wikisearch example: http://accumulo.apache.org/example/wikisearch.html

He has some example queries that give the basic idea behind what's supported (minus the NOTs).

On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
> Josh,
>
> Thanks for getting back to me so quickly. I explained in my lengthy reply to William that the comment on OrIterator.TermSource.compareTo indicates that implementations with more than one row per tablet need to compare the row key first (and that is not being done in this code). It may be that it's not an issue and I'm simply misunderstanding something. As for the wikisearch example, as I understood it, it could only handle searches for "anded" terms. If that's not the case, then an example of an "or" search would be helpful. In any case, I'd love a deeper dive on the wikisearch somewhere. I get the source code and a high-level explanation of what's happening, but I'd love a tutorial or something that walks through the classes and explains how each one contributes to the functionality. Don't consider that a request (that would be a lot more to ask than I'm willing to ask), but I would certainly find it useful if it does exist.
>
> Thanks,
>
> Tejay
>
> *From:* Josh Elser [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, August 22, 2012 2:53 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* EXTERNAL: Re: Custom Iterators
>
> What makes you say that the OrIterator cannot handle more than one row per tablet? Can you provide details?
>
> AFAIK, the OrIterator should work correctly in all cases (e.g. regardless of row distribution in a tablet). Any issues in the code that prevent it from doing so would be a bug that should be fixed.
>
> Also, the wikisearch example supports indexing over multiple attributes (and I believe it indexes document metadata in addition to the tokenized document). Is there something unclear that could be better documented?
>
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>
>     All,
>
>     I'm interested in writing a custom iterator, and I've been looking for documentation on how to do so. Thus far, I've not been able to find anything beyond the javadocs for SortedKeyValueIterator and a few of its sub-classes. A few of the examples use iterators, but provide no real info on how to properly implement one. Is there anywhere to find general guidance on the iterator stack?
>
>     (If you're interested)
>
>     Specifically, for those that are curious, I'm trying to implement something similar to the wikisearch example, but with some key differences. In my case, I've got a file with various attributes that are being indexed. So for each file there are 5 attributes, and each attribute has a fixed number of possible values. For example (totally made up):
>
>     personID, gender, hair color, country, race, personRecord
>
>     Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
>
>     AND
>
>     Row:binID; ColFam:"D"; ColQ:personID; value:personRecord
>
>     A typical query would be:
>
>     Give me the personRecord for all people with:
>
>     Gender: male &
>
>     Hair color: blond or brown &
>
>     Country: USA or England or china or korea &
>
>     Race: white or oriental
>
>     The existing iterators used in the wikisearch example are unable to handle the "or" clauses in each attribute.
>
>     The OrIterator doesn't appear to handle the possibility of more than one row per tablet
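For readers following the thread, the query shape in the original message (an AND across attributes, with an OR over the accepted values of each attribute) can be sketched in plain Java. This is only an in-memory model of the index layout described above (row = binID, column family = Attribute_AttributeValue, column qualifier = personID). It does not use the Accumulo iterator API and is not how the wikisearch iterators are implemented. The class name, method names, the shortened attribute name "hair", and the sample people are all hypothetical. IDs are unioned and intersected one row at a time, so entries from different rows are never mixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory model of the index table; not Accumulo code. */
final class AndOfOrsSketch {

    /** row (binID) -> column family ("attribute_value") -> sorted personIDs. */
    static final class Index {
        private final TreeMap<String, TreeMap<String, TreeSet<String>>> rows = new TreeMap<>();

        void put(String binId, String attribute, String value, String personId) {
            rows.computeIfAbsent(binId, r -> new TreeMap<>())
                .computeIfAbsent(attribute + "_" + value, f -> new TreeSet<>())
                .add(personId);
        }

        /** IDs indexed under one attribute value in one row (empty if none). */
        TreeSet<String> ids(String binId, String attribute, String value) {
            TreeSet<String> ids =
                rows.getOrDefault(binId, new TreeMap<>()).get(attribute + "_" + value);
            return ids == null ? new TreeSet<>() : ids;
        }

        Set<String> binIds() {
            return rows.keySet();
        }
    }

    /** Each map entry is one attribute with its accepted values (OR); entries are ANDed. */
    static List<String> query(Index index, Map<String, List<String>> clauses) {
        List<String> hits = new ArrayList<>();
        for (String binId : index.binIds()) {            // evaluate one row at a time
            TreeSet<String> matching = null;
            for (Map.Entry<String, List<String>> clause : clauses.entrySet()) {
                TreeSet<String> union = new TreeSet<>();  // OR within one attribute
                for (String value : clause.getValue()) {
                    union.addAll(index.ids(binId, clause.getKey(), value));
                }
                if (matching == null) {
                    matching = union;
                } else {
                    matching.retainAll(union);            // AND across attributes
                }
            }
            if (matching != null) {
                for (String id : matching) {
                    hits.add(binId + ":" + id);
                }
            }
        }
        return hits;
    }

    /** Made-up sample data spread over two rows. */
    static Index sampleIndex() {
        Index index = new Index();
        String[][] people = {
            {"bin1", "p1", "male", "blond", "USA", "white"},
            {"bin1", "p2", "female", "blond", "USA", "white"},
            {"bin1", "p3", "male", "black", "USA", "white"},
            {"bin2", "p4", "male", "brown", "china", "oriental"},
            {"bin2", "p5", "male", "red", "korea", "white"},
        };
        for (String[] p : people) {
            index.put(p[0], "gender", p[2], p[1]);
            index.put(p[0], "hair", p[3], p[1]);
            index.put(p[0], "country", p[4], p[1]);
            index.put(p[0], "race", p[5], p[1]);
        }
        return index;
    }

    /** The query from the original message, one entry per attribute. */
    static Map<String, List<String>> sampleQuery() {
        Map<String, List<String>> clauses = new LinkedHashMap<>();
        clauses.put("gender", Arrays.asList("male"));
        clauses.put("hair", Arrays.asList("blond", "brown"));
        clauses.put("country", Arrays.asList("USA", "England", "china", "korea"));
        clauses.put("race", Arrays.asList("white", "oriental"));
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(query(sampleIndex(), sampleQuery())); // prints [bin1:p1, bin2:p4]
    }
}
```

In a real table, the matching IDs would then be used to fetch the personRecord entries stored under the "D" column family in the same row. A server-side iterator would do the same merge lazily over sorted sources rather than materializing sets.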