Accumulo >> mail # user >> Custom Iterators


Cardon, Tejay E 2012-08-22, 20:41
Josh Elser 2012-08-22, 20:53
William Slacum 2012-08-22, 20:59
Re: Custom Iterators
Err, double (triple) reply:

No, you are incorrect. The wikisearch example can handle any arbitrary
boolean expression containing NOT, AND, and OR. As always, I'll preface
it the same as Bill did: it *should* be able to handle them :).

I know that cleaning-up/reworking the Wikisearch code is in the works.
I'm just not positive about the timeframe.

As far as examples, I'd push you to the write-up Eric did after
benchmarking the wikisearch example:
http://accumulo.apache.org/example/wikisearch.html

He has some example queries that give the basic idea behind what's
supported (minus the NOTs)
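
To make the AND/OR/NOT idea concrete, here is a minimal sketch in plain Java. It is not the wikisearch code and does not use any Accumulo API; the in-memory index, the "attribute_value" term names, and the method names are all assumptions made up for illustration, loosely following the made-up person query quoted below. Each OR group is a union of ID sets, AND is an intersection of the groups, and NOT is a set difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class BooleanQuerySketch {

    // OR: union of the ID sets for every term in the group.
    // The index maps a hypothetical "attribute_value" term to the IDs that carry it.
    static SortedSet<String> or(Map<String, SortedSet<String>> index, String... terms) {
        SortedSet<String> result = new TreeSet<>();
        for (String term : terms) {
            result.addAll(index.getOrDefault(term, Collections.emptySortedSet()));
        }
        return result;
    }

    // AND: intersection of all groups; an empty list of groups matches nothing.
    static SortedSet<String> and(List<SortedSet<String>> groups) {
        if (groups.isEmpty()) {
            return new TreeSet<>();
        }
        SortedSet<String> result = new TreeSet<>(groups.get(0));
        for (SortedSet<String> group : groups.subList(1, groups.size())) {
            result.retainAll(group);
        }
        return result;
    }

    // NOT: remove the excluded IDs from an already-computed candidate set.
    static SortedSet<String> not(SortedSet<String> candidates, SortedSet<String> excluded) {
        SortedSet<String> result = new TreeSet<>(candidates);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        // Made-up index contents, for illustration only.
        Map<String, SortedSet<String>> index = new HashMap<>();
        index.put("gender_male", new TreeSet<>(Arrays.asList("p1", "p2", "p3")));
        index.put("hair_blond", new TreeSet<>(Arrays.asList("p1")));
        index.put("hair_brown", new TreeSet<>(Arrays.asList("p2", "p4")));
        index.put("country_USA", new TreeSet<>(Arrays.asList("p1", "p3")));
        index.put("country_England", new TreeSet<>(Arrays.asList("p2")));

        // male AND (blond OR brown) AND (USA OR England)
        List<SortedSet<String>> groups = new ArrayList<>();
        groups.add(or(index, "gender_male"));
        groups.add(or(index, "hair_blond", "hair_brown"));
        groups.add(or(index, "country_USA", "country_England"));
        System.out.println(and(groups)); // prints [p1, p2]
    }
}
```

An iterator-based implementation would stream sorted keys rather than materialize whole sets, but the boolean semantics are the same.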

On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>
> Josh,
>
> Thanks for getting back to me so quickly. I explained in my lengthy
> reply to William that the comment on OrIterator.TermSource.compareTo
> indicates that implementations with more than one row per tablet need
> to compare row key first (and that is not being done in this code). It
> may be that it's not an issue and I'm simply misunderstanding
> something. As for the wikisearch example, as I understood it, it could
> only handle searches for "anded" terms. If that's not the case, then
> an example of an or search would be helpful. In any case, I'd love a
> deeper dive on the wikisearch somewhere. I get the source code and a
> high level explanation of what's happening, but I'd love a tutorial or
> something that walks through the classes and explains how each one
> contributes to the functionality. Don't consider that a request (that
> would be a lot more to ask than I'm willing to ask), but I would
> certainly find it useful if it does exist.
>
> Thanks,
>
> Tejay
>
> *From:*Josh Elser [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, August 22, 2012 2:53 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* EXTERNAL: Re: Custom Iterators
>
> What makes you say that the OrIterator cannot handle more than one row
> per tablet? Can you provide details?
>
> AFAIK, the OrIterator should work correctly in all cases (e.g.
> regardless of row distribution in a tablet). Any issues in the code
> that prevent it from doing so would be a bug that should be fixed.
>
> Also, the wikisearch example supports indexing over multiple
> attributes (and I believe indexes document metadata in addition to the
> tokenized document). Is there something unclear that could be better
> documented?
>
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>
>     All,
>
>     I'm interested in writing a custom iterator, and I've been looking
>     for documentation on how to do so. Thus far, I've not been able to
>     find anything beyond the java docs in SortedKeyValueIterator and a
>     few other sub-classes. A few of the examples use Iterators, but
>     provide no real info on how to properly implement one. Is there
>     anywhere to find general guidance on the iterator stack?
>
>     (If you're interested)
>
>     Specifically, for those that are curious, I'm trying to implement
>     something similar to the wikisearch example, but with some key
>     differences. In my case, I've got a file with various attributes
>     that are being indexed. So for each file there are 5 attributes, and
>     each attribute has a fixed number of possible values. For example
>     (totally made up):
>
>     personID, gender, hair color, country, race, personRecord
>
>     Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
>
>     AND
>     Row:binID; ColFam:"D"; ColQ:personID; value:personRecord
>
>     A typical query would be:
>
>     Give me the personRecord for all people with:
>
>     Gender: male &
>
>     Hair color: blond or brown &
>
>     Country: USA or England or China or Korea &
>
>     Race: white or oriental
>
>     The existing Iterators used in the wikisearch example are unable
>     to handle the "or" clauses in each attribute.
>
>     The OrIterator doesn't appear to handle the possibility of more than
>     one row per tablet.
>
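
The concern raised above about comparing the row key first can be illustrated with a small, self-contained sketch. This is not the actual OrIterator.TermSource code; the TermEntry class, its fields, and the mergeInOrder helper are hypothetical stand-ins. The point is only that when several rows (bins) can appear in one tablet, entries from different term sources must be ordered by row before document ID, otherwise a merge can emit results out of key order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class RowFirstCompareSketch {

    // Hypothetical stand-in for the current top entry of one term's source.
    static final class TermEntry implements Comparable<TermEntry> {
        final String row;   // e.g. the bin ID
        final String docId; // e.g. the person ID held in the column qualifier

        TermEntry(String row, String docId) {
            this.row = row;
            this.docId = docId;
        }

        @Override
        public int compareTo(TermEntry other) {
            // Compare the row first; fall back to the document ID only within a row.
            int byRow = row.compareTo(other.row);
            return byRow != 0 ? byRow : docId.compareTo(other.docId);
        }

        @Override
        public String toString() {
            return row + "/" + docId;
        }
    }

    // Drains the entries through a priority queue, as an OR-style merge might.
    static List<String> mergeInOrder(List<TermEntry> entries) {
        PriorityQueue<TermEntry> queue = new PriorityQueue<>(entries);
        List<String> ordered = new ArrayList<>();
        while (!queue.isEmpty()) {
            ordered.add(queue.poll().toString());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<TermEntry> entries = Arrays.asList(
                new TermEntry("bin2", "doc1"),
                new TermEntry("bin1", "doc9"),
                new TermEntry("bin1", "doc3"));
        // Comparing docId alone would put bin2/doc1 first, ahead of every bin1 entry.
        System.out.println(mergeInOrder(entries)); // prints [bin1/doc3, bin1/doc9, bin2/doc1]
    }
}
```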
Marc Parisi 2012-08-22, 23:32