Re: Wikisearch Iterators
Josh,

Appreciate the help.

I definitely have a use case that will involve terms not being found in the
index (like a user typo), and I need it to exit gracefully.

The OOME definitely happens in the web server, within the method that
creates documents.  This is puzzling, because I would expect no documents
to be created.  Can you help me understand why this is happening and how
to catch this circumstance cleanly?
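
One way to make the missing-term case exit gracefully is to check every term
against the index before building the query, and to return an empty result for
an AND query as soon as one term has no entries, rather than falling back to a
scan. The sketch below is plain Java with an in-memory map standing in for the
index table. The names TermGuard, lookup, and andQuery are hypothetical and are
not part of Wikisearch or Accumulo; it only illustrates the guard, under the
assumption that a term lookup is cheap compared to a full table scan.

```java
import java.util.*;

class TermGuard {
    /** Hypothetical stand-in for an index lookup: term -> IDs of documents containing it. */
    static Set<String> lookup(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    /** AND query that returns an empty result as soon as any term is absent from the index. */
    static Set<String> andQuery(Map<String, Set<String>> index, List<String> terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> docs = lookup(index, term);
            if (docs.isEmpty()) {
                // Unknown term (for example a typo): nothing can match, so stop here.
                return Collections.emptySet();
            }
            if (result == null) {
                result = new TreeSet<>(docs);   // copy so the index itself is never modified
            } else {
                result.retainAll(docs);         // keep only documents that contain every term so far
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("accumulo", new TreeSet<>(Arrays.asList("d1", "d2")));
        index.put("iterator", new TreeSet<>(Arrays.asList("d2", "d3")));
        System.out.println(andQuery(index, Arrays.asList("accumulo", "iterator")));
        System.out.println(andQuery(index, Arrays.asList("accumulo", "acumulo"))); // misspelled term
    }
}
```

In a real deployment the lookup would be a scan against the index table, and
the same early return would sit in front of whatever code builds the documents.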

I definitely agree on needing more fidelity on the iterators in this example.
I have written several simple iterators and used them well, but this is
clearly a more advanced implementation of an algorithm with them.
I understand the concept of a document-partitioned index and an intersecting
iterator, but it is hard to get my brain around this whole thing.  I understand
the table structure and how the tables are scanned, and I understand how the
query is parsed and builds an iterator stack, but I am missing how they lash
up.  And of course, this case where no documents are found is being turned
into too many documents being returned.
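
For reference, the core idea behind intersecting term lists inside one
partition can be sketched in plain Java as below. This is not the Wikisearch
iterator code. It assumes each term's document IDs arrive sorted within a
partition, and the class name SortedIntersect is made up for illustration. The
point is that an empty list for any term ends the intersection immediately
with no documents, which is the behavior one would expect for a term that is
not in the index.

```java
import java.util.*;

class SortedIntersect {
    /** Intersects doc-ID lists that are each sorted ascending, one list per query term. */
    static List<String> intersect(List<List<String>> postings) {
        List<String> out = new ArrayList<>();
        if (postings.isEmpty()) return out;
        int n = postings.size();
        int[] pos = new int[n];                 // one cursor per term
        while (true) {
            // Find the largest doc ID under any cursor; stop if any list is exhausted.
            String max = null;
            for (int i = 0; i < n; i++) {
                if (pos[i] >= postings.get(i).size()) return out;
                String cur = postings.get(i).get(pos[i]);
                if (max == null || cur.compareTo(max) > 0) max = cur;
            }
            // Advance every lagging cursor until it reaches or passes that doc ID.
            boolean allEqual = true;
            for (int i = 0; i < n; i++) {
                List<String> list = postings.get(i);
                while (pos[i] < list.size() && list.get(pos[i]).compareTo(max) < 0) pos[i]++;
                if (pos[i] >= list.size()) return out;
                if (!list.get(pos[i]).equals(max)) allEqual = false;
            }
            if (allEqual) {
                out.add(max);                   // every term contains this document
                for (int i = 0; i < n; i++) pos[i]++;
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> postings = Arrays.asList(
            Arrays.asList("d1", "d3", "d5"),
            Arrays.asList("d2", "d3", "d5", "d7"),
            Arrays.asList("d3", "d4", "d5"));
        System.out.println(intersect(postings));
    }
}
```

Each cursor only moves forward, so the work is bounded by the total length of
the lists rather than by the size of the table.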

Thanks,

TJ
On Thu, Jun 6, 2013 at 4:33 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> Hi Thomas,
>
> A couple of things you can glean from this.
>
> "full table scan" - Implies that, for some reason, the iterators or client
> code did not find one of the terms necessary to satisfy your query and
> attempted to find matching records using an exhaustive search. IMO, this
> shouldn't even exist, as the Wikisearch indexes everything, and the
> 'feature' masks far more problems than it solves by satisfying the queries
> that the index can't satisfy (which are few).
>
> OOME - Was this the tabletserver or the webserver? If the webserver, it
> could be that your query returned too many results to fit into the
> configured Java heap space. You could try upping -Xmx and see if you can
> find the sweet spot.
>
> It should be said, also, that the iterators included in the Wikisearch
> application are *very* rough and are likely not great examples to use as a
> basis for good Accumulo SortedKeyValueIterator development. However, the
> basic algorithm which the iterators perform is sound, scalable, and can
> perform quite well, especially when coupled with certain optimizations.
>
> I would agree with you that a white-paper or similar on the table
> structure and algorithm is long overdue.
>
> If you have more specific problems, I'm sure the community at large (myself
> included) would be happy to help and go into more detail.
>
>
> On 06/06/2013 04:05 PM, Thomas Jackson wrote:
>
>> Hey everyone,
>>
>> I am taking the Wikisearch application for a test drive and ran into some
>> issues.  I have successfully ingested a number of wiki dumps for several
>> languages into Accumulo and have been able to search on terms that I know
>> exist in the corpus.  However, the issue I run into is that I get an out of
>> memory exception when the application performs a full table scan searching
>> for a term that does not exist in the index. Has anyone else encountered
>> this issue?
>>
>> Also I was hoping to find out if anyone had any documentation or
>> information on how the iterators in the wikisearch application work.
>>
>> Thanks
>> TJ
>>
>
>