Accumulo, mail # user - Wikisearch Iterators

Thomas Jackson 2013-06-06, 20:05
Re: Wikisearch Iterators
Josh Elser 2013-06-06, 20:33
Hi Thomas,

A couple of things you can glean from this.

"full table scan" - Implies that, for some reason, the iterators or
client code did not find one of the terms necessary to satisfy your
query and attempted to find matching records using an exhaustive search.
IMO, this shouldn't even exist, as the Wikisearch indexes everything, and
the 'feature' masks far more problems than it solves by satisfying the
few queries that the index can't.
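To make the fallback concrete, here is a minimal Java sketch of planning an AND query against a global term index. Everything in it is hypothetical (the `QueryPlanSketch` class, the `plan` method, and a global index modeled as a map from term to shard IDs); it is not the Wikisearch code. It only illustrates the reasoning above: if every term is indexed, a term missing from the index means no document can match, so an empty result is the correct answer and a full scan just burns time and memory.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: plan a conjunctive (AND) query using a global index
 * that maps each term to the shard IDs containing it. Not the Wikisearch layout.
 */
class QueryPlanSketch {

    /** Possible outcomes of planning. */
    enum Kind { SHARDS, FULL_SCAN, EMPTY }

    /** A plan: what to do, plus the candidate shards when kind == SHARDS. */
    record Plan(Kind kind, Set<String> shards) {}

    static Plan plan(Map<String, Set<String>> globalIndex,
                     List<String> terms,
                     boolean fullScanFallback) {
        Set<String> candidates = null;
        for (String term : terms) {
            Set<String> shards = globalIndex.get(term);
            if (shards == null) {
                // Term never indexed. With complete indexing, nothing can match,
                // so a full scan (the fallback described above) finds nothing.
                return fullScanFallback
                        ? new Plan(Kind.FULL_SCAN, Set.of())
                        : new Plan(Kind.EMPTY, Set.of());
            }
            if (candidates == null) {
                candidates = new TreeSet<>(shards);
            } else {
                candidates.retainAll(shards); // AND: keep shards holding every term
            }
        }
        if (candidates == null || candidates.isEmpty()) {
            return new Plan(Kind.EMPTY, Set.of());
        }
        return new Plan(Kind.SHARDS, candidates);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = Map.of(
                "accumulo", Set.of("s1", "s2"),
                "iterator", Set.of("s2", "s3"));
        System.out.println(plan(index, List.of("accumulo", "iterator"), false));
        System.out.println(plan(index, List.of("accumulo", "missing"), true));
        System.out.println(plan(index, List.of("accumulo", "missing"), false));
    }
}
```

With `fullScanFallback` set to `true` the missing term yields a `FULL_SCAN` plan; with it set to `false` the same query comes back `EMPTY` without touching any data.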

OOME - Was this the tabletserver or the webserver? If the webserver, it
could be that your query returned too many results to fit into the
configured Java heap space. You could try upping -Xmx and see if you can
find the sweet spot.
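If you do raise -Xmx, it's worth confirming that the webserver JVM actually picked up the new value. Below is a minimal sketch using the standard `Runtime` API; the `HeapReport` class and `toMiB` helper are hypothetical names. `maxMemory()` approximately reflects -Xmx and may report somewhat less than the configured value on some JVMs, and it returns `Long.MAX_VALUE` when there is no inherent limit.

```java
/**
 * Hypothetical helper: log the heap limit and current usage of the running JVM,
 * e.g. at webserver startup, to check that an -Xmx change took effect.
 */
class HeapReport {

    /** Converts bytes to whole mebibytes for readability. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently in use

        if (max == Long.MAX_VALUE) {
            System.out.println("max heap: unlimited");
        } else {
            System.out.println("max heap: " + toMiB(max) + " MiB");
        }
        System.out.println("used heap: " + toMiB(used) + " MiB");
    }
}
```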

It should be said, also, that the iterators included in the Wikisearch
application are *very* rough and are likely not great examples to use as
a basis for good Accumulo SortedKeyValueIterator development. However,
the basic algorithm which the iterators perform is sound, scalable, and
can perform quite well, especially when coupled with certain optimizations.
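As a rough illustration of that basic algorithm, assuming its core step is intersecting sorted, per-term lists of document IDs within a shard (the same general pattern Accumulo's IntersectingIterator follows), here is a leapfrog-style sketch in plain Java. The `SortedIntersection` class, its methods, and the in-memory arrays are hypothetical stand-ins; a real SortedKeyValueIterator would seek() through Key ranges rather than binary-search arrays.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: intersect sorted per-term document ID lists, as an
 * in-shard AND iterator might. Each array stands in for one term's entries.
 */
class SortedIntersection {

    /** Returns the first index i >= from with list[i] >= target, like a seek. */
    static int seek(String[] list, int from, String target) {
        int lo = from, hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid].compareTo(target) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Leapfrog intersection: repeatedly seek every list to the largest head. */
    static List<String> intersect(String[]... lists) {
        List<String> out = new ArrayList<>();
        if (lists.length == 0) return out;
        int[] pos = new int[lists.length];
        while (true) {
            // No match can be smaller than the largest current head.
            String max = null;
            for (int i = 0; i < lists.length; i++) {
                if (pos[i] >= lists[i].length) return out; // a list ran out: done
                String head = lists[i][pos[i]];
                if (max == null || head.compareTo(max) > 0) max = head;
            }
            // Seek each list forward to max and check whether all land on it.
            boolean allMatch = true;
            for (int i = 0; i < lists.length; i++) {
                pos[i] = seek(lists[i], pos[i], max);
                if (pos[i] >= lists[i].length) return out;
                if (!lists[i][pos[i]].equals(max)) allMatch = false;
            }
            if (allMatch) {
                out.add(max);
                for (int i = 0; i < lists.length; i++) pos[i]++; // step past the match
            }
        }
    }

    public static void main(String[] args) {
        String[] docsWithA = {"doc1", "doc3", "doc5", "doc7"};
        String[] docsWithB = {"doc3", "doc4", "doc7"};
        String[] docsWithC = {"doc2", "doc3", "doc7", "doc9"};
        System.out.println(intersect(docsWithA, docsWithB, docsWithC));
    }
}
```

Each `seek` jumps directly to the next candidate instead of stepping entry by entry, which is one reason this style of intersection can hold up on large shards.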

I would agree with you that a white-paper or similar on the table
structure and algorithm is long overdue.

If you have more specific problems, I'm sure the community at large
(self, included) would be happy to help and go into more detail.

On 06/06/2013 04:05 PM, Thomas Jackson wrote:
> Hey everyone,
> I am taking the Wikisearch application for a test drive and ran into
> some issues.  I have successfully ingested a number of wiki dumps for
> several languages into Accumulo and have been able to search on terms
> that I know exist in the corpus.  However, the issue I run into is
> that I get an out of memory exception when the application performs a
> full table scan searching for a term that does not exist in the index.
> Has anyone else encountered this issue?
> Also, I was hoping to find out if anyone had any documentation or
> information on how the iterators in the wikisearch application work.
> Thanks
> TJ
