Accumulo >> mail # user >> Wikisearch


Re: Wikisearch
The forward and reverse indexes are very important, yes, with the
in-partition "field index" being even more important.

Yes to full table scans being undesirable and probably useless in the scope
of the wikisearch as it should index most everything and thus there is
nothing extra to be gleaned.

I forget exactly how it was implemented, but tokens will appear in the
global indices and the doc partitioned table.
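
To make the relationship between the global indexes and the in-partition field index concrete, here is a rough sketch in Python. It is not the wikisearch code: the dictionaries, the function names (`ingest`, `lookup`, `lookup_suffix`) and the partition/document ids are all made up for illustration, and it assumes the reverse index stores reversed tokens so that a leading-wildcard query becomes a prefix lookup.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the tables discussed above.
global_index = defaultdict(set)          # term -> partition ids
global_reverse_index = defaultdict(set)  # reversed term -> partition ids
field_index = defaultdict(lambda: defaultdict(set))  # partition -> term -> doc ids


def ingest(partition, doc_id, tokens):
    """Record each token in both global indexes and in the partition's field index."""
    for tok in tokens:
        global_index[tok].add(partition)
        global_reverse_index[tok[::-1]].add(partition)
        field_index[partition][tok].add(doc_id)


def lookup(term):
    """Exact-term query: global index picks partitions, field index picks documents."""
    hits = set()
    for partition in global_index.get(term, ()):
        hits |= field_index[partition].get(term, set())
    return hits


def lookup_suffix(suffix):
    """Leading-wildcard query (*suffix) answered as a prefix match on reversed terms.

    A sorted table would serve this with a range scan; iterating every key here
    is only a simplification for the sketch.
    """
    reversed_prefix = suffix[::-1]
    hits = set()
    for reversed_term, partitions in global_reverse_index.items():
        if reversed_term.startswith(reversed_prefix):
            term = reversed_term[::-1]
            for partition in partitions:
                hits |= field_index[partition].get(term, set())
    return hits
```

The point of the sketch is that neither query ever walks the documents themselves: the global index narrows the search to partitions, and the field index inside each partition resolves the term to documents.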

The most likely reason for the OOME is that the trivial web service that is
included attempts to pull all results into memory. There's nothing
inherently wrong with scanning all records in Accumulo, but the web server
will easily fall over.
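
As a rough illustration of the difference (plain Python, not the wikisearch web service), the sketch below treats a scan as an iterator and either returns a bounded page or hands records to a writer in small batches, instead of collecting everything first. `scan_all`, `first_page`, `stream` and the batch sizes are hypothetical names and values chosen for the example.

```python
from itertools import islice


def scan_all(n):
    """Hypothetical stand-in for a scanner: yields one record at a time."""
    for i in range(n):
        yield ("row%06d" % i, "value")


def first_page(records, page_size):
    """Return at most page_size records; the rest of the scan is never read."""
    return list(islice(records, page_size))


def stream(records, write, batch_size=1000):
    """Pass records to `write` in batches so only one batch is held at a time."""
    count = 0
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            break
        for record in batch:
            write(record)
        count += len(batch)
    return count
```

With either approach the memory held by the service is bounded by the page or batch size rather than by the size of the table, which is the property the bundled service lacks.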
On Jun 9, 2013 11:08 PM, "Frank Smith" <[EMAIL PROTECTED]> wrote:

> Appreciate everyone's help on the file storage question, but I was also
> looking at Josh's response to Thomas Jackson, and do I understand him
> correctly that the scan of the Index (and likely the ReverseIndex) table
> are really the key part of the search query, and the full table scan isn't
> really useful for much (because all of the tokens should go in the Index
> tables)?
>
> So if I understand correctly, the partitioned main table is where
> documents and tokens get written, and then a combiner feeds the index
> tables, which are then scanned during a search?
>
> What would I lose if I wanted to avoid Thomas's OOME and just skip the
> full table scan part of the search?
>
> Obviously, since I am not searching Wikipedia, I am going to be making
> some changes, just want to do it smartly.
>
> Thanks,
>
> Frank
>