Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Accumulo >> mail # dev >> SQL layer over Accumulo?


Copy link to this message
-
Re: SQL layer over Accumulo?
So there may be a bit of confusion with storing index and data in the same
row. By "row" I just mean the logical Accumulo unit, not a "row" as in
"thing in my relational table." Synonyms for "row" in this scheme are
"shard" and "document partition".

You can store multiple documents and indices for those documents in
different column families within the same row. You then have separate
readers for the indices and document data ("sources" in Iterator terms).
Point and range queries are still possible in this fashion, and are made
even easier if there's another level that maps terms to
rows/shards/partition. The wikisearch example is an (admittedly rough)
implementation of this.

I think looking at how "buddy" regions work may help clarify things, since
I imagine it works similarly. If the coprocessor is just reading from a
region "I", that that contains index data for only region "D", then that
maps pretty well to an iterator scanning index data from a column family
"I" and fetching documents from a column family "D".

On Thu, May 8, 2014 at 1:09 AM, James Taylor <[EMAIL PROTECTED]> wrote: