Re: accumulo for a bi-map?
Marc,

You might also want to check out D4M and the table organization that it
uses in Accumulo. D4M stores matrices and their transposes, which is
essentially the same concept as a bidirectional map or a bidirected graph:
http://www.mit.edu/~kepner/D4M/
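
A rough sketch of that idea, assuming the layout is one table plus a second
table with rows and columns swapped. This is a plain in-memory model in
Python, not the D4M or Accumulo API; the class and method names are made up
for illustration.

```python
from collections import defaultdict


class PairedTables:
    """In-memory stand-in for a table and its transpose (hypothetical names)."""

    def __init__(self):
        self.table = defaultdict(dict)      # row -> {column: value}
        self.transpose = defaultdict(dict)  # column -> {row: value}

    def put(self, row, column, value):
        # Every logical entry is written twice, once per table.
        self.table[row][column] = value
        self.transpose[column][row] = value

    def by_row(self, row):
        # Forward lookup: everything stored under this row.
        return dict(self.table.get(row, {}))

    def by_column(self, column):
        # Reverse lookup: answered from the transpose, still a row lookup.
        return dict(self.transpose.get(column, {}))


pairs = PairedTables()
pairs.put("abcd", "ijkl", 90)
pairs.put("abcd", "efgh", 88)
print(pairs.by_row("abcd"))
print(pairs.by_column("ijkl"))
```

The point of the second table is that both directions become lookups by
row, at the cost of writing each entry twice.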

Cheers,
Adam

On Tue, Jul 16, 2013 at 5:28 PM, Marc Reichman <[EMAIL PROTECTED]> wrote:

> We are using accumulo as a mechanism to store feature data (binary byte[])
> for some simple keys which are used for a search algorithm. We currently
> search by iterating over the feature space using AccumuloRowInputFormat.
> Results come out of a reducer into HDFS, currently in a SequenceFile.
>
> A customer has asked if we can store our results somewhere in our Hadoop
> infrastructure, and also perform nightly searches of everything vs
> everything to keep match results up to date.
>
> To me, storing the results in alternate column families (separate from the
> features) would be a way to keep the matches alongside the key rows:
> (key: abcd, features: {...}, matches: { 'm0': efgh-88%, 'm1': ijkl-90%, ...,
> 'mN': etc })
> (key: ijkl, features: {...}, matches: { 'm0': efgh-88%, 'm1': abcd-90%, ...,
> 'mN': etc })
>
> Match scores are equal between two items regardless of perspective, so
> if a->b is 90%, then b->a is also 90%.
>
> Is there a way to simply add columns to an existing family without having
> to name them or keep track of how many there are? Am I better off making a
> column family for each match key and then storing the score and other
> fields in columns? Or making one column per match under a single family,
> with the key as the name and the score as the value?
>
> Ideally I would have some form of bidirectional map, so I could look up any
> key and find all of its results as other keys, then follow any of those
> results to get its other matches.
>
> One approach is to simply add both sides of the relationship every time
> anything matches anything else, which seems a bit wasteful, space-wise.
>
> Curious if any pre-existing ideas are out there. Currently on Hadoop
> 1.0.3/Accumulo 1.4.1, not set in (hard) concrete.
>
> Thanks,
> Marc
>
>
>
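
For the question above about adding columns without naming or counting
them, one option is to use the matched key itself as the column qualifier
and the score as the value, writing both directions on every match. A
minimal sketch of that layout follows. It models the table as a plain
Python dict keyed by (row, family, qualifier) and does not use the Accumulo
client API; the family name "matches" and the helper names are assumptions
for illustration.

```python
MATCHES = "matches"  # hypothetical column family name


def record_match(table, a, b, score):
    """Store a symmetric match under both rows; the qualifier is the other key."""
    table[(a, MATCHES, b)] = score
    table[(b, MATCHES, a)] = score


def matches_for(table, key):
    """Return {other_key: score} for one row, in sorted qualifier order."""
    return {
        qualifier: score
        for (row, family, qualifier), score in sorted(table.items())
        if row == key and family == MATCHES
    }


table = {}
record_match(table, "abcd", "ijkl", 90)
record_match(table, "abcd", "efgh", 88)
record_match(table, "ijkl", "efgh", 88)
print(matches_for(table, "abcd"))
print(matches_for(table, "efgh"))
```

No 'm0' ... 'mN' counter is needed because each qualifier is unique per
matched key, and recording the same pair again simply replaces the old
score in this model. The trade-off is the one already noted in the
question: every match is stored twice.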