Accumulo, mail # user - accumulo for a bi-map?


Re: accumulo for a bi-map?
Adam Fuchs 2013-07-17, 19:03
Marc,

You might also want to check out D4M and the table organization that it
uses in Accumulo. D4M stores matrices and their transposes, which is
essentially the same concept as a bidirectional map or a bidirected graph:
http://www.mit.edu/~kepner/D4M/
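
To make the idea concrete, here is a minimal in-memory sketch of keeping a matrix of match scores alongside its transpose so that lookups work from either key. It is plain Python, not Accumulo or D4M code: the class `PairedTables` and its method names are hypothetical, and the two dictionaries only stand in for a main table and a transpose table. The keys and scores are the ones from the example in the quoted question.

```python
from collections import defaultdict


class PairedTables:
    """Toy stand-in for a main table plus its transpose table (hypothetical)."""

    def __init__(self):
        # forward[row][col] = value, like row -> column qualifier -> value
        self.forward = defaultdict(dict)
        # transpose[col][row] = value, the same entries with row and column swapped
        self.transpose = defaultdict(dict)

    def put(self, row, col, value):
        # Each logical entry is written once to each table.
        self.forward[row][col] = value
        self.transpose[col][row] = value

    def by_row(self, row):
        # Read one row of the main table.
        return dict(self.forward.get(row, {}))

    def by_col(self, col):
        # Read one row of the transpose table, i.e. one column of the main table.
        return dict(self.transpose.get(col, {}))

    def neighbors(self, key):
        # Everything matched to `key`, whichever side it was written on.
        merged = self.by_col(key)
        merged.update(self.by_row(key))
        return merged


tables = PairedTables()
tables.put("abcd", "ijkl", 0.90)
tables.put("efgh", "abcd", 0.88)
tables.put("efgh", "ijkl", 0.88)

for key in ("abcd", "efgh", "ijkl"):
    print(key, tables.neighbors(key))
```

Note that this still stores every pair twice (once per table), so it does not avoid the space cost raised in the question; what it buys is that either direction of the relationship is a single-row read, and the column name is just the matched key, so there is no counter to keep track of.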

Cheers,
Adam

On Tue, Jul 16, 2013 at 5:28 PM, Marc Reichman <[EMAIL PROTECTED]> wrote:

> We are using Accumulo as a mechanism to store feature data (binary byte[])
> for some simple keys which are used for a search algorithm. We currently
> search by iterating over the feature space using AccumuloRowInputFormat.
> Results come out of a reducer into HDFS, currently in a SequenceFile.
>
> A customer has asked if we can store our results somewhere in our Hadoop
> infrastructure, and also perform nightly searches of everything vs
> everything to keep match results up to date.
>
> To me, storing the results in alternate column families (from the
> features) would be a way to store the matches alongside the key rows:
> (key: abcd, features:{...}, matches:{ 'm0': efgh-88%, 'm1': ijkl-90%, ...,
> 'mN': etc })
> (key: ijkl, features:{...}, matches:{ 'm0': efgh-88%, 'm1': abcd-90%, ...,
> 'mN': etc })
>
> Match scores are equal between two items regardless of perspective, so
> if a->b is 90%, then b->a is also 90%.
>
> Is there a way to simply add columns to an existing family without having
> to name them or keep track of how many there are? Am I better off making a
> column family for each match key and then storing the score and other
> fields in columns? Or making one column per match under a single family,
> with the key as the name and the score as the value?
>
> Ideally I would have some form of bidirectional map so I could look at any
> key and find all of its results as other keys, and then look at any of
> those results to find their matches in turn.
>
> One approach is to simply add both sides of the relationship every time
> anything matches anything else, which seems a bit wasteful, space-wise.
>
> Curious if any pre-existing ideas are out there. Currently on Hadoop
> 1.0.3/Accumulo 1.4.1, not set in (hard) concrete.
>
> Thanks,
> Marc
>
>
>