Accumulo, mail # user - accumulo for a bi-map?


Marc Reichman 2013-07-16, 21:28
Dave Marion 2013-07-16, 23:16
David Medinets 2013-07-16, 22:55
Josh Elser 2013-07-16, 23:25
Marc Reichman 2013-07-17, 15:26
Marc Reichman 2013-07-18, 16:15
Josh Elser 2013-07-18, 16:48
Adam Fuchs 2013-07-17, 19:03
Re: accumulo for a bi-map?
Jeremy Kepner 2013-07-18, 17:32
Here is a link to the IEEE HPEC paper we wrote up on our schema work:

http://www.mit.edu/~kepner/pubs/D4Mschema_HPEC2013_Paper.pdf

On Wed, Jul 17, 2013 at 03:03:35PM -0400, Adam Fuchs wrote:
>    Marc,
>    You might also want to check out D4M and the table organization that it
>    uses in Accumulo. D4M stores matrixes and their transforms, which is
>    essentially the same concept as a bidirectional map or a bidirected
>    graph: [1]http://www.mit.edu/~kepner/D4M/
>    Cheers,
>    Adam
>
>    On Tue, Jul 16, 2013 at 5:28 PM, Marc Reichman
>    <[2][EMAIL PROTECTED]> wrote:
>
>      We are using accumulo as a mechanism to store feature data (binary
>      byte[]) for some simple keys which are used for a search algorithm. We
>      currently search by iterating over the feature space using
>      AccumuloRowInputFormat. Results come out of a reducer into HDFS,
>      currently in a SequenceFile.
>      A customer has asked if we can store our results somewhere in our Hadoop
>      infrastructure, and also perform nightly searches of everything vs
>      everything to keep match results up to date.
>      To me, storing the results in alternate column families (separate from
>      the features) would be a way to keep the matches alongside the key
>      rows:
>      (key: abcd, features: {...}, matches: { 'm0': efgh-88%, 'm1': ijkl-90%, ...,
>      'mN': etc })
>      (key: ijkl, features: {...}, matches: { 'm0': efgh-88%, 'm1': abcd-90%, ...,
>      'mN': etc })
>      Match scores are equal between two items regardless of perspective, so
>      if a->b is 90%, then b->a is also 90%.
>      Is there a way to simply add columns to an existing family without
>      having to name them or keep track of how many there are? Am I better off
>      making a column family for each match key and then storing the score and
>      other fields in columns? Or making one column per match under a single
>      family, with the matched key as the column name and the score as the value?
>      Ideally I would have some form of bidirectional map so I could look at
>      any key and find all the results as other keys, and find any results to
>      get other matches.
>      One approach is to simply add both sides of the relationship every time
>      anything matches anything else, which seems a bit wasteful, space-wise.
>      Curious if any pre-existing ideas are out there. Currently on hadoop
>      1.0.3/accumulo 1.4.1, not set in (hard) concrete.
>      Thanks,
>      Marc
>
> References
>
>    Visible links
>    1. http://www.mit.edu/~kepner/D4M/
>    2. mailto:[EMAIL PROTECTED]
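
Editorial sketch of the layout discussed above. It is not code from the thread or from D4M, and it does not use the Accumulo client API. It models a table as a plain in-memory list of entries sorted by (row, column family, column qualifier), which is how Accumulo orders keys. The class SortedTable, the helpers record_match and matches_for, and the family name "matches" are hypothetical names chosen for illustration. The idea shown is the "one column per match" option: the matched key is the column qualifier and the score is the value, so nothing like 'm0'..'mN' has to be named or counted. Both directions are written on every match, which is the space-for-lookup trade-off Marc mentions.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for a sorted key-value table (not the Accumulo API)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, family, qualifier) tuples
        self._values = {}  # (row, family, qualifier) -> value

    def put(self, row, family, qualifier, value):
        key = (row, family, qualifier)
        if key not in self._values:
            insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value    # a repeated put overwrites the old score

    def scan(self, row, family):
        """Yield (qualifier, value) pairs for one row and family, in key order."""
        start = bisect_left(self._keys, (row, family, ""))
        for key in self._keys[start:]:
            if key[:2] != (row, family):
                break
            yield key[2], self._values[key]


def record_match(table, a, b, score):
    """Store a symmetric match under both rows (assumed family name: 'matches')."""
    table.put(a, "matches", b, score)
    table.put(b, "matches", a, score)


def matches_for(table, key):
    """Return {matched key: score} for one row."""
    return dict(table.scan(key, "matches"))


table = SortedTable()
record_match(table, "abcd", "ijkl", 90)
record_match(table, "abcd", "efgh", 88)
record_match(table, "ijkl", "efgh", 88)

print(matches_for(table, "abcd"))
print(matches_for(table, "efgh"))
```

Because each match is its own entry, a nightly rerun can simply write the pair again and the newer score replaces the older one in this sketch. Writing only one direction (for example, always under the lexicographically smaller key) would halve the storage but would turn "all matches for key X" into a scan over other rows, or require a second, transposed table.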
Frank Smith 2013-07-21, 14:15
Kepner, Jeremy - 0553 - M... 2013-07-21, 18:11