Accumulo >> mail # user >> joining accumulo tables with mapreduce

joining accumulo tables with mapreduce

I am interested in learning the best solution/practices for joining 3 Accumulo tables by running a MapReduce job, and in getting feedback on how people approach this. Here's pseudocode of what I want to do:

AccumuloInputFormat accepts tableA.
A global variable <table_list> holds the other table names: tableB, tableC.

In a mapper, for example, you would do something like this:

for each row in tableA {
  if (row.family == "abc" && row.qualifier == "xyz")
    value = getValue()
  if (foundValue) {
    for each table in table_list {
      scan table with (this rowid && family == "def")
      for each entry found in scan
        write to final_table (rowid, value_as_family, tablename_as_qualifier, entry.value)
    }
  } // end if foundValue
} // end for loop
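To make the flow concrete, here is a minimal, runnable sketch of that join in plain Java. The class name `JoinSketch`, the table names, and the sample values are illustrative, and in-memory maps stand in for the Accumulo tables; in a real mapper each tableB/tableC lookup would be a Scanner restricted to the current rowid and family "def", and each output line would become a Mutation written to final_table.

```java
import java.util.*;

// Self-contained simulation of the pseudocode above: maps stand in
// for Accumulo tables so the join logic can run without a cluster.
public class JoinSketch {
    // table name -> rowid -> ("family:qualifier" -> value)
    static Map<String, Map<String, Map<String, String>>> tables = new HashMap<>();

    static void put(String table, String row, String famQual, String val) {
        tables.computeIfAbsent(table, t -> new HashMap<>())
              .computeIfAbsent(row, r -> new HashMap<>())
              .put(famQual, val);
    }

    // Joins each tableA row against every table in tableList by rowid.
    static List<String> runJoin(List<String> tableList) {
        List<String> finalTable = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : tables.get("tableA").entrySet()) {
            String value = row.getValue().get("abc:xyz"); // family "abc", qualifier "xyz"
            if (value == null) continue;                  // value not found: skip this row
            for (String table : tableList) {
                Map<String, String> hit =
                    tables.getOrDefault(table, Collections.emptyMap()).get(row.getKey());
                if (hit == null) continue;                // rowid absent in this table
                for (Map.Entry<String, String> e : hit.entrySet()) {
                    if (!e.getKey().startsWith("def:")) continue; // family filter
                    // final_table entry: rowid, value as family, table name as qualifier
                    finalTable.add(row.getKey() + " " + value + ":" + table
                                   + " -> " + e.getValue());
                }
            }
        }
        return finalTable;
    }

    public static void main(String[] args) {
        put("tableA", "row1", "abc:xyz", "v1");
        put("tableB", "row1", "def:q1", "b1");
        put("tableC", "row1", "def:q2", "c1");
        runJoin(Arrays.asList("tableB", "tableC")).forEach(System.out::println);
    }
}
```

The same shape carries over to the mapper: the outer loop becomes the map() calls that AccumuloInputFormat drives over tableA, and the inner lookups become per-table scans.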
This is a simplified version of what I want to do. In my non-MapReduce Java code I would do this by using a different scanner per table in the list. A couple of questions:
- How good/bad is performance when using scanners within mappers?
- If I get one mapper per range in tableA, do I reset the scanners? How? Or would I set up a scanner in the mapper's setup()? I have no clue how this will play out, so I'm thinking out loud here.
- Any optimization suggestions, or examples of creating join tables/indexes out there that I can refer to?
Thank you for all suggestions.
Replies:
- Keith Turner 2013-04-17, 14:59
- Aji Janis 2013-04-17, 20:43
- Keith Turner 2013-04-17, 23:39
- Kurt Christensen 2013-05-04, 14:15
- David Medinets 2013-04-18, 01:03