I am trying to figure out the best way to approach some joining/merging
computation in a map-reduce / hbase framework. I have the following large
timeseries datasets (key/value pairs keyed and sorted by time):
Currently, I am just storing these as flat files in HDFS, but I have no
problem loading them into HBase tables. I am trying to do an operation
like the following: for every event in Events2, find the event in Events1
that immediately precedes it (timestamp <=) and join the two.
This operation would result in something like:
t2, events2_value1, events1_value1
t3, events2_value2, events1_value2
t4, events2_value3, events1_value2
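
To make the intended semantics concrete, here is a minimal single-process
sketch of the join in Python. It is not a Hadoop job: it only shows the
merge logic over two time-sorted lists. The function name asof_join and the
integer timestamps 1 to 4 are hypothetical, chosen so the output matches the
rows above. In a map-reduce setting the same merge could presumably run inside
a reducer once both datasets are partitioned and sorted by time, though
partition boundaries would need extra handling.

```python
def asof_join(events1, events2):
    """Join each events2 record with the latest events1 record at or before it.

    Both inputs are lists of (timestamp, value) pairs sorted by timestamp.
    Returns a list of (timestamp, events2_value, events1_value) tuples;
    events1_value is None when no events1 record precedes the event.
    """
    result = []
    it = iter(events1)
    upcoming = next(it, None)   # next unread events1 record
    latest = None               # most recent events1 record with timestamp <= t2

    for t2, v2 in events2:
        # Advance through events1 while its timestamp is still <= t2.
        while upcoming is not None and upcoming[0] <= t2:
            latest = upcoming
            upcoming = next(it, None)
        result.append((t2, v2, latest[1] if latest is not None else None))
    return result


# Hypothetical sample data (timestamps are assumptions for illustration).
events1 = [(1, "events1_value1"), (3, "events1_value2")]
events2 = [(2, "events2_value1"), (3, "events2_value2"), (4, "events2_value3")]

joined = asof_join(events1, events2)
for row in joined:
    print(", ".join(str(field) for field in row))
```

Because each input is read once from start to end, the merge is linear in the
total number of records, which is why keeping both datasets sorted by time
matters.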
What is the best way to go about this in Hadoop?
Thanks in advance for the help.
Dmitriy Ryaboy 2009-11-04, 17:18
Dmitriy Ryaboy 2009-11-04, 17:20
Jason Venner 2009-11-05, 11:20
Dmitriy Ryaboy 2009-11-05, 17:51