timeseries merge/join question
Calvin 2009-11-04, 16:53
Hey all,
I am trying to figure out the best way to approach a join/merge computation in a MapReduce / HBase framework. I have the following large time-series datasets (key/value pairs keyed and sorted by time):

Events1:
  t1, event1_value1
  t3, event1_value2
  ...

Events2:
  t2, event2_value1
  t3, event2_value2
  t4, event2_value3
  ...

Currently I am just storing these as flat files in HDFS, but I have no problem loading them into HBase tables.

I am trying to do an operation like the following: for every event in Events2, find the event in Events1 that immediately precedes it (Events1 timestamp <= Events2 timestamp) and join the two. The result would look something like this:

JoinedEvents:
  t2, event2_value1, event1_value1
  t3, event2_value2, event1_value2
  t4, event2_value3, event1_value2
  ...

What is the best way to go about this in Hadoop?

Thanks in advance for the help.
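
For reference, the join semantics described above can be sketched on a single machine in Python. This is only an illustration of the desired result, not a Hadoop or HBase implementation. The function name `asof_join`, the integer timestamps standing in for t1..t4, and the in-memory lists are all assumptions made for the example.

```python
def asof_join(events1, events2):
    """Pair each Events2 record with the latest Events1 record at or before it.

    Both inputs are iterables of (timestamp, value) tuples sorted by timestamp.
    Yields (timestamp, events2_value, events1_value); events1_value is None
    when no Events1 record exists at or before that timestamp.
    """
    it1 = iter(events1)
    pending = next(it1, None)   # next unread Events1 record, if any
    latest = None               # value of the latest Events1 record seen so far

    for t2, v2 in events2:
        # Advance through Events1 while its timestamps are <= the current one.
        while pending is not None and pending[0] <= t2:
            latest = pending[1]
            pending = next(it1, None)
        yield (t2, v2, latest)


# Hypothetical sample data: timestamps 1..4 stand in for t1..t4.
events1 = [(1, "event1_value1"), (3, "event1_value2")]
events2 = [(2, "event2_value1"), (3, "event2_value2"), (4, "event2_value3")]

joined = list(asof_join(events1, events2))
for row in joined:
    print(row)
```

Because both inputs are already sorted by time, a single forward pass over each is enough. The open question is how to get the same effect when the data is split across many mappers and reducers.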