MapReduce, mail # user - timeseries merge/join question


timeseries merge/join question
Calvin 2009-11-04, 16:53
Hey all,

I am trying to figure out the best way to approach a join/merge
computation in a MapReduce / HBase framework.  I have the following large
timeseries datasets (key/value pairs keyed and sorted by time):

Events1:
t1, event1_value1
t3, event1_value2
...

Events2:
t2, event2_value1
t3, event2_value2
t4, event2_value3
...

Currently, I am just storing these as flat files in HDFS, but I have no
problem loading them into HBase tables.  I am trying to do an operation
like the following: for every event in Events2, find the most recent event
in Events1 whose timestamp is <= that event's timestamp, and join the two.

This operation would result in something like:

JoinedEvents:
t2, event2_value1, event1_value1
t3, event2_value2, event1_value2
t4, event2_value3, event1_value2

etc.
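
To make the intended semantics concrete, here is a minimal single-machine sketch in Python. It is not a MapReduce or HBase solution, only an illustration of the join I mean. The function name asof_join and the integer timestamps 1 to 4 (standing in for t1 to t4) are assumptions made up for this example.

```python
from bisect import bisect_right


def asof_join(events1, events2):
    """Join each events2 row with the latest events1 row at or before it.

    Both inputs are lists of (timestamp, value) tuples sorted by timestamp.
    Hypothetical helper for illustration only.
    """
    times1 = [t for t, _ in events1]
    joined = []
    for t, value2 in events2:
        # Index of the last events1 entry whose timestamp is <= t, or -1 if none.
        i = bisect_right(times1, t) - 1
        value1 = events1[i][1] if i >= 0 else None
        joined.append((t, value2, value1))
    return joined


# Integer timestamps are stand-ins for t1..t4 in the example data.
events1 = [(1, "event1_value1"), (3, "event1_value2")]
events2 = [(2, "event2_value1"), (3, "event2_value2"), (4, "event2_value3")]

joined = asof_join(events1, events2)
for row in joined:
    print(row)
```

An Events2 row with no earlier Events1 row gets None in this sketch; whether such rows should be kept or dropped is an open choice.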

What is the best way to go about this in Hadoop?

Thanks in advance for the help.