MapReduce >> mail # user >> timeseries merge/join question


timeseries merge/join question
Hey all,

I am trying to figure out the best way to approach a join/merge
computation in a MapReduce/HBase framework.  I have the following large
timeseries datasets (key/value pairs, keyed and sorted by time):

Events1:
t1, event1_value1
t3, event1_value2
...

Events2:
t2, event2_value1
t3, event2_value2
t4, event2_value3
...

Currently, I am just storing these as flat files in HDFS, but I would have no
problem loading them into HBase tables.  I am trying to do an operation
like the following: for every event in Events2, find the most recent event in
Events1 at or before it (timestamp <=) and join the two.

This operation would result in something like:

JoinedEvents:
t2, event2_value1, event1_value1
t3, event2_value2, event1_value2
t4, event2_value3, event1_value2

etc.
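
To make the intended semantics concrete, here is a minimal single-machine sketch in Python of the join described above. It is not a Hadoop job, only an illustration of the expected output. The function name `asof_join` is hypothetical, the integer timestamps 1 to 4 stand in for t1 to t4, and pairing an Events2 record that has no preceding Events1 record with `None` is an assumption, since the post does not say what should happen in that case.

```python
def asof_join(events1, events2):
    """Join each Events2 record with the latest Events1 record at or before it.

    Both inputs are lists of (timestamp, value) pairs sorted by timestamp.
    Assumption: an Events2 record with no earlier Events1 record gets None.
    """
    joined = []
    it = iter(events1)
    pending = next(it, None)   # next Events1 record not yet consumed
    last_value = None          # value of the latest Events1 record seen so far

    for t2, v2 in events2:
        # Advance through Events1 while its timestamp is <= the current one.
        while pending is not None and pending[0] <= t2:
            last_value = pending[1]
            pending = next(it, None)
        joined.append((t2, v2, last_value))
    return joined


# Integer timestamps 1..4 stand in for t1..t4 from the example.
events1 = [(1, "event1_value1"), (3, "event1_value2")]
events2 = [(2, "event2_value1"), (3, "event2_value2"), (4, "event2_value3")]
joined = asof_join(events1, events2)
```

Because both inputs are already sorted by time, a single forward pass over each is enough. The open question is how to get the same behavior when the data is split across many machines.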

What is the best way to go about this in Hadoop?

Thanks in advance for the help.
Dmitriy Ryaboy 2009-11-04, 17:18
Dmitriy Ryaboy 2009-11-04, 17:20
Jason Venner 2009-11-05, 11:20
Dmitriy Ryaboy 2009-11-05, 17:51