Hadoop >> mail # user >> aggregation by time window


Re: aggregation by time window
Quick idea:

Since each of your events will go into several buckets, you could use map() to emit each item multiple times, once for each bucket it belongs to.

On 28.01.2013 at 13:56, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Hi,
>    I have the following row data structure:
>
> event_id  |   time
> =============
> event1     |  10:07
> event2     |  10:10
> event3     |  10:12
>
> event4     |   10:20
> event5     |   10:23
> event6     |   10:25

map(event1,10:07) would emit (10:04,event1), (10:05,event1), ..., (10:10,event1) and so on.

In reduce(), all your desired events would meet for the same minute.
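
A rough sketch of that idea in plain Python, run in memory rather than on Hadoop. The function names (map_event, shuffle, reduce_bucket) are made up for illustration, and the window of 3 minutes before and after each event is only an assumption read off the 10:04 to 10:10 example above; adjust it to whatever your real window is.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: an event counts toward every minute from 3 minutes before
# to 3 minutes after its own timestamp (10:07 -> 10:04 .. 10:10).
MINUTES_BEFORE = 3
MINUTES_AFTER = 3


def map_event(event_id, time_str):
    """Emit one (bucket_minute, event_id) pair per bucket the event falls into."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(-MINUTES_BEFORE, MINUTES_AFTER + 1):
        bucket = (t + timedelta(minutes=offset)).strftime("%H:%M")
        yield bucket, event_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_bucket(bucket, event_ids):
    """Receive every event that was emitted for this minute."""
    return bucket, sorted(event_ids)


# The first three events from the quoted message.
events = [("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12")]
pairs = [pair for event_id, time_str in events
         for pair in map_event(event_id, time_str)]
result = dict(reduce_bucket(b, ids) for b, ids in shuffle(pairs).items())
```

Here result maps each minute to the events that were emitted for it, so result["10:09"] holds all three events. The sketch keys buckets by HH:MM only, so a window that crosses midnight would wrap around; real data would need the date in the key as well.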

Kai

--
Kai Voigt
[EMAIL PROTECTED]