Hadoop, mail # user - aggregation by time window


Re: aggregation by time window
Kai Voigt 2013-01-28, 13:17
Quick idea:

Since each of your events belongs to several buckets, you could have map() emit each event once for every bucket it falls into.

On 28.01.2013 at 13:56, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Hi ,
>    I have the following row data structure:
>
> event_id  |   time
> =====================
> event1     |  10:07
> event2     |  10:10
> event3     |  10:12
>
> event4     |   10:20
> event5     |   10:23
> event6     |   10:25

map(event1,10:07) would emit (10:04,event1), (10:05,event1), ..., (10:10,event1) and so on.

In reduce(), all the events that belong to a given minute's bucket would then arrive together under the same key.
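
To make the idea concrete, here is a rough sketch in plain Python that simulates the map, shuffle and reduce steps in memory. It is not Hadoop code. The window of 3 minutes on either side of an event is only my reading of the 10:04 to 10:10 example above, and the names map_event, shuffle, reduce_bucket and RADIUS_MINUTES are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: an event belongs to every minute bucket within 3 minutes
# of its own timestamp (inferred from the 10:04 .. 10:10 example).
RADIUS_MINUTES = 3


def map_event(event_id, time_str, radius=RADIUS_MINUTES):
    """Emit one (bucket_minute, event_id) pair per bucket the event falls into."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(-radius, radius + 1):
        bucket = t + timedelta(minutes=offset)
        yield bucket.strftime("%H:%M"), event_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bucket(bucket, event_ids):
    """All events for one minute bucket meet here; this sketch just sorts them."""
    return bucket, sorted(event_ids)


events = [
    ("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12"),
    ("event4", "10:20"), ("event5", "10:23"), ("event6", "10:25"),
]

pairs = [kv for event_id, time_str in events for kv in map_event(event_id, time_str)]
result = dict(reduce_bucket(b, ids) for b, ids in shuffle(pairs).items())

print(result["10:10"])  # ['event1', 'event2', 'event3']
```

Note that the map output grows by a factor of 2 * RADIUS_MINUTES + 1, so this trades extra shuffle traffic for a very simple reducer. The sketch also keys on hour and minute only, so real data would need the date in the key as well.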

Kai

--
Kai Voigt
[EMAIL PROTECTED]