Hadoop >> mail # user >> aggregation by time window


Re: aggregation by time window
Hi again,

the idea is that you emit every event multiple times. So your map input record (event1, 10:07) will be emitted seven times during the map() call. Like I said, (10:04,event1), (10:05,event1), ..., (10:10,event1) will be the seven outputs for processing a single event.
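
A minimal sketch of this map step, written in plain Python rather than against the Hadoop API. The function name map_event, the radius_minutes parameter, and the HH:MM string format are assumptions made for illustration only; times that cross midnight simply wrap around in this simplified version.

```python
from datetime import datetime, timedelta


def map_event(event_id, time_str, radius_minutes=3):
    """Return (bucket_time, event_id) pairs for every minute within
    +/- radius_minutes of the event's own time stamp."""
    # Hypothetical input format: "HH:MM", as in the example data.
    t = datetime.strptime(time_str, "%H:%M")
    return [
        ((t + timedelta(minutes=offset)).strftime("%H:%M"), event_id)
        for offset in range(-radius_minutes, radius_minutes + 1)
    ]


pairs = map_event("event1", "10:07")
for key, value in pairs:
    print(key, value)
```

With a radius of 3 minutes, each input record produces 2 * 3 + 1 = 7 output pairs.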

The output key is a time stamp that names the interval (neighbourhood) in which the event should be joined with events that happened within +/- 3 minutes of that time stamp. Two events that fall into the same 7-minute window (that is, at most 6 minutes apart) will both be emitted with at least one common time stamp as the map() output key, and thus meet in a reduce() call.

A reduce() call will look like this: reduce(10:03, list_of_events), where the events in the list had time stamps between 10:00 and 10:06 in the original input.
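
To see which events meet in which reduce() call, the whole flow can be simulated locally. This is only a sketch under assumptions: map_event and shuffle are hypothetical helper names, the grouping dictionary stands in for Hadoop's shuffle phase, and the input is the sample data from the original question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def map_event(event_id, time_str, radius_minutes=3):
    """Emit (bucket_time, event_id) for each minute within +/- radius_minutes."""
    t = datetime.strptime(time_str, "%H:%M")
    return [
        ((t + timedelta(minutes=offset)).strftime("%H:%M"), event_id)
        for offset in range(-radius_minutes, radius_minutes + 1)
    ]


def shuffle(pairs):
    """Group values by key, imitating what the framework does between
    map() and reduce()."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Sample events from the original question.
events = [
    ("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12"),
    ("event4", "10:20"), ("event5", "10:23"), ("event6", "10:25"),
]

pairs = [pair for event_id, time_str in events
         for pair in map_event(event_id, time_str)]
groups = shuffle(pairs)

# Each (key, list) here corresponds to one reduce(key, list_of_events) call.
for bucket in sorted(groups):
    print(bucket, groups[bucket])
```

In this simulation the key 10:10 receives event1, event2 and event3, while the key 10:23 receives event4, event5 and event6. Note that the same group of events can show up under several neighbouring keys, so a real reducer would likely need to deduplicate or pick one representative key per group.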

Kai

On 28.01.2013 at 14:43, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Hi Kai.
>    It is very interesting. Can you please explain your idea in more
> detail?
> What will be the key in the map phase?
>
> Suppose we have an event at 10:07. How would you emit it to the multiple
> buckets?
>
> Thanks
> Oleg.
>
>
> On Mon, Jan 28, 2013 at 3:17 PM, Kai Voigt <[EMAIL PROTECTED]> wrote:
>
>> Quick idea:
>>
>> since each of your events will go into several buckets, you could use
>> map() to emit each item multiple times, once for each bucket.
>>
>> On 28.01.2013 at 13:56, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>>
>>> Hi ,
>>>   I have the following row data structure:
>>>
>>> event_id  |  time
>>> ==================
>>> event1    |  10:07
>>> event2    |  10:10
>>> event3    |  10:12
>>>
>>> event4    |  10:20
>>> event5    |  10:23
>>> event6    |  10:25
>>
>> map(event1,10:07) would emit (10:04,event1), (10:05,event1), ...,
>> (10:10,event1) and so on.
>>
>> In reduce(), all your desired events would meet for the same minute.
>>
>> Kai
>>
>> --
>> Kai Voigt
>> [EMAIL PROTECTED]
>>
>>
>>
>>
>>

--
Kai Voigt
[EMAIL PROTECTED]