Hadoop, mail # user - aggregation by time window


Re: aggregation by time window
Kai Voigt 2013-01-28, 13:48
Hi again,

the idea is that you emit every event multiple times, once for each bucket it belongs to. So your map input record (event1, 10:07) will be emitted seven times during the map() call: (10:04,event1), (10:05,event1), ..., (10:10,event1) are the seven outputs for processing that single event.
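
A minimal sketch of that map step, written in plain Python rather than against the Hadoop API. The function name map_event, the half_width parameter and the HH:MM string format are assumptions made for illustration; times carry no date here, so buckets near midnight would need extra care.

```python
from datetime import datetime, timedelta

# Assumed half-width of the window: +/- 3 minutes, as discussed in this thread.
HALF_WIDTH_MINUTES = 3


def map_event(event_id, time_str, half_width=HALF_WIDTH_MINUTES):
    """Yield one (bucket_key, event_id) pair per minute in [t - half_width, t + half_width]."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(-half_width, half_width + 1):
        bucket = t + timedelta(minutes=offset)
        yield bucket.strftime("%H:%M"), event_id


# The example record from this thread: (event1, 10:07).
pairs = list(map_event("event1", "10:07"))
for key, value in pairs:
    print(key, value)
```

With half_width=3 this produces 2 * 3 + 1 = 7 pairs per input record, so the map output is roughly seven times the size of the input.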

The output key is the time stamp of a bucket, i.e. the centre of an interval in which the event should be joined with all events that happened within +/- 3 minutes of that time stamp. So two events that fall into the same 7-minute window (that is, at most 6 minutes apart) will both be emitted with at least one common time stamp as map() output key, and thus meet in a reduce() call.
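
To check which events actually meet, one can compare the key sets of two events directly. This is only a sketch in plain Python; bucket_keys and shared_keys are hypothetical helper names, and the +/- 3 minute half-width is the value used in this thread.

```python
from datetime import datetime, timedelta


def bucket_keys(time_str, half_width=3):
    """Return the set of minute keys an event at time_str would be emitted under."""
    t = datetime.strptime(time_str, "%H:%M")
    return {
        (t + timedelta(minutes=offset)).strftime("%H:%M")
        for offset in range(-half_width, half_width + 1)
    }


def shared_keys(time_a, time_b, half_width=3):
    """Return the sorted keys under which both events would meet in a reduce() call."""
    return sorted(bucket_keys(time_a, half_width) & bucket_keys(time_b, half_width))


print(shared_keys("10:07", "10:10"))  # 3 minutes apart: four common keys
print(shared_keys("10:00", "10:06"))  # 6 minutes apart: exactly one common key
print(shared_keys("10:00", "10:07"))  # 7 minutes apart: no common key
```

Two events share a key only if they are at most 2 * half_width minutes apart, which is 6 minutes for the window used here.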

A reduce() call will look like this: reduce(10:03, list_of_events), where the events in that list had time stamps between 10:00 and 10:06 in the original input.
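
The whole flow can be simulated locally to see which events end up in which reduce() call. This is a sketch in plain Python, not Hadoop code: shuffle stands in for the framework's grouping by key, reduce_bucket is a placeholder aggregation that just counts events, and all function names are assumptions. The input rows are the sample data from the original question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the original question in this thread.
EVENTS = [
    ("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12"),
    ("event4", "10:20"), ("event5", "10:23"), ("event6", "10:25"),
]


def map_event(event_id, time_str, half_width=3):
    """Emit the event once for every minute within +/- half_width of its time stamp."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(-half_width, half_width + 1):
        yield (t + timedelta(minutes=offset)).strftime("%H:%M"), event_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bucket(key, events):
    """Placeholder aggregation: count the events that met under this key."""
    return key, len(events), sorted(events)


pairs = [pair for event_id, ts in EVENTS for pair in map_event(event_id, ts)]
groups = shuffle(pairs)
for key in sorted(groups):
    print(reduce_bucket(key, groups[key]))
```

Because every event is emitted under seven keys, neighbouring reduce() calls see overlapping sets of events; whether that overlap is wanted or needs de-duplication depends on the aggregation.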

Kai

On 28.01.2013 at 14:43, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Hi Kai.
>    It is very interesting. Can you please explain your idea in more
> detail?
> What will be a key in a map phase?
>
> Suppose we have an event at 10:07. How would you emit it to multiple
> buckets?
>
> Thanks
> Oleg.
>
>
> On Mon, Jan 28, 2013 at 3:17 PM, Kai Voigt <[EMAIL PROTECTED]> wrote:
>
>> Quick idea:
>>
>> since each of your events will go into several buckets, you could use
>> map() to emit each item multiple times for each bucket.
>>
>> On 28.01.2013 at 13:56, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>   I have the following row data structure:
>>>
>>> event_id  |   time
>>> =============
>>> event1     |  10:07
>>> event2     |  10:10
>>> event3     |  10:12
>>>
>>> event4     |   10:20
>>> event5     |   10:23
>>> event6     |   10:25
>>
>> map(event1,10:07) would emit (10:04,event1), (10:05,event1), ...,
>> (10:10,event1) and so on.
>>
>> In reduce(), all your desired events would meet for the same minute.
>>
>> Kai
>>
>> --
>> Kai Voigt
>> [EMAIL PROTECTED]

--
Kai Voigt
[EMAIL PROTECTED]