HBase >> mail # user >> how to model data based on "time bucket"


Re: how to model data based on "time bucket"
Hi Rodrigo ,

  That is just a GREAT idea :-) !!!

 But how did you get the final result:

=======================
event1 | event2, event3
event2 | event3
event3 |
I tried to simulate it and didn't get event1 | event2, event3:
   (10:03, [*after*, event1])
   (10:04, [*after*, event1])
   (10:05, [*after*, event1])
   (10:06, [*after*, event1]), (10:06, [*after*, event2])
   (10:07, [*begin*, event1]), (10:07, [*after*, event2])
   (10:08, [*after*, event2]), (10:08, [*after*, event3])
   (10:09, [*after*, event2]), (10:09, [*after*, event3])
   (10:10, [*begin*, event2]), (10:10, [*after*, event3])
   (10:11, [*after*, event3])
   (10:12, [*begin*, event3])
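
The hand simulation above can be cross-checked with a minimal Python sketch. It assumes, as in the quoted example, that each event emits its own minute tagged begin plus the T-1 preceding minutes tagged after. The names `map_event` and `simulate` are made up for illustration and do not come from any HBase or Hadoop API, and the shuffle is imitated with an in-memory dict rather than a real MapReduce job.

```python
from collections import defaultdict
from datetime import datetime, timedelta

T = 5  # window size in minutes, as in the thread's example

# Events from the MAIN table in the thread (event id -> event time).
EVENTS = {"event1": "10:07", "event2": "10:10", "event3": "10:12"}


def map_event(event_id, hhmm, window=T):
    """Yield (minute, (tag, event_id)) for the event minute and the
    window-1 minutes before it. Hypothetical helper, not a Hadoop API."""
    t = datetime.strptime(hhmm, "%H:%M")
    for back in range(window):
        tag = "begin" if back == 0 else "after"
        key = (t - timedelta(minutes=back)).strftime("%H:%M")
        yield key, (tag, event_id)


def simulate(events, window=T):
    """Imitate map, shuffle and reduce in memory."""
    # Shuffle: group every emitted value under its minute key.
    grouped = defaultdict(list)
    for event_id, hhmm in events.items():
        for key, value in map_event(event_id, hhmm, window):
            grouped[key].append(value)

    # Reduce: only keys where some event begins produce a row.
    second_table = {}
    for key in sorted(grouped):
        begins = [e for tag, e in grouped[key] if tag == "begin"]
        afters = sorted(e for tag, e in grouped[key] if tag == "after")
        for event_id in begins:
            second_table[event_id] = afters
    return dict(grouped), second_table


if __name__ == "__main__":
    grouped, second_table = simulate(EVENTS)
    for key in sorted(grouped):
        print(key, grouped[key])
    for event_id, window_events in second_table.items():
        print(event_id, "|", ", ".join(window_events))
```

Under these assumptions key 10:07 receives only [begin, event1] and [after, event2], so the second table comes out as event1 | event2, event2 | event3 and an empty row for event3. Calling `simulate(EVENTS, window=6)`, that is emitting one more minute back, gives event1 | event2, event3, which suggests the difference may come down to whether the window bound is inclusive.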

Thanks
Oleg.
On Thu, Jan 31, 2013 at 4:34 PM, Rodrigo Ribeiro <[EMAIL PROTECTED]> wrote:

> Hi,
> The Map and Reduce steps that you mention are the same as what I had in mind.
>
> > How should I work with this table? Should I scan the Main table row by
> > row, and for every row get the event time and, based on that time, query
> > the second table?
> >
> >     If I do so, will I still need to execute 50 million requests?
> >
> > Maybe I need to work only with the second table. How do I know what to query
> > (scan)?
>
>
> Yes, using that approach you need to query both tables for each eventId you
> need to look up.
>
> I just thought of something else that I think will be better for your
> use case.
> You could distinguish the events that begin at a given time from those
> that come after it when you emit them.
> For the example using T=5, the emits would be:
>
> For event1 the map phase will emit (10:07, [*begin*, event1]), (10:06, [*after*, event1]),
> (10:05, [*after*, event1]), (10:04, [*after*, event1]), (10:03, [*after*, event1]).
> For event2 the map phase will emit (10:10, [*begin*, event2]), (10:09, [*after*, event2]),
> (10:08, [*after*, event2]), (10:07, [*after*, event2]), (10:06, [*after*, event2]).
> For event3 the map phase will emit (10:12, [*begin*, event3]), (10:11, [*after*, event3]),
> (10:10, [*after*, event3]), (10:09, [*after*, event3]), (10:08, [*after*, event3]).
>
>
> So, the reduce step knows exactly which event began at a given time and which
> events fall in the window of time after it.
>
> The reduce step for key "10:07" would receive { [*begin*, event1],
> [*after*, event2], [*after*, event3] }.
> So you know that event1 began at this time and events 2 and 3 are in its
> window of time, and you save that to a second table.
>
> The reduce step for key "10:06" would receive { [*after*, event1],
> [*after*, event2] }.
> No event began at this time, so there is nothing to save.
>
> After all this, you get a second table that I believe contains exactly
> what you want:
> eventid | events_window_time
> =======================
> event1  | event2, event3
> event2  | event3
> event3  |
>
> Let me know if i'm not being clear.
>
> On Thu, Jan 31, 2013 at 10:52 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>
> > Hi Rodrigo ,
> >   As usual you have a very interesting idea! :-)
> >
> > I am not sure that I understand exactly what you mean, so I will try to
> > simulate it:
> >      Suppose we have such events in MAIN Table:
> >             event1 | 10:07
> >             event2 | 10:10
> >             event3 | 10:12
> >      Time window T=5 minutes.
> >
> > =================on  map================ :
> >
> > What should I emit for event1, event2 and event3?
> >
> > For event1 in map phase will be (10:07, event1), (10:06, event1),
> > (10:05, event1), (10:04, event1), (10:03, event1).
> > For event2 in map phase will be (10:10, event2), (10:09, event2),
> > (10:08, event2), (10:07, event2), (10:06, event2).
> > For event3 in map phase will be (10:12, event3), (10:11, event3),
> > (10:10, event3), (10:09, event3), (10:08, event3).
> >
> > I count T=5 steps back from the event time. Is that correct?
> >
> > ==================on reduce =========:
> >
> > 10:03|event1
> > 10:04|event1
> > 10:05|event1
> > 10:06|event1,event2
> > 10:07|event1,event2
> > 10:08|event2,event3
> > 10:09|event2,event3