HBase, mail # user - how to model data based on "time bucket"


Re: how to model data based on "time bucket"
Rodrigo Ribeiro 2013-01-31, 15:51
Yes, you are correct, event3 never emits for the time "10:07".
The proper result table is, as you mention:
=======================
event1 | event2
event2 | event3
event3 |

I guess I was thinking about the old example (T=7). :)
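
To double-check the table above, here is a minimal in-memory sketch of the begin/after scheme in plain Python. It is not a real Hadoop or HBase job: the function names (`map_event`, `reduce_minute`, `build_window_table`) are made up for illustration, a dictionary stands in for the shuffle phase, and the returned dictionary stands in for the second table. It assumes minute-granularity timestamps and that T=5 means the begin minute plus the four minutes before it, as in the emits quoted below.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def map_event(event_id, start, window_minutes):
    """Emit a 'begin' record at the event's own minute and 'after'
    records for the window_minutes - 1 minutes that precede it."""
    yield start, ("begin", event_id)
    for k in range(1, window_minutes):
        yield start - timedelta(minutes=k), ("after", event_id)


def reduce_minute(records):
    """For one time key, pair every event that begins at that minute
    with the events tagged 'after' under the same key."""
    begins = sorted(e for tag, e in records if tag == "begin")
    afters = sorted(e for tag, e in records if tag == "after")
    # Keys without a 'begin' record yield nothing, so nothing is saved.
    return {b: afters for b in begins}


def build_window_table(events, window_minutes):
    """Run map, group by key (a stand-in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    for event_id, start in events.items():
        for key, value in map_event(event_id, start, window_minutes):
            grouped[key].append(value)
    table = {}
    for key in sorted(grouped):
        table.update(reduce_minute(grouped[key]))
    return table


def at(hhmm):
    """Parse an 'HH:MM' string; the date part is irrelevant here."""
    return datetime.strptime(hhmm, "%H:%M")


events = {"event1": at("10:07"), "event2": at("10:10"), "event3": at("10:12")}
table = build_window_table(events, window_minutes=5)
for event_id, followers in table.items():
    print(f"{event_id} | {', '.join(followers)}")
```

With `window_minutes=5` the key 10:07 only receives the begin record of event1 and the after record of event2, which gives the corrected table. Running the same three events with `window_minutes=7` puts event3 under 10:07 as well, so event1 is paired with both event2 and event3.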

On Thu, Jan 31, 2013 at 12:39 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Hi Rodrigo ,
>
>   That is just GREAT Idea :-) !!!
>
>  But how did you get a final result:
>
> =======================
> event1 | event2, event3
> event2 | event3
> event3 |
> I tried to simulate and didn't get event1| event2,event3
>
>
>    (10:03, [*after*, event1])
>    (10:04, [*after*, event1])
>    (10:05, [*after*, event1])
>    (10:06, [*after*, event1]), (10:06, [*after*, event2])
>    (10:07, [*begin*, event1]), (10:07, [*after*, event2])
>    (10:08, [*after*, event2]), (10:08, [*after*, event3])
>    (10:09, [*after*, event2]), (10:09, [*after*, event3])
>    (10:10, [*begin*, event2]), (10:10, [*after*, event3])
>    (10:11, [*after*, event3])
>    (10:12, [*begin*, event3])
>
> Thanks
> Oleg.
>
>
>
>
> On Thu, Jan 31, 2013 at 4:34 PM, Rodrigo Ribeiro <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > The Map and Reduce steps that you mention are the same as what I had in mind.
> >
> > > How should I work with this table. Should I have to scan Main table : row by
> > > row and for every row get event time and based on that time query second
> > > table?
> > >
> > >     In case I will do so , i still need to execute 50 million request?
> > >
> > > May be I need to work only with second table. How do I know what to query
> > > (scan)?
> >
> >
> > Yes, using that approach you need to query both tables for each eventId
> > you need to look up.
> >
> > I just thought of something else that I think will be better for your
> > use case.
> > You could distinguish the events that begin at a given time from those
> > that come after it when you emit them.
> > For the example using T=5, the emits would be:
> >
> > For event1 in map phase will be (10:07, [*begin*, event1]),
> > (10:06, [*after*, event1]), (10:05, [*after*, event1]),
> > (10:04, [*after*, event1]), (10:03, [*after*, event1]).
> > For event2 in map phase will be (10:10, [*begin*, event2]),
> > (10:09, [*after*, event2]), (10:08, [*after*, event2]),
> > (10:07, [*after*, event2]), (10:06, [*after*, event2]).
> > For event3 in map phase will be (10:12, [*begin*, event3]),
> > (10:11, [*after*, event3]), (10:10, [*after*, event3]),
> > (10:09, [*after*, event3]), (10:08, [*after*, event3]).
> >
> >
> > So, the reduce step knows exactly which event began at a given time and
> > which events fall in the window of time after it.
> >
> > The reduce step for key "10:07" would receive { [*begin*, event1],
> > [*after*, event2], [*after*, event3] }.
> > So you know that event1 began at this time and events 2 and 3 are in its
> > window of time, and you save that to a second table.
> >
> > The reduce step for key "10:06" would receive { [*after*, event1],
> > [*after*, event2] }.
> > No event began at this time, so there is nothing to save.
> >
> > After all this, you get a second table that I believe contains exactly
> > what you want:
> > eventid | events_window_time
> > =======================
> > event1  | event2, event3
> > event2  | event3
> > event3  |
> >
> > Let me know if I'm not being clear.
> >
> > On Thu, Jan 31, 2013 at 10:52 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Rodrigo ,
> > >   As usual you have very interesting ideas! :-)
> > >
> > > I am not sure that I understand exactly what you mean, so I tried to
> > > simulate it:
> > >      Suppose we have such events in MAIN Table:
> > >             event1 | 10:07
> > >             event2 | 10:10
> > >             event3 | 10:12
> > >      Time window T=5 minutes.
> > >
> > > =================on  map================ :
> > >
> > > what should I emit for event1 and event2
> > >
> > > For event1 in map phase will be (10:07, event1), (10:06, event1),
> > > (10:05, event1), (10:04, event1), (10:03, event1).
*Rodrigo Pereira Ribeiro*
Software Developer
www.jusbrasil.com.br