HBase >> mail # user >> how to model data based on "time bucket"


Re: how to model data based on "time bucket"
Yes, you are correct, event3 never emits for the time "10:07".
The proper result table is, as you mention:
=======================
event1 | event2
event2 | event3
event3 |

I guess I was thinking about the old example (T=7). :)
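
To double-check the table, here is a minimal sketch that simulates the begin/after emits and the reduce step in plain Python rather than on a real Hadoop/HBase job. The function names (`map_event`, `reduce_minute`, `build_window_table`) are made up for illustration, and the in-memory dictionary stands in for the shuffle phase and the second table.

```python
from collections import defaultdict
from datetime import datetime, timedelta

T = 5  # window size in minutes: each event emits T keys (its own minute plus T-1 earlier ones)


def map_event(event_id, time_str, window=T):
    """Emit (minute, (tag, event_id)): 'begin' at the event's own minute,
    'after' for each of the window-1 minutes before it."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(window):
        key = (t - timedelta(minutes=offset)).strftime("%H:%M")
        yield key, ("begin" if offset == 0 else "after", event_id)


def reduce_minute(values):
    """For one minute key, pair every event that begins there with the
    events tagged 'after'. Minutes with no 'begin' produce nothing."""
    begins = [e for tag, e in values if tag == "begin"]
    afters = sorted(e for tag, e in values if tag == "after")
    for event_id in begins:
        yield event_id, afters


def build_window_table(events, window=T):
    """Group the map output by key (a stand-in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    for event_id, time_str in events:
        for key, value in map_event(event_id, time_str, window):
            grouped[key].append(value)
    table = {}
    for key in sorted(grouped):
        for event_id, followers in reduce_minute(grouped[key]):
            table[event_id] = followers
    return table


events = [("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12")]
table = build_window_table(events)
for event_id, followers in table.items():
    print(f"{event_id} | {', '.join(followers)}")
```

With T=5 this gives event1 only event2, as in the table above. Calling `build_window_table(events, window=7)` gives event1 both event2 and event3, which is the old T=7 result. Note that in this sketch two events beginning in the same minute would not list each other.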

On Thu, Jan 31, 2013 at 12:39 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:

> Hi Rodrigo ,
>
>   That is just GREAT Idea :-) !!!
>
>  But how did you get a final result:
>
> =======================
> event1 | event2, event3
> event2 | event3
> event3 |
> I tried to simulate and didn't get event1| event2,event3
>
>
>    (10:03, [*after*, event1])
>    (10:04, [*after*, event1])
>    (10:05, [*after*, event1])
>    (10:06, [*after*, event1]), (10:06, [*after*, event2])
>    (10:07, [*begin*, event1]), (10:07, [*after*, event2])
>    (10:08, [*after*, event2]), (10:08, [*after*, event3])
>    (10:09, [*after*, event2]), (10:09, [*after*, event3])
>    (10:10, [*begin*, event2]), (10:10, [*after*, event3])
>    (10:11, [*after*, event3])
>    (10:12, [*begin*, event3])
>
> Thanks
> Oleg.
>
>
>
>
> On Thu, Jan 31, 2013 at 4:34 PM, Rodrigo Ribeiro <
> [EMAIL PROTECTED]> wrote:
>
> > Hi,
> > The Map and Reduce steps that you mention are the same as what I had in mind.
> >
> > > How should I work with this table. Should I have to scan Main table: row by
> > > row and for every row get event time and based on that time query second
> > > table?
> > >
> > >     In case I will do so, I still need to execute 50 million request?
> > >
> > > May be I need to work only with second table. How do I know what to query
> > > (scan)?
> >
> >
> > Yes, using that approach you need to query both tables for each eventId you
> > need to look up.
> >
> > I just thought of something else that I think will work better for your use
> > case.
> > When you emit, you could tag each pair to distinguish an event that begins
> > at that time from one that occurs after it.
> > For the example using T=5, the emits would be:
> >
> > For event1 the map phase emits:
> >   (10:07, [*begin*, event1]), (10:06, [*after*, event1]), (10:05, [*after*, event1]),
> >   (10:04, [*after*, event1]), (10:03, [*after*, event1]).
> > For event2 the map phase emits:
> >   (10:10, [*begin*, event2]), (10:09, [*after*, event2]), (10:08, [*after*, event2]),
> >   (10:07, [*after*, event2]), (10:06, [*after*, event2]).
> > For event3 the map phase emits:
> >   (10:12, [*begin*, event3]), (10:11, [*after*, event3]), (10:10, [*after*, event3]),
> >   (10:09, [*after*, event3]), (10:08, [*after*, event3]).
> >
> >
> > So the reduce step knows exactly which event began at a given time and
> > which events fall in the window of time after it.
> >
> > The reduce step for key "10:07" would receive { [*begin*, event1],
> > [*after*, event2], [*after*, event3] }.
> > So you know that event1 began at this time and that events 2 and 3 are in
> > its window of time, and you save that to a second table.
> >
> > The reduce step for key "10:06" would receive { [*after*, event1],
> > [*after*, event2] }.
> > No event began at this time, so there is nothing to save.
> >
> > After all this, you get a second table that I believe contains exactly
> > what you want:
> > eventid | events_window_time
> > =======================
> > event1  | event2, event3
> > event2  | event3
> > event3  |
> >
> > Let me know if I'm not being clear.
> >
> > On Thu, Jan 31, 2013 at 10:52 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Rodrigo ,
> > >   As usual you have very interesting ideas! :-)
> > >
> > > I am not sure that I understand exactly what you mean, so I will try to
> > > simulate it:
> > >      Suppose we have such events in MAIN Table:
> > >             event1 | 10:07
> > >             event2 | 10:10
> > >             event3 | 10:12
> > >      Time window T=5 minutes.
> > >
> > > =================on  map================ :
> > >
> > > what should I emit for event1 and event2
> > >
> > > For event1 in map phase will be (10:07, event1), (10:06, event1),
> > > (10:05, event1), (10:04, event1), (10:03, event1).
*Rodrigo Pereira Ribeiro*
Software Developer
www.jusbrasil.com.br