HBase, mail # user - how to model data based on "time bucket"


Re: how to model data based on "time bucket"
Oleg Ruchovets 2013-01-28, 16:24
Yes,
     I agree that using only a timestamp will cause a hotspot. I can
pre-split the regions.
I saw the TSDB video, presentation, and data model. I think it is not
suitable for my case.

I looked through Google a lot and, to my surprise, there is hardly any post
about such a classic problem. It is very strange.

I am trying to group the time series, but not the way most solutions do it
-- every 1 hour, 1 day, 5 minutes; that is simple.
I need to group elements relative to each element itself by time: I have
{event1: 10:05} and I want to group it with the elements that came after
10:05 within a time X. In case X=7 minutes, all events between 10:05 and
10:12 will be in the group.

It is like a join of each row with all other rows, but the performance
would be very bad. Currently I have 50 million events, so it would be
50 million^2 comparisons.

  That is why I don't want to use pure map/reduce. I want to use HBase as
the output of map/reduce and model the data in the way I described above.
So, is there a way to model data in this type of time bucket?
Please advise.

Thanks
Oleg.
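
[Editor's note: the relative grouping described above does not actually
require the pairwise join. If the events are sorted by time, each group is a
contiguous slice, and its end can be found with a binary search, so the whole
pass is O(n log n) rather than 50 million^2. A minimal sketch, using the
sample events from the original mail (the function name and data layout are
illustrative):]

```python
import bisect

def group_events(events, window_minutes):
    """For each event, collect the events that occur within
    `window_minutes` after it (inclusive of the window boundary).
    `events` is a list of (event_id, minutes_since_midnight) pairs,
    assumed sorted by time."""
    times = [t for _, t in events]
    groups = {}
    for i, (event_id, t) in enumerate(events):
        # Index of the first event strictly after t + window:
        # everything in events[i:end] is inside this event's bucket.
        end = bisect.bisect_right(times, t + window_minutes)
        groups[event_id] = [eid for eid, _ in events[i:end]]
    return groups

# Sample data from the original mail, with times as minutes since midnight.
events = [("event1", 607), ("event2", 610), ("event3", 612),
          ("event4", 620), ("event5", 623), ("event6", 625)]

print(group_events(events, 7)["event1"])  # ['event1', 'event2', 'event3']
```

This matches the expected groups in the original mail: starting from event2
the group is {event2, event3}, and starting from event4 it is
{event4, event5, event6}.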

On Mon, Jan 28, 2013 at 5:54 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Tough one, in that if your events are keyed on time alone, you will hit a
> hot spot on write. Reads, not so much...
>
> TSDB would be a good start ...
>
> You may not need 'buckets', but just a time stamp, and set up start and
> stop key values.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jan 28, 2013, at 7:06 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>
> > Hi ,
> >
> > I have the following row data structure:
> >
> > event_id | time
> > =========
> > event1 | 10:07
> > event2 | 10:10
> > event3 | 10:12
> >
> > event4 | 10:20
> > event5 | 10:23
> > event6 | 10:25
> >
> >
> > Number of records is 50-100 million.
> >
> >
> > Question:
> >
> > I need to find the group of events, starting from eventX, that fall
> > within the time window bucket = T.
> >
> >
> > For example: if T=7 minutes.
> > Starting from event1: {event1, event2, event3} were detected during
> > 7 minutes.
> >
> > Starting from event2: {event2, event3} were detected during 7
> > minutes.
> >
> > Starting from event4: {event4, event5, event6} were detected during
> > 7 minutes.
> >
> > Is there a way to model the data in HBase to achieve this?
> >
> > Thanks
>
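
[Editor's note: Mike's two suggestions above -- pre-splitting to avoid the
write hotspot, and scanning with start/stop key values -- can be combined in
one row-key design: a salt byte to spread writes across regions, followed by
a big-endian timestamp so rows within a bucket stay time-ordered. A minimal
sketch; the key layout, bucket count, and function names are illustrative,
not from the thread:]

```python
import struct
import zlib

NUM_BUCKETS = 16  # pre-split the table into this many regions, one per salt

def make_row_key(event_id, epoch_seconds):
    """Salted row key: 1 salt byte (spreads writes across regions),
    a 4-byte big-endian timestamp (keeps rows time-ordered within a
    bucket), then the event id to make the key unique."""
    salt = zlib.crc32(event_id.encode()) % NUM_BUCKETS  # deterministic
    return bytes([salt]) + struct.pack(">I", epoch_seconds) + event_id.encode()

def scan_range(salt, start_seconds, stop_seconds):
    """Start/stop row keys covering one bucket's time window. A full
    time-window query issues one such scan per salt bucket and merges
    the results client-side."""
    return (bytes([salt]) + struct.pack(">I", start_seconds),
            bytes([salt]) + struct.pack(">I", stop_seconds))

# An event written at time t is found by a scan over [t, t + 7 minutes).
t = 1359389220
key = make_row_key("event1", t)
start, stop = scan_range(key[0], t, t + 7 * 60)
assert start <= key < stop  # the event falls inside its bucket's window
```

The trade-off is the usual one for salting: writes no longer hotspot, but
every time-range read fans out into NUM_BUCKETS scans.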