HBase >> mail # user >> how to model data based on "time bucket"


Re: how to model data based on "time bucket"
Yes,
     I agree that using only a timestamp as the key will cause a hotspot. I
can pre-split the regions.
I watched the TSDB video and presentation and looked at their data model. I
don't think it is suitable for my case.

I searched Google a lot and, to my surprise, there is no post about such a
classic problem. It is very strange.

I am not trying to group the time series into the fixed intervals that most
solutions provide (every 1 hour, 1 day, 5 minutes); that is simple.
I need to group each element relative to itself in time: given
{event1: 10:05}, I want to group it with the elements that occurred within
time X after 10:05. If X = 7 minutes, all events between 10:05 and 10:12
fall into the group.

It is like joining each row with every other row, but the performance would
be very bad: I currently have 50 million events, so that would be
50 million^2 comparisons.
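(For what it's worth, if the events are already sorted by timestamp, e.g. emitted in key order by a map/reduce job, the per-event window groups can be built in one linear pass rather than an N^2 self-join, because the right edge of the window only ever moves forward. A minimal sketch in plain Java, with hypothetical names and timestamps expressed as minutes since midnight:)

```java
import java.util.ArrayList;
import java.util.List;

public class WindowGroups {
    // For each event i, collect the indexes of all events whose timestamp
    // lies in [times[i], times[i] + window]. times must be sorted ascending.
    // The right pointer `end` never moves backwards, so the pass is O(n).
    static List<List<Integer>> group(long[] times, long window) {
        List<List<Integer>> groups = new ArrayList<>();
        int end = 0;
        for (int start = 0; start < times.length; start++) {
            if (end < start) end = start;
            while (end < times.length && times[end] - times[start] <= window) {
                end++;
            }
            List<Integer> g = new ArrayList<>();
            for (int i = start; i < end; i++) {
                g.add(i);
            }
            groups.add(g);
        }
        return groups;
    }

    public static void main(String[] args) {
        // event1..event6 at 10:07, 10:10, 10:12, 10:20, 10:23, 10:25
        // (minutes since midnight), window X = 7 minutes.
        long[] times = {607, 610, 612, 620, 623, 625};
        // prints [[0, 1, 2], [1, 2], [2], [3, 4, 5], [4, 5], [5]]
        System.out.println(group(times, 7));
    }
}
```

(The same two-pointer scan works row by row over an HBase scanner as long as the row keys sort by time.)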

  That is why I don't want to use pure map/reduce. I want to use HBase as
the output of map/reduce and model the data the way I described above.

So is there a way to model data in such a type of time bucket?
Please advise.
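(Not to prescribe a design, but one common HBase pattern that combines Mike's start/stop-key scan with the pre-splitting mentioned above is a salted, fixed-width row key of the form salt|timestamp|event_id: writes spread across the salt buckets, and reading one event's group becomes, per bucket, a short scan from t to t + X. A sketch of the key encoding only; the layout and names are hypothetical:)

```java
public class EventRowKey {
    // Hypothetical key layout: "SS|TTTTTTTTTT|eventId".
    // Zero-padding keeps lexicographic order equal to chronological order
    // within a salt bucket, so a scan from key(b, t, "") up to
    // key(b, t + X + 1, "") returns exactly one time-window group.
    static String key(int salt, long epochSeconds, String eventId) {
        return String.format("%02d|%010d|%s", salt, epochSeconds, eventId);
    }

    // Spread writes over N buckets by hashing the event id, so a
    // pure-timestamp key does not hotspot one region on write.
    // The double-mod keeps the result non-negative for any hashCode.
    static int salt(String eventId, int buckets) {
        return ((eventId.hashCode() % buckets) + buckets) % buckets;
    }

    public static void main(String[] args) {
        int buckets = 16;
        String k = key(salt("event1", buckets), 1359370020L, "event1");
        System.out.println(k);
    }
}
```

(Reading the group for an event at time t then means one scan per salt bucket, with start key key(b, t, "") and stop key key(b, t + X + 1, ""), which HBase serves without touching the rest of the table; pre-split the table on the salt prefixes.)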

Thanks
Oleg.

On Mon, Jan 28, 2013 at 5:54 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Tough one in that if your events are keyed on time alone, you will hit a
> hot spot on write. Reads, not so much...
>
> TSDB would be a good start ...
>
> You may not need 'buckets' but just a time stamp, and set up start and
> stop key values.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jan 28, 2013, at 7:06 AM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have such row data structure:
> >
> > event_id | time
> > ===============
> > event1 | 10:07
> > event2 | 10:10
> > event3 | 10:12
> >
> > event4 | 10:20
> > event5 | 10:23
> > event6 | 10:25
> >
> >
> > The number of records is 50-100 million.
> >
> >
> > Question:
> >
> > I need to find the group of events starting from eventX that falls into
> > the time window bucket = T.
> >
> >
> > For example: if T = 7 minutes.
> > Starting from event1, {event1, event2, event3} were detected during
> > 7 minutes.
> >
> > Starting from event2, {event2, event3} were detected during 7
> > minutes.
> >
> > Starting from event4, {event4, event5, event6} were detected during
> > 7 minutes.
> > Is there a way to model the data in HBase to get this?
> >
> > Thanks
>