NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Write TimeSeries Data and Do Time Based Range Scans


Re: Write TimeSeries Data and Do Time Based Range Scans
Yeah, I saw that. In fact, that is why I recommended it to you: I
couldn't tell from your email whether you had already gone through that
source or not. Its authors did the exact same thing and discuss it in much
more detail, with concerns that align with yours (in fact, I think some of
the authors/creators of that link/group are members of this community as
well).

Regards,
Shahab
On Mon, Sep 23, 2013 at 8:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Shahab,
>
> If you read my solution carefully, you will see that I am already doing that.
>
> Thanks,
> Anil Gupta
>
>
> On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
>
> >
> >
> > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> >
> > Here you can find the discussion, trade-offs, and working code/API (even
> > for M/R) for this problem and the approach you are trying out.
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Sep 23, 2013 at 5:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> >
> > > Hi All,
> > >
> > > I have a secondary index (inverted index) table whose rowkey is based on
> > > the timestamp of an event. Assume the rowkey is <TimeStamp in Epoch>.
> > > I also store some extra columns (apart from the main_table rowkey) in
> > > that table for filtering.
> > >
> > > The requirement is to do range-based scans on the time of the event;
> > > hence the index with this rowkey.
> > > I cannot use a hashing or MD5 digest solution because then I cannot do
> > > range-based scans. I also already have an OpenTSDB-like index in another
> > > table for the same dataset. (I have many secondary indexes for the same
> > > dataset.)
> > >
> > > Problem: When we increase the write workload during a stress test, the
> > > time secondary index becomes a bottleneck due to the well-known region
> > > hotspotting problem.
> > > Solution: I am thinking of adding a prefix of { (<TimeStamp in Epoch>%10)
> > > bucket} to the rowkey. Then my rowkey will become:
> > >  <Bucket><TimeStamp in Epoch>
> > > By using the above rowkey I can at least alleviate the *WRITE* problem. (I
> > > don't think the problem can be fixed permanently because of the use case
> > > requirement. I would love to be proven wrong.)
> > > However, with the above rowkey, when I want to *READ* data, every single
> > > range scan has to read data from 10 different regions. This extra read
> > > load is scaring me a bit.
> > >
> > > I am wondering if anyone has a better suggestion/approach to solve this
> > > problem given the constraints I have. Looking for feedback from the
> > > community.
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
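
The bucketed rowkey and the 10-way read fan-out described in the original
message can be sketched roughly as follows. This is a minimal, HBase-free
illustration in Python, not code from the thread or from HBaseWD. The
one-byte bucket prefix, the 8-byte big-endian timestamp encoding, and the
names salted_key, scan_ranges and range_scan are all assumptions made for
the example, and a sorted in-memory list stands in for the index table.

```python
import bisect
import heapq

NUM_BUCKETS = 10  # matches the "% 10" in the proposed rowkey


def salted_key(ts: int) -> bytes:
    """Build <Bucket><TimeStamp in Epoch> (hypothetical encoding).

    Fixed-width big-endian bytes keep lexicographic order equal to
    numeric order inside each bucket.
    """
    bucket = ts % NUM_BUCKETS
    return bytes([bucket]) + ts.to_bytes(8, "big")


def decode_ts(key: bytes) -> int:
    """Recover the timestamp by dropping the one-byte bucket prefix."""
    return int.from_bytes(key[1:], "big")


def scan_ranges(start_ts: int, stop_ts: int):
    """One (start_row, stop_row) pair per bucket for [start_ts, stop_ts)."""
    return [
        (bytes([b]) + start_ts.to_bytes(8, "big"),
         bytes([b]) + stop_ts.to_bytes(8, "big"))
        for b in range(NUM_BUCKETS)
    ]


def range_scan(sorted_keys, start_ts: int, stop_ts: int):
    """Run one scan per bucket, then merge the results back into time order."""
    per_bucket = []
    for start_row, stop_row in scan_ranges(start_ts, stop_ts):
        i = bisect.bisect_left(sorted_keys, start_row)
        j = bisect.bisect_left(sorted_keys, stop_row)
        per_bucket.append([decode_ts(k) for k in sorted_keys[i:j]])
    # Each per-bucket result is already sorted, so a k-way merge suffices.
    return list(heapq.merge(*per_bucket))


# The sorted list plays the role of the table's rowkey order.
table = sorted(salted_key(ts) for ts in range(100, 200))
print(range_scan(table, 125, 130))  # [125, 126, 127, 128, 129]
```

In this sketch a time-range read costs NUM_BUCKETS short scans plus a merge,
which is the read-side overhead the original message worries about. Each of
the per-bucket scans is independent of the others, so in principle they
could be issued in parallel.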