HBase user mailing list: Write TimeSeries Data and Do Time Based Range Scans


anil gupta 2013-09-23, 21:41
Shahab Yunus 2013-09-23, 22:51
Re: Write TimeSeries Data and Do Time Based Range Scans
Hi Shahab,

If you read my solution carefully, you'll see that I am already doing that.

Thanks,
Anil Gupta
On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:

>
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>
> Here you can find the discussion, trade-offs, and working code/API (even
> for M/R) for this problem and the approach you are trying out.
>
> Regards,
> Shahab
>
>
> On Mon, Sep 23, 2013 at 5:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > I have a secondary index (inverted index) table with a rowkey based on
> > the timestamp of an event. Assume the rowkey is <TimeStamp in Epoch>.
> > I also store some extra columns (apart from the main_table rowkey) in
> > that table for filtering.
> >
> > The requirement is to do range-based scans on the basis of event time;
> > hence the index with this rowkey.
> > I cannot use a hashing or MD5-digest solution because then I cannot do
> > range-based scans. And I already have an OpenTSDB-like index in another
> > table for the same dataset. (I have many secondary indexes for the same
> > data set.)
> >
> > Problem: When we increase the write workload during stress tests, this
> > time-based secondary index becomes a bottleneck due to the famous region
> > hotspotting problem.
> > Solution: I am thinking of adding a bucket prefix of (<TimeStamp in
> > Epoch> % 10) to the rowkey. Then my row key becomes:
> >  <Bucket><TimeStamp in Epoch>
> > By using the above rowkey I can at least alleviate the *WRITE* problem.
> > (I don't think the problem can be fixed permanently because of the use
> > case requirements; I would love to be proven wrong.)
> > However, with the above rowkey, when I want to *READ* data, every single
> > range scan has to read data from 10 different regions. This extra load
> > on reads is scaring me a bit.
> >
> > I am wondering if anyone has a better suggestion/approach to solve this
> > problem given the constraints I have. Looking for feedback from the
> > community.
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>

--
Thanks & Regards,
Anil Gupta
Shahab Yunus 2013-09-24, 01:15
anil gupta 2013-09-24, 15:12
Shahab Yunus 2013-09-24, 15:39
James Taylor 2013-09-24, 16:36
anil gupta 2013-09-26, 06:57
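
For readers following along, below is a minimal sketch of the bucketed rowkey scheme discussed in this thread, written against the HBase Java client API of that era (0.94.x). The table handle, the column family "f", the qualifier "mainKey", and the helper names are hypothetical; the bucket count of 10 and the (<TimeStamp in Epoch> % 10) prefix follow Anil's proposal, and the read side issues one scan per bucket as he describes.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedTimeIndex {

    private static final int NUM_BUCKETS = 10;  // per the proposal: <TimeStamp in Epoch> % 10

    // Rowkey layout: <bucket (1 byte)><timestamp (8 bytes, big-endian)>.
    // Big-endian longs keep positive timestamps sort-ordered within each bucket.
    static byte[] rowKey(long epochMillis) {
        byte[] key = new byte[1 + Bytes.SIZEOF_LONG];
        key[0] = (byte) (epochMillis % NUM_BUCKETS);
        Bytes.putLong(key, 1, epochMillis);
        return key;
    }

    // WRITE path: consecutive timestamps rotate across 10 key ranges, so
    // the write load spreads over up to 10 regions instead of one hot region.
    static void writeEvent(HTable indexTable, long epochMillis, byte[] mainTableRowKey)
            throws IOException {
        Put put = new Put(rowKey(epochMillis));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("mainKey"), mainTableRowKey);
        indexTable.put(put);
    }

    // READ path: one scan per bucket over the same time range, collected
    // client-side. Scan stop rows are exclusive, so this covers
    // [startMillis, stopMillis).
    static List<Result> timeRangeScan(HTable indexTable, long startMillis, long stopMillis)
            throws IOException {
        List<Result> merged = new ArrayList<Result>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            byte[] start = new byte[1 + Bytes.SIZEOF_LONG];
            start[0] = (byte) b;
            Bytes.putLong(start, 1, startMillis);

            byte[] stop = new byte[1 + Bytes.SIZEOF_LONG];
            stop[0] = (byte) b;
            Bytes.putLong(stop, 1, stopMillis);

            ResultScanner scanner = indexTable.getScanner(new Scan(start, stop));
            try {
                for (Result r : scanner) {
                    merged.add(r);
                }
            } finally {
                scanner.close();
            }
        }
        return merged;
    }
}

Note that each per-bucket scan returns rows sorted by timestamp, but the combined list is not globally ordered; a client-side merge by timestamp is needed if total ordering matters. The HBaseWD library linked in Shahab's reply packages this same approach, including scan support for MapReduce.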