HBase, mail # user - Write TimeSeries Data and Do Time Based Range Scans


Re: Write TimeSeries Data and Do Time Based Range Scans
Shahab Yunus 2013-09-24, 15:39
I only know of the links already embedded in the blog page that I sent
you. Other than that, there is this group:
https://groups.google.com/forum/#!forum/hbasewd

Regards,
Shahab
On Tue, Sep 24, 2013 at 11:12 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Inline
>
> On Mon, Sep 23, 2013 at 6:15 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
>
> > Yeah, I saw that. In fact, that is why I recommended it to you, as I
> > couldn't infer from your email whether you had already gone through
> > that source or not.
>
> Yes, I was aware of that article. But my read pattern is slightly different
> from the one in that article. We are using HBase as the data source for a
> RESTful service. Even if my range scan finds 400 rows within a specified
> time range, I only return the top 20 for one REST request. So, if I do
> bucketing (let's say 10 buckets), then I will need to fetch 20 results from
> each bucket, do a merge sort on the client side, and return the final 20.
> You can assume that I need to return the 20 rows sorted by timestamp.
>
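
A minimal sketch of the client-side merge described above, assuming each bucket scan yields (timestamp, row) pairs already sorted ascending by timestamp. The function name `top_n_across_buckets` and the tuple layout are hypothetical and are not part of HBase or HBaseWD.

```python
import heapq
from itertools import islice


def top_n_across_buckets(bucket_results, n=20):
    """Merge per-bucket scan results and keep the first n rows.

    bucket_results: list of iterables, one per bucket, each yielding
    (timestamp, row) tuples sorted ascending by timestamp (assumption).
    """
    # heapq.merge performs a lazy k-way merge of already-sorted inputs,
    # so only about n items are pulled from the bucket iterators.
    merged = heapq.merge(*bucket_results, key=lambda item: item[0])
    return list(islice(merged, n))


# Usage with three small, made-up buckets.
buckets = [
    [(1, "a"), (11, "b")],
    [(2, "c"), (12, "d")],
    [(3, "e")],
]
first_four = top_n_across_buckets(buckets, n=4)
```

Because the merge is lazy, fetching 20 rows per bucket is enough to produce a correct global top 20. If newest-first order is wanted, each bucket would have to be scanned in descending order and `reverse=True` passed to `heapq.merge`.
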
> > A source that did the exact same thing and discusses it
> > in much more detail, with concerns aligning with yours (in fact, I think some
> > of the authors/creators of that link/group are members of this
> > community as well).
>
> Do you know what the outcome of their experiment was? Do you have any link for
> that? Thanks for your time and help.
>
>
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Sep 23, 2013 at 8:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Shahab,
> > >
> > > If you read my solution carefully, you will see that I am already doing that.
> > >
> > > Thanks,
> > > Anil Gupta
> > >
> > >
> > > On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > >
> > > > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> > > >
> > > > Here you can find the discussion, trade-offs and working code/API (even
> > > > for M/R) about this and the approach you are trying out.
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > >
> > > > On Mon, Sep 23, 2013 at 5:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I have a secondary index (inverted index) table whose rowkey is based
> > > > > on the timestamp of an event. Assume the rowkey is <TimeStamp in Epoch>.
> > > > > I also store some extra columns (apart from the main_table rowkey) in
> > > > > that table for filtering.
> > > > >
> > > > > The requirement is to do range-based scans on the time of the
> > > > > event. Hence the index with this rowkey.
> > > > > I cannot use a hashing or MD5 digest solution because then I cannot do
> > > > > range-based scans. And I already have an index like OpenTSDB's in another
> > > > > table for the same dataset. (I have many secondary indexes for the same
> > > > > data set.)
> > > > >
> > > > > Problem: When we increase the write workload during a stress test, the
> > > > > time secondary index becomes a bottleneck due to the well-known region
> > > > > hotspotting problem.
> > > > > Solution: I am thinking of adding a prefix of
> > > > > {(<TimeStamp in Epoch> % 10) = bucket} to the rowkey. Then my row key
> > > > > will become:
> > > > >  <Bucket><TimeStamp in Epoch>
> > > > > By using the above rowkey I can at least alleviate the *WRITE* problem.
> > > > > (I don't think the problem can be fixed permanently because of the use
> > > > > case requirement. I would love to be proven wrong.)
> > > > > However, with the above row key, when I want to *READ* data, for every
> > > > > single range scan I have to read data from 10 different regions. This
> > > > > extra read load is scaring me a bit.
> > > > >
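
A minimal sketch of the <Bucket><TimeStamp in Epoch> scheme proposed above, and of the 10 per-bucket scan ranges that one time-range read then needs. The byte layout (a 1-byte bucket followed by an 8-byte big-endian timestamp) and the function names are assumptions for illustration; the thread does not specify an encoding.

```python
NUM_BUCKETS = 10  # bucket count from the thread: timestamp % 10


def _encode(bucket: int, epoch_ts: int) -> bytes:
    # Assumed layout: fixed-width big-endian bytes, so that lexicographic
    # order of keys matches numeric order of non-negative timestamps.
    return bytes([bucket]) + epoch_ts.to_bytes(8, "big")


def salted_row_key(epoch_ts: int, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build the row key <bucket><timestamp> for a write."""
    return _encode(epoch_ts % num_buckets, epoch_ts)


def bucket_scan_ranges(start_ts: int, stop_ts: int,
                       num_buckets: int = NUM_BUCKETS):
    """Return one (start_row, stop_row) pair per bucket for [start_ts, stop_ts)."""
    return [(_encode(b, start_ts), _encode(b, stop_ts))
            for b in range(num_buckets)]


# Usage: one write key and the ranges a reader would have to scan.
key = salted_row_key(1379950873)
ranges = bucket_scan_ranges(1379950000, 1379960000)
```

Consecutive timestamps land in different buckets, which spreads writes, while each bucket stays sorted by time, so a time-range read becomes one scan per bucket (10 here) whose results still need to be merged on the client.
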
> > > > > I am wondering if anyone has a better suggestion/approach to solve this
> > > > > problem given the constraints I have. Looking for feedback from the
> > > > > community.
> > > > >
> > > > > --