HBase, mail # user - Write TimeSeries Data and Do Time Based Range Scans


Re: Write TimeSeries Data and Do Time Based Range Scans
anil gupta 2013-09-26, 06:57
Hi James,

Great to hear that Phoenix supports this kind of table schema. It seems like
I'll go ahead with bucketing for time-series data.

Thanks,
Anil
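
For readers following the thread, here is a minimal sketch of the bucketed read path discussed in the quoted messages below: fetch the first 20 rows from each bucket, then merge them on the client and keep the overall first 20. It is an illustration only and uses no HBase or Phoenix API. The names `salted_key` and `top_n_across_buckets`, the key layout (one salt byte, an 8-byte timestamp, then the id), the byte-sum salt function, and ascending timestamp order are all assumptions made for the example.

```python
import heapq
from itertools import islice

NUM_BUCKETS = 10  # assumption: matches the bucket=10 example in the thread


def salted_key(timestamp_ms, event_id, num_buckets=NUM_BUCKETS):
    """Build a hypothetical row key: salt byte + 8-byte timestamp + id."""
    # Deterministic salt, so the same event always maps to the same bucket.
    salt = sum(event_id.encode("utf-8")) % num_buckets
    return bytes([salt]) + timestamp_ms.to_bytes(8, "big") + event_id.encode("utf-8")


def top_n_across_buckets(per_bucket_rows, n=20):
    """Merge per-bucket results and return the first n rows overall.

    per_bucket_rows holds one list per bucket of (timestamp_ms, event_id)
    tuples, each list already in ascending timestamp order, as a range
    scan inside a single bucket would return them.
    """
    merged = heapq.merge(*per_bucket_rows)  # lazy k-way merge of sorted inputs
    return list(islice(merged, n))


if __name__ == "__main__":
    # Simulate 400 matching rows spread over the buckets by their salt byte.
    rows = [(ts, "event-%d" % ts) for ts in range(400)]
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for ts, event_id in rows:
        buckets[salted_key(ts, event_id)[0]].append((ts, event_id))
    # Each bucket only needs to contribute its first 20 rows.
    page = top_n_across_buckets([b[:20] for b in buckets], n=20)
    print(len(page), page[0], page[-1])
```

Limiting each per-bucket scan to 20 rows is sufficient because any row in the overall first 20 must also be among the first 20 of its own bucket. The client therefore merges at most 20 x 10 rows per request.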
On Tue, Sep 24, 2013 at 9:36 AM, James Taylor <[EMAIL PROTECTED]> wrote:

> Hey Anil,
> The solution you've described is the best we've found for Phoenix (inspired
> by the work of Alex at Sematext).
> You can do all of this in a few lines of SQL:
>
> CREATE TABLE event_data(
>     who VARCHAR, type SMALLINT, id BIGINT, when DATE, payload VARBINARY
>     CONSTRAINT pk PRIMARY KEY (who, type, id))
>     IMMUTABLE_ROWS=true;  // Declare event table as having immutable rows
> CREATE INDEX event_data_index ON event_data(when, type, who) INCLUDE(payload)
>     SALT_BUCKETS=10;  // Salt the index since it'll create write hotspots otherwise
>
> The following query would display event count per type across all users
> over the last week.
> It would automatically use the index:
>
> SELECT type, count(*) FROM event_data WHERE when > CURRENT_DATE() - 7 GROUP BY type;
>
> The following query would display the event count by type for a particular
> user. It would automatically use the data table:
>
> SELECT who, type, count(*) FROM event_data WHERE who = ? GROUP BY who, type;
>
> As far as the read cost associated with reading from a salted table, we've
> found that in most cases it actually performs better, because you get better
> parallelization. The case where it performs worse is a selective query that
> returns a smallish set of rows that would normally be in the same block. In
> this case, in the worst case, you're reading an entire block for each row,
> whereas on an unsalted table these rows would be neighbors in the same block.
>
> HTH,
>
> James
>
> On Tue, Sep 24, 2013 at 8:12 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Inline
> >
> > On Mon, Sep 23, 2013 at 6:15 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> >
> > > Yeah, I saw that. In fact, that is why I recommended it to you, as I
> > > couldn't infer from your email whether you had already gone through
> > > that source or not.
> >
> > Yes, I was aware of that article, but my read pattern is slightly different
> > from the one in that article. We are using HBase as the data source for a
> > RESTful service. Even if my range scan finds 400 rows within a specified
> > time range, I only return the top 20 for one REST request. So, if I do
> > bucketing (let's say bucket=10), I will need to fetch 20 results from each
> > bucket, then do a merge sort on the client side and return the final 20.
> > You can assume that I need to return the 20 rows sorted by timestamp.
> >
>
>
> >
> > > It is a source whose authors did the exact same thing and discuss it
> > > in much more detail, with concerns that align with yours. (In fact, I think
> > > some of the authors/creators of that link/group are members of this
> > > community as well.)
> >
> > Do you know what the outcome of their experiment was? Do you have any link
> > for that? Thanks for your time and help.
> >
> >
> > >
> > > Regards,
> > > Shahab
> > >
> > >
> > > On Mon, Sep 23, 2013 at 8:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Shahab,
> > > >
> > > > If you read my solution carefully, you will see that I am already doing that.
> > > >
> > > > Thanks,
> > > > Anil Gupta
> > > >
> > > >
> > > > On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> > > > >
> > > > > Here you can find the discussion, trade-offs and working code/API
> > > > > (even for M/R) about this and the approach you are trying out.
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > >
> > > > > On Mon, Sep 23, 2013 at 5:41 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I have a secondary index(inverted index) table with a rowkey on

Thanks & Regards,
Anil Gupta