Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Write TimeSeries Data and Do Time Based Range Scans


Copy link to this message
-
Re: Write TimeSeries Data and Do Time Based Range Scans
Hi James,

Great to hear that Phoenix supports this kind of table schema. Seems like
i'll go ahead with Bucketing for timeseries data.

Thanks,
Anil
On Tue, Sep 24, 2013 at 9:36 AM, James Taylor <[EMAIL PROTECTED]>wrote:

> Hey Anil,
> The solution you've described is the best we've found for Phoenix (inspired
> by the work of Alex at Sematext).
> You can do all of this in a few lines of SQL:
>
> CREATE TABLE event_data(
>     who VARCHAR, type SMALLINT, id BIGINT, when DATE, payload VARBINARY
>     CONSTRAINT pk PRIMARY KEY (who, type, id))
>     IMMUTABLE_ROWS=true;  // Declare event table as having immutable rows
> CREATE INDEX event_data_index ON event_data(when, type, who)
> INCLUDE(payload)
>     SALT_BUCKETS=10;  // Salt the index since it'll create write hotspots
> otherwise
>
> The following query would display event count per type across all users
> over the last week.
> It would automatically use the index:
>
> SELECT type, count(*) FROM event_data WHERE when > CURRENT_DATE() - 7 GROUP
> BY type
>
> The following query would display the event count by type for a particular
> user. It would
> automatically use the data table:
>
> SELECT who, type, count(*) FROM event_data WHERE who = ? GROUP BY who,
> type;
>
> As far as the read cost associated with reading from a salted table, we've
> found in most cases
> it actually performs better, because you get better parallelization. The
> case where it performs
> worse is on a selective query that returns a smallish set of rows that
> normally would be in the same
> block. In this case, you're reading an entire block for each row, where in
> the worst case these
> would be neighbors in the same block on an unsalted table.
>
> HTH,
>
> James
>
> On Tue, Sep 24, 2013 at 8:12 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Inline
> >
> > On Mon, Sep 23, 2013 at 6:15 PM, Shahab Yunus <[EMAIL PROTECTED]
> > >wrote:
> >
> > > Yeah, I saw that. In fact that is why I recommended that to you as I
> > > couldn't infer from your email that whether you have already gone
> through
> > > that source or not.
> >
> > Yes, i was aware of that article. But my read pattern is slighty
> different
> > from that article.We are using HBase as DataSource for a RestFul service.
> > Even though if my range scan finds 400 rows with a specified timerange. I
> > only return top 20 for one rest request. So, if in case i do
> bucketing(lets
> > say bucket=10) then i will need to fetch 20 results from each bucket and
> > then i will have to do a merge sort on the client size and return final
> 20.
> > You can assume that i need to return the 20rows sorted by timestamp.
> >
>
>
> >
> > > A source, who did the exact same thing and discuss it
> > > in much more detail and concerns aligning with yours (in fact I think
> > some
> > > of the authors/creators of that link/group are members here of this
> > > community as well.)
> >
> > Do you know what the outcome of their experiment? Do you have any link
> for
> > that? Thanks for your time and help.
> >
> >
> > >
> > > Regards,
> > > Shahab
> > >
> > >
> > > On Mon, Sep 23, 2013 at 8:41 PM, anil gupta <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > > Hi Shahab,
> > > >
> > > > If you read my solution carefully. I am already doing that.
> > > >
> > > > Thanks,
> > > > Anil Gupta
> > > >
> > > >
> > > > On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus <
> [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > >
> > > > >
> > > >
> > >
> >
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> > > > >
> > > > > Here you can find the discussion, trade-offs and working code/API
> > (even
> > > > for
> > > > > M/R) about this and the approach you are trying out.
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > >
> > > > > On Mon, Sep 23, 2013 at 5:41 PM, anil gupta <[EMAIL PROTECTED]
> >
> > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I have a secondary index(inverted index) table with a rowkey on

Thanks & Regards,
Anil Gupta