

Mehmet Simsek 2013-02-17, 19:33
James Taylor 2013-02-17, 22:50
Michael Segel 2013-02-17, 23:18
Re: Row Key Design in time based application
Hello Mehmet,

 If ProjectIds are sequential, then it is definitely not a feasible
approach. The division is just to make sure that all the regions are
evenly loaded. You can create pre-split tables to avoid region
hotspotting. Alternatively, hash your row keys so that all
the region servers receive an equal load.
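A minimal sketch of that hashing idea in Java (the one-byte salt, the 16-bucket count, and the int-projectId + long-timestamp key layout are illustrative assumptions, not anything prescribed in this thread):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SaltedKey {
    // Hypothetical bucket count; in practice tune to the number of regions.
    static final int BUCKETS = 16;

    // Prefix the natural key (projectId + timestamp) with a one-byte
    // hash-derived salt so monotonically increasing writes are spread
    // across region servers instead of hammering the last region.
    static byte[] saltedKey(int projectId, long timeMs) {
        byte[] natural = ByteBuffer.allocate(12)
                .putInt(projectId)
                .putLong(timeMs)
                .array();
        byte salt = (byte) ((Arrays.hashCode(natural) & 0x7fffffff) % BUCKETS);
        return ByteBuffer.allocate(13).put(salt).put(natural).array();
    }

    public static void main(String[] args) {
        byte[] key = saltedKey(42, 1361130780000L);
        System.out.println(key.length);  // 13: 1 salt byte + 4 + 8
    }
}
```

The usual trade-off applies: a time-range scan must now fan out across all 16 salt prefixes and merge the results.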

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Feb 18, 2013 at 4:48 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> I'm not sure how a SQL interface above HBase will solve some of the issues
> with regional hot spotting when using time as the key. Or the problem with
> always adding data to the right of the last row.
>
> The same would apply with the project id, assuming that it too is a number
> that grows incrementally with each project.
> On Feb 17, 2013, at 4:50 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> > Have you considered using Phoenix (
> https://github.com/forcedotcom/phoenix) for this use case? Phoenix is a
> SQL layer on top of HBase. For this use case, you'd connect to your cluster
> like this:
> >
> > Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register
> driver
> > Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost"); // connect to local
> HBase
> >
> > Create a table like this (adding additional columns that you want to
> measure, like txn_count below):
> >
> > conn.createStatement().execute(
> >    "CREATE TABLE event_log (\n" +
> >    "    project_id INTEGER NOT NULL,\n" +
> >    "    time DATE NOT NULL,\n" +
> >    "    txn_count LONG,\n" +
> >    "    CONSTRAINT pk PRIMARY KEY (project_id, time))");
> >
> > Then to insert data you'd do this:
> >
> > PreparedStatement preparedStmt = conn.prepareStatement(
> >    "UPSERT INTO event_log VALUES(?,?,0)");
> >
> > and you'd bind the values in JDBC like this:
> >
> > preparedStmt.setInt(1, projectId);
> > preparedStmt.setDate(2, time);
> > preparedStmt.execute();
> >
> > conn.commit(); // If upserting many values, you'd want to commit after
> upserting maybe 1000-10000 rows
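The commit cadence described above — commit once per batch of upserts, plus a final commit for any remainder — can be modeled with a small sketch. The 1,000-row batch size is just the suggested ballpark; the JDBC calls are stubbed in comments since this only demonstrates the cadence:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedUpsert {
    // Models the commit cadence only: one commit per full batch of
    // upserts, plus a final commit for any trailing partial batch.
    // In the real code each "row" is a preparedStmt.execute() call
    // and each commit is conn.commit().
    static int upsertAll(List<long[]> rows, int batchSize) {
        int commits = 0;
        int pending = 0;
        for (long[] row : rows) {
            // preparedStmt.setInt(1, (int) row[0]);   // projectId
            // preparedStmt.setDate(2, new Date(row[1])); // time
            // preparedStmt.execute();
            if (++pending == batchSize) {
                commits++;        // conn.commit() for a full batch
                pending = 0;
            }
        }
        if (pending > 0) {
            commits++;            // final commit for the remainder
        }
        return commits;
    }

    public static void main(String[] args) {
        List<long[]> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add(new long[]{i, 0L});
        System.out.println(upsertAll(rows, 1000)); // 3 commits: 1000 + 1000 + 500
    }
}
```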
> >
> > Then at query time, assuming you want to report on this data by
> grouping into different "time buckets", you could do as shown below. Phoenix
> stores your date values at millisecond granularity and you can decide at
> query time how you'd like to roll it up:
> >
> > // Query with time bucket at the hour granularity
> > conn.createStatement().execute(
> >   "SELECT\n" +
> >   "    project_id, TRUNC(time,'HOUR') as time_bucket, \n" +
> >   "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >   "GROUP BY project_id, TRUNC(time,'HOUR')");
> >
> > // Query with time bucket at the day granularity
> > conn.createStatement().execute(
> >    "SELECT\n" +
> >    "    project_id, TRUNC(time,'DAY') as time_bucket,\n" +
> >    "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >    "GROUP BY project_id, TRUNC(time,'DAY')");
> >
> > You could, of course include a WHERE clause in the query to filter based
> on the range of dates, particular projectIds, etc. like this:
> >
> > PreparedStatement preparedStmt = conn.prepareStatement(
> >    "SELECT\n" +
> >    "    project_id, TRUNC(time,'DAY') as time_bucket,\n" +
> >    "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >    "WHERE project_id IN (?, ?, ?) AND time >= ? AND time < ?\n" +
> >    "GROUP BY project_id, TRUNC(time,'DAY')");
> > preparedStmt.setInt(1, projectId1);
> > preparedStmt.setInt(2, projectId2);
> > preparedStmt.setInt(3, projectId3);
> > preparedStmt.setDate(4, beginDate);
> > preparedStmt.setDate(5, endDate);
> > preparedStmt.execute();
> >
> >
> > HTH.
> >
> > Regards,
> >
> >    James
> >
> > On 02/17/2013 11:33 AM, Mehmet Simsek wrote:
> >> Hi,
> >>
> >> I want to hold event log data in HBase but I couldn't decide on a row key. I
> must hold project id and time; I will use the project id and time combination
> while searching.
> >>
> >> Row key can be below
> >>
> >> ProjectId+timeInMs
> >>
> >> In a similar application (the open source OpenTSDB), time is divided by 1000 to round
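(Mehmet's message is truncated here.) The OpenTSDB technique he is alluding to — dividing the millisecond timestamp by 1000 and rounding it down to a fixed interval so that events in the same window share a key prefix — might be sketched like this. OpenTSDB uses an hour-wide base time; the interval choice here is just that convention, not anything from this thread:

```java
public class EventKey {
    // OpenTSDB-style rounding: drop milliseconds and align to an
    // hour boundary so all events in the same hour share a key prefix.
    static long hourBucket(long timeMs) {
        long seconds = timeMs / 1000;       // ms -> s, the division Mehmet describes
        return seconds - (seconds % 3600);  // round down to the hour
    }

    public static void main(String[] args) {
        // 2013-02-17 ~19:53 UTC in ms -> top of that hour in seconds
        System.out.println(hourBucket(1361130780000L)); // 1361127600
    }
}
```

A row key of `projectId + hourBucket(timeInMs)` then groups an hour of events under one prefix, at the cost of the sequential-write hotspotting the rest of this thread discusses.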
James Taylor 2013-02-18, 01:19
Michael Segel 2013-02-18, 01:29
Asaf Mesika 2013-02-21, 22:49