HBase >> mail # user >> Row Key Design in time based application


Mehmet Simsek 2013-02-17, 19:33
James Taylor 2013-02-17, 22:50
Michael Segel 2013-02-17, 23:18

Re: Row Key Design in time based application
Hello Mehmet,

 If ProjectIds are sequential, then it is definitely not a feasible
approach. The division is just to make sure that all the regions are
evenly loaded. You can create pre-split tables to avoid region
hotspotting. Alternatively, hash your row keys so that all the
region servers receive an equal load.
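A minimal sketch of that salting idea, assuming a plain HBase client; the
bucket count and key layout below are illustrative, not a fixed recipe:

    import org.apache.hadoop.hbase.util.Bytes;

    /** Builds salted row keys of the form [salt][projectId][timeInMs]. */
    public class SaltedEventKey {
        private static final int BUCKETS = 16; // illustrative; match your pre-split points

        public static byte[] rowKey(int projectId, long timeInMs) {
            // Salting on projectId keeps one project's rows contiguous
            // (so per-project time-range scans stay cheap) while spreading
            // different projects across the region servers.
            byte salt = (byte) ((projectId & 0x7fffffff) % BUCKETS);
            byte[] key = new byte[1 + Bytes.SIZEOF_INT + Bytes.SIZEOF_LONG];
            key[0] = salt;
            Bytes.putInt(key, 1, projectId);
            Bytes.putLong(key, 1 + Bytes.SIZEOF_INT, timeInMs);
            return key;
        }
    }

Pre-splitting the table on the 16 salt values then gives each bucket its own
region from the start, instead of waiting for organic splits.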

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Feb 18, 2013 at 4:48 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> I'm not sure how a SQL interface above HBase will solve some of the issues
> with regional hot spotting when using time as the key. Or the problem with
> always adding data to the right of the last row.
>
> The same would apply with the project id, assuming that it too is a number
> that grows incrementally with each project.
> On Feb 17, 2013, at 4:50 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> > Have you considered using Phoenix (https://github.com/forcedotcom/phoenix)
> > for this use case? Phoenix is a SQL layer on top of HBase. You'd connect
> > to your cluster like this:
> >
> > Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register driver
> > Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost"); // connect to local HBase
> >
> > Create a table like this (adding additional columns that you want to
> > measure, like txn_count below):
> >
> > conn.createStatement().execute(
> >     "CREATE TABLE event_log (\n" +
> >     "    project_id INTEGER NOT NULL,\n" +
> >     "    time DATE NOT NULL,\n" +
> >     "    txn_count LONG,\n" +
> >     "    CONSTRAINT pk PRIMARY KEY (project_id, time))");
> >
> > Then to insert data you'd do this:
> >
> > PreparedStatement preparedStmt = conn.prepareStatement(
> >    "UPSERT INTO event_log VALUES(?,?,0)");
> >
> > and you'd bind the values in JDBC like this:
> >
> > preparedStmt.setInt(1, projectId);
> > preparedStmt.setDate(2, time);
> > preparedStmt.execute();
> >
> > conn.commit(); // If upserting many values, you'd want to commit after
> > upserting maybe 1000-10000 rows
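A minimal sketch of that batched-commit pattern, reusing conn and preparedStmt
from the snippets above; the Event type and events collection are hypothetical
stand-ins for your data source:

    int pending = 0;
    for (Event e : events) {                 // hypothetical event source
        preparedStmt.setInt(1, e.projectId);
        preparedStmt.setDate(2, e.time);
        preparedStmt.execute();              // buffered client-side until commit
        if (++pending % 1000 == 0) {
            conn.commit();                   // flush a batch of upserts to HBase
        }
    }
    conn.commit();                           // flush the final partial batch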
> >
> > Then at query time, assuming you want to report on this data by
> > grouping into different "time buckets", you could do as shown below. Phoenix
> > stores your date values at millisecond granularity and you can decide at
> > query time how you'd like to roll it up:
> >
> > // Query with time bucket at the hour granularity
> > conn.createStatement().execute(
> >   "SELECT\n" +
> >   "    project_id, TRUNC(time,'HOUR') as time_bucket, \n" +
> >   "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >   "GROUP BY project_id, TRUNC(time,'HOUR')");
> >
> > // Query with time bucket at the day granularity
> > conn.createStatement().execute(
> >    "SELECT\n" +
> >    "    project_id, TRUNC(time,'DAY') as time_bucket,\n" +
> >    "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >    "GROUP BY project_id, TRUNC(time,'DAY')");
> >
> > You could, of course, include a WHERE clause in the query to filter based
> > on a range of dates, particular projectIds, etc., like this:
> >
> > preparedStmt = conn.prepareStatement(
> >    "SELECT\n" +
> >    "    project_id, TRUNC(time,'DAY') as time_bucket,\n" +
> >    "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >    "WHERE project_id IN (?, ?, ?) AND time >= ? AND time < ?\n" +
> >    "GROUP BY project_id, TRUNC(time,'DAY')");
> > preparedStmt.setInt(1, projectId1);
> > preparedStmt.setInt(2, projectId2);
> > preparedStmt.setInt(3, projectId3);
> > preparedStmt.setDate(4, beginDate);
> > preparedStmt.setDate(5, endDate);
> > preparedStmt.execute();
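To actually consume the rows, you'd call executeQuery() rather than the bare
execute() above; a minimal sketch using standard JDBC, with column indexes
following the SELECT list (Phoenix returns AVG as a decimal):

    ResultSet rs = preparedStmt.executeQuery();
    while (rs.next()) {
        int projectId = rs.getInt(1);             // project_id
        Date timeBucket = rs.getDate(2);          // TRUNC(time,'DAY')
        long minTxn = rs.getLong(3);              // MIN(txn_count)
        long maxTxn = rs.getLong(4);              // MAX(txn_count)
        BigDecimal avgTxn = rs.getBigDecimal(5);  // AVG(txn_count)
        System.out.println(projectId + " " + timeBucket + ": "
            + minTxn + "/" + maxTxn + "/" + avgTxn);
    }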
> >
> >
> > HTH.
> >
> > Regards,
> >
> >    James
> >
> > On 02/17/2013 11:33 AM, Mehmet Simsek wrote:
> >> Hi,
> >>
> >> I want to hold event log data in HBase but I couldn't decide on a row key. I
> >> must hold project id and time; I will use the project id and time combination
> >> while searching.
> >>
> >> The row key could be as below
> >>
> >> ProjectId+timeInMs
> >>
> >> In a similar application (open source TSDB), time is divided by 1000 to round
James Taylor 2013-02-18, 01:19
Michael Segel 2013-02-18, 01:29
Asaf Mesika 2013-02-21, 22:49