|
|
-
Re: Rowkey design and presplit tableJames Taylor 2013-03-07, 08:42
Another possible solution for you: use Phoenix:
https://github.com/forcedotcom/phoenix Phoenix would allow you to model your scenario using SQL through JDBC, like this: Connection conn = DriverManager.connect("jdbc:phoenix:<your zookeeper quorum>"); Statement stmt = conn.createStatement( "CREATE TABLE article_discussion (" + " category_id BIGINT not null," + " article_id BIGINT not null," + " comment_id BIGINT not null " CONSTRAINT pk PRIMARY KEY (category_id, article_id, comment_id))"); stmt.execute(); You could, of course, add other non primary key columns to the above. You could also pre-split the table and/or pass through any HBase configuration parameters necessary. Then, you populate your table like this: PrepareStatement stmt = conn.prepareStatement( "UPSERT INTO article_discussion VALUES(?,?,?)"); stmt.setLong("category_id", categoryId); stmt.setLong("article_id", articleId); stmt.setLong("comment_id", commentId); And to query for all comments in a given category, you'd do this: PrepareStatement stmt1 = conn.prepareStatement( "SELECT * FROM article_discussion WHERE category_id = ?"); stmt1.setLong("category_id", categoryId); ResultSet rs = stmt1.executeQuery(); while (rs.next()) { System.out.println( "category_id=" + rs.getLong("category_id") + ",article_id=" + rs.getLong("article_id") + ",comment_id=" + rs.getLong("comment_id")); } and for all comments for a given category and article: PrepareStatement stmt1 = conn.prepareStatement( "SELECT * FROM article_discussion WHERE category_id = ? AND article_id = ?"); stmt1.setLong("category_id", categoryId); stmt1.setLong("article_id", articleId); you could use arbitrary where clauses like this: SELECT * FROM article_discussion WHERE (article_name LIKE '% baseball %' OR article_name LIKE '% food %') AND category_id IN (?, ?, ?) AND published_date >= to_date('2013-01-01 00:00:00') AND published_date <= to_date('2013-01-31 00:00:00') or do aggregation like this: SELECT category_id, count(article_id), count(comment_id) FROM article_discussion WHERE published_date > ? GROUP BY category_id HAVING count(comment_id) > 100 Regards, James On 03/06/2013 11:42 PM, Asaf Mesika wrote: > I would convert each id to long and then use Bytes.toBytes to convert this > long to a byte array. If it is an int then even better. > Now, write all 3 longs one after another to one array which will be your > rowkey. > This gives you: > * fixed size > * small row key - 3*8 bytes if you use long and 3*4 for int. > > Why do you need to use prefix split policy? > > On Monday, March 4, 2013, Lukáš Drbal wrote: > >> Hi, >> >> i have one question about rowkey design and presplit table. >> >> My usecase: >> I need store a lot of comments where each comment are for one article and >> this article has one category. >> >> What i need: >> 1) read one comment by id (where i know commentId, articleId and >> categoryId) >> 2) read all coments for article (i know categoryId and articleId) >> 3) read all comments for category (i know categoryId) >> >> From this read pattern i see one good rowkey: >> <categoryId>_<articleId>_<commentId> >> >> But here i don't have fixed size of rowkey, so i don't know how to define >> split pattern. How can be this solved? >> This id's come from external system and grow very fast, so add some like >> "padding" for each part are hard. >> >> Maybe i can use hash function for each part >> md5(<categoryId>_md5(<articleId>)_md5(<commentId>), but this rowkey is very >> long (3*32+2 bytes), i don't have experience with this long rowkeys. >> >> Can someone give me a suggestions please? >> >> Regards >> >> Lukas Drbal >> |