Re: Rowkey design and presplit table
Another possible solution for you: use Phoenix:
https://github.com/forcedotcom/phoenix

Phoenix would allow you to model your scenario using SQL through JDBC,
like this:

Connection conn = DriverManager.getConnection(
     "jdbc:phoenix:<your zookeeper quorum>");
Statement stmt = conn.createStatement();
stmt.execute(
     "CREATE TABLE article_discussion (" +
     "    category_id BIGINT not null," +
     "    article_id BIGINT not null," +
     "    comment_id BIGINT not null" +
     "    CONSTRAINT pk PRIMARY KEY (category_id, article_id, comment_id))");

You could, of course, add other non-primary-key columns to the above.
You could also pre-split the table and/or pass through any HBase
configuration parameters necessary.
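
For example, something along these lines (the VERSIONS value and the split
points below are just illustrative):

stmt.execute(
     "CREATE TABLE article_discussion (" +
     "    category_id BIGINT not null," +
     "    article_id BIGINT not null," +
     "    comment_id BIGINT not null" +
     "    CONSTRAINT pk PRIMARY KEY (category_id, article_id, comment_id))" +
     // table options such as VERSIONS are passed through to the HBase table
     "    VERSIONS=1" +
     // pre-split on values of the leading rowkey column (category_id)
     "    SPLIT ON (100, 200, 300)");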

Then, you populate your table like this:

PreparedStatement stmt = conn.prepareStatement(
     "UPSERT INTO article_discussion VALUES(?,?,?)");
stmt.setLong(1, categoryId);
stmt.setLong(2, articleId);
stmt.setLong(3, commentId);
stmt.executeUpdate();
conn.commit();  // Phoenix buffers the upsert on the client until commit

And to query for all comments in a given category, you'd do this:

PreparedStatement stmt1 = conn.prepareStatement(
     "SELECT * FROM article_discussion WHERE category_id = ?");
stmt1.setLong(1, categoryId);
ResultSet rs = stmt1.executeQuery();
while (rs.next()) {
     System.out.println(
         "category_id=" + rs.getLong("category_id") +
         ",article_id=" + rs.getLong("article_id") +
         ",comment_id=" + rs.getLong("comment_id"));
}
and for all comments for a given category and article:

PreparedStatement stmt2 = conn.prepareStatement(
     "SELECT * FROM article_discussion " +
     "WHERE category_id = ? AND article_id = ?");
stmt2.setLong(1, categoryId);
stmt2.setLong(2, articleId);
ResultSet rs2 = stmt2.executeQuery();

You could also use arbitrary WHERE clauses, like this:

SELECT * FROM article_discussion
WHERE (article_name LIKE '% baseball %' OR article_name LIKE '% food %')
AND category_id IN (?, ?, ?)
AND published_date >= to_date('2013-01-01 00:00:00')
AND published_date <= to_date('2013-01-31 00:00:00')

or do aggregation like this:

SELECT category_id, count(article_id), count(comment_id)
FROM article_discussion
WHERE published_date > ?
GROUP BY category_id
HAVING count(comment_id) > 100

Regards,

     James

On 03/06/2013 11:42 PM, Asaf Mesika wrote:
> I would convert each id to a long and then use Bytes.toBytes to convert that
> long to a byte array. If it is an int, even better.
> Now, write all 3 longs one after another into one array, which will be your
> rowkey.
> This gives you:
> * fixed size
> * a small row key - 3*8 bytes if you use longs and 3*4 for ints.
>
> Why do you need to use a prefix split policy?
>
> On Monday, March 4, 2013, Lukáš Drbal wrote:
>
>> Hi,
>>
>> I have one question about rowkey design and pre-splitting a table.
>>
>> My use case:
>> I need to store a lot of comments, where each comment belongs to one
>> article and each article has one category.
>>
>> What I need:
>> 1) read one comment by id (I know commentId, articleId and categoryId)
>> 2) read all comments for an article (I know categoryId and articleId)
>> 3) read all comments for a category (I know categoryId)
>>
>> From this read pattern I see one good rowkey:
>> <categoryId>_<articleId>_<commentId>
>>
>> But here I don't have a fixed-size rowkey, so I don't know how to define
>> the split pattern. How can this be solved?
>> These ids come from an external system and grow very fast, so adding some
>> kind of "padding" to each part is hard.
>>
>> Maybe I could use a hash function for each part,
>> md5(<categoryId>)_md5(<articleId>)_md5(<commentId>), but that rowkey is
>> very long (3*32+2 bytes), and I don't have experience with rowkeys that
>> long.
>>
>> Can someone give me a suggestion, please?
>>
>> Regards
>>
>> Lukas Drbal
>>
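
For completeness, a minimal sketch of the fixed-width rowkey Asaf describes
above, assuming all three ids fit in a long and using HBase's Bytes helper:

import org.apache.hadoop.hbase.util.Bytes;

long categoryId = 42L, articleId = 7L, commentId = 1001L;  // example ids

// Three 8-byte longs written one after another: a fixed 24-byte rowkey
// ordered by category, then article, then comment.
byte[] rowKey = Bytes.add(
     Bytes.toBytes(categoryId),   // 8 bytes
     Bytes.toBytes(articleId),    // 8 bytes
     Bytes.toBytes(commentId));   // 8 bytes

A prefix scan over the first 8 bytes (categoryId) or the first 16 bytes
(categoryId + articleId) then covers the per-category and per-article reads,
and the fixed width makes it easy to pick split points up front.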