HBase >> mail # user >> Couple of schema design questions


Couple of schema design questions
I'm trying to design an HBase schema for a log-processing application.  We
will get new logs every day.

1)  We are thinking of keeping each day's data in separate tables, with
names like XYZ-2012-02-26.  There will be at most 4 tables per day.
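A minimal sketch of how such names could be generated, assuming the prefix (XYZ above) is what distinguishes the up to 4 tables of a day. The class and method names are hypothetical. ISO dates (yyyy-MM-dd) zero-pad the month and day, so the table names also sort chronologically.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds per-day table names such as "XYZ-2012-02-26".
final class DailyTableNames {
    private DailyTableNames() {}

    // prefix identifies which of the (at most 4) tables for a day this is;
    // ISO_LOCAL_DATE renders the day as zero-padded yyyy-MM-dd.
    static String tableNameFor(String prefix, LocalDate day) {
        return prefix + "-" + DateTimeFormatter.ISO_LOCAL_DATE.format(day);
    }

    public static void main(String[] args) {
        System.out.println(tableNameFor("XYZ", LocalDate.of(2012, 2, 26))); // XYZ-2012-02-26
    }
}
```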

Pros:
Processes working on older data are not affected while each new day's
data is being loaded.
It's easier to delete old data that's no longer needed: just drop the
tables.
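A sketch of the cleanup side, assuming every per-day table ends in a yyyy-MM-dd suffix: pick out the tables dated before a retention cutoff. The helper is hypothetical and only selects names; the actual removal would still go through HBase (in the shell, `disable` followed by `drop` for each table).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical retention helper for per-day tables named "<prefix>-yyyy-MM-dd".
final class DailyTableRetention {
    private DailyTableRetention() {}

    // Returns the names whose date suffix is strictly before cutoff.
    // Names without a parsable "-yyyy-MM-dd" suffix are left alone.
    static List<String> tablesOlderThan(List<String> tableNames, LocalDate cutoff) {
        List<String> expired = new ArrayList<>();
        for (String name : tableNames) {
            // Expect "-" followed by exactly 10 date characters at the end.
            if (name.length() < 11 || name.charAt(name.length() - 11) != '-') {
                continue;
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(name.length() - 10));
                if (day.isBefore(cutoff)) {
                    expired.add(name);
                }
            } catch (DateTimeParseException e) {
                // Not a per-day table; skip it.
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<String> names = List.of("XYZ-2012-02-24", "XYZ-2012-02-26", "lookup");
        System.out.println(tablesOlderThan(names, LocalDate.of(2012, 2, 25))); // [XYZ-2012-02-24]
    }
}
```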

Cons:
Lots of tables to deal with.
Any others?

(The other option, of course, is to create a table of dates and have the
other tables include the date at the end of their row keys.)

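For comparison, a sketch of the single-table alternative, assuming String keys like those in (2) with the date appended as yyyyMMdd; the class and method names are hypothetical. Because HBase sorts rows by key, a date suffix keeps one order's rows for different days next to each other but spreads a single day's rows across the whole table, so removing a day means scanning and deleting rows rather than dropping a table.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: one table for all days, with the date appended to the row key.
final class DateSuffixedKey {
    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20120226.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    private DateSuffixedKey() {}

    // e.g. base "000123*000045" on 2012-02-26 -> "000123*000045*20120226"
    static byte[] rowKey(String base, LocalDate day) {
        return (base + "*" + DAY.format(day)).getBytes(StandardCharsets.UTF_8);
    }

    // Recovers the day from the last 8 characters of the key.
    static LocalDate dayOf(byte[] rowKey) {
        String s = new String(rowKey, StandardCharsets.UTF_8);
        return LocalDate.parse(s.substring(s.length() - 8), DAY);
    }

    public static void main(String[] args) {
        byte[] key = rowKey("000123*000045", LocalDate.of(2012, 2, 26));
        System.out.println(new String(key, StandardCharsets.UTF_8) + " -> " + dayOf(key));
    }
}
```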
2)  We are thinking the row keys will be Strings with a separator
character, e.g. ordernum*itemnum.  The keys will contain only IDs, and these
IDs will be small, probably 6 digits each.
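A minimal sketch of the String key, assuming IDs always fit in 6 digits; the helper name is hypothetical. HBase orders row keys byte by byte, so unpadded IDs sort as text rather than as numbers (`12*1` would come before `2*1`). Zero-padding every ID to a fixed width makes text order match numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for String row keys of the form ordernum*itemnum.
final class StringRowKey {
    private static final char SEP = '*';

    private StringRowKey() {}

    // Zero-pads each ID to 6 digits so byte-wise key order matches numeric order.
    static String of(int orderNum, int itemNum) {
        if (orderNum < 0 || orderNum > 999_999 || itemNum < 0 || itemNum > 999_999) {
            throw new IllegalArgumentException("IDs must fit in 6 digits");
        }
        // Locale.ROOT guarantees ASCII digits regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%06d%c%06d", orderNum, SEP, itemNum);
    }

    // Splits "000123*000045" back into {123, 45}.
    static int[] parse(String key) {
        int i = key.indexOf(SEP);
        return new int[] {Integer.parseInt(key.substring(0, i)), Integer.parseInt(key.substring(i + 1))};
    }

    // The bytes actually stored as the HBase row key.
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(of(123, 45)); // 000123*000045
    }
}
```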

Pros:
It's easier to look up and search for data using the HBase shell.
Very easy to implement.

Cons:
As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
a number stored as a String can need nearly 3x the bytes of its binary form.
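A quick size check for the keys in (2), as a sketch with hypothetical method names. The book's figure comes from an 8-byte long: its largest unsigned value, 18446744073709551615, takes 20 characters as a String. With 6-digit IDs the gap is smaller: "123456*654321" is 13 bytes, against 8 bytes for two 4-byte ints.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Compares a String row key with a fixed-width binary encoding of the same IDs.
final class KeySizes {
    private KeySizes() {}

    // "ordernum*itemnum" as ASCII text, e.g. "123456*654321".
    static byte[] stringKey(int orderNum, int itemNum) {
        return String.format(Locale.ROOT, "%d*%d", orderNum, itemNum)
                .getBytes(StandardCharsets.US_ASCII);
    }

    // Two 4-byte big-endian ints; fixed width, so no separator is needed.
    static byte[] binaryKey(int orderNum, int itemNum) {
        return ByteBuffer.allocate(2 * Integer.BYTES).putInt(orderNum).putInt(itemNum).array();
    }

    public static void main(String[] args) {
        System.out.println(stringKey(123456, 654321).length + " vs " + binaryKey(123456, 654321).length); // 13 vs 8
        // The book's example: largest unsigned 64-bit value as text vs. 8 bytes.
        System.out.println(Long.toUnsignedString(-1L).length() + " vs " + Long.BYTES); // 20 vs 8
    }
}
```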

(The other option is to create separate classes for compound row keys.  Is
it worth the effort?)
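For a sense of the effort involved, a sketch of such a class, assuming non-negative IDs that fit in an int; `OrderItemKey` is a hypothetical name. It encodes both IDs as 4-byte big-endian ints (the same layout HBase's `Bytes.toBytes(int)` produces). For non-negative values, big-endian bytes compare in the same order as the numbers, so `compareTo` agrees with how HBase sorts the keys. The cost is that the shell shows these keys as escaped bytes rather than readable text.

```java
import java.nio.ByteBuffer;

// Hypothetical compound row key: (orderNum, itemNum) as two 4-byte big-endian ints.
final class OrderItemKey implements Comparable<OrderItemKey> {
    private static final int LENGTH = 2 * Integer.BYTES;

    final int orderNum;
    final int itemNum;

    OrderItemKey(int orderNum, int itemNum) {
        // Negative ints would sort after positive ones in unsigned byte order.
        if (orderNum < 0 || itemNum < 0) {
            throw new IllegalArgumentException("IDs must be non-negative");
        }
        this.orderNum = orderNum;
        this.itemNum = itemNum;
    }

    // 8-byte row key; fixed width, so no separator is needed.
    byte[] toBytes() {
        return ByteBuffer.allocate(LENGTH).putInt(orderNum).putInt(itemNum).array();
    }

    static OrderItemKey fromBytes(byte[] key) {
        if (key.length != LENGTH) {
            throw new IllegalArgumentException("expected 8 bytes, got " + key.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(key);
        int order = buf.getInt();
        int item = buf.getInt();
        return new OrderItemKey(order, item);
    }

    // Matches unsigned byte-wise order of toBytes(), given non-negative IDs.
    @Override
    public int compareTo(OrderItemKey other) {
        int c = Integer.compare(orderNum, other.orderNum);
        return c != 0 ? c : Integer.compare(itemNum, other.itemNum);
    }

    @Override
    public String toString() {
        return orderNum + "*" + itemNum;
    }

    public static void main(String[] args) {
        OrderItemKey key = new OrderItemKey(123456, 654321);
        System.out.println(fromBytes(key.toBytes()) + " in " + key.toBytes().length + " bytes");
    }
}
```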
Is there a general consensus regarding these issues?  Thanks in advance for
your help.