HBase, mail # user - Schema Design - Move second column family to new table


Re: Schema Design - Move second column family to new table
Ian Varley 2012-08-20, 14:37
Christian,

Column families live "within" rows, not the other way around; they are just a way to physically partition sets of columns in a table. In your example, then, it's more correct to say that table1 has millions / billions of rows, but only hundreds of them have any columns in CF2. I'm not exactly sure how much of a penalty that second column family imposes in this case. If you don't include it in your scans / gets, you won't pay any penalty at read time; but if you read from both families "just in case" the row has data there, you'll always take a hit. I think the same goes for writes. (Question for the list: does adding a column family that you *never* use impose any penalties?)
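To make the "families live within rows" point concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase client API: the ToyTable class, its methods, and the row keys are all made up for illustration, and it says nothing about how HBase stores the data on disk.

```python
from collections import defaultdict


class ToyTable:
    """Toy model (hypothetical): row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, families=None):
        """Return only the requested families (all declared families if None)."""
        wanted = self.families if families is None else set(families)
        row = self.rows.get(row_key, {})
        return {fam: dict(cols) for fam, cols in row.items() if fam in wanted}

    def rows_with_family(self, family):
        """Count rows that have at least one column in the given family."""
        return sum(1 for row in self.rows.values() if row.get(family))


# One table, two families. Every row has CF1 data; only a few have CF2 data.
table1 = ToyTable(["CF1", "CF2"])
for i in range(1000):
    table1.put(f"row{i:04d}", "CF1", "q", i)
for i in range(0, 1000, 100):
    table1.put(f"row{i:04d}", "CF2", "q", "rare")

total_rows = len(table1.rows)               # rows belong to the table
cf2_rows = table1.rows_with_family("CF2")   # subset of rows with CF2 columns
cf1_only = table1.get("row0100", families=["CF1"])  # read one family only
```

In this model the row count is a property of the table, and "CF2 has hundreds of rows" really means "hundreds of rows have columns in CF2". Passing an explicit family list to get() mirrors the idea of restricting a scan / get to the families you need.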

The downside to moving it to another table is that writes will no longer be transactionally protected (i.e. if you're trying to write to both tables, the operation could fail after one write and before the other). Conversely, if you keep them as column families in the same row, writes to a single row are transactional, even across families. You may or may not care about that.
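The failure window can be sketched with another toy model in plain Python. Again, this is not HBase code: the dict-based "tables", the function names, and the SimulatedFailure exception are all hypothetical, and the single-row function simply assumes the all-or-nothing behaviour described above rather than implementing it.

```python
class SimulatedFailure(Exception):
    """Stands in for a client crash or a rejected write (hypothetical)."""


def write_single_row(table, row_key, cells, fail=False):
    """One row in one table: modelled as applying every cell or none."""
    if fail:
        raise SimulatedFailure("write rejected before anything was applied")
    row = table.setdefault(row_key, {})
    for (family, qualifier), value in cells.items():
        row.setdefault(family, {})[qualifier] = value


def write_two_tables(table1, table2, row_key, cells1, cells2, fail_between=False):
    """Same row key in two tables: two independent writes, with a gap between."""
    write_single_row(table1, row_key, cells1)
    if fail_between:
        raise SimulatedFailure("failed after table1, before table2")
    write_single_row(table2, row_key, cells2)


# Two-table layout: a failure between the writes leaves the data half-written.
t1, t2 = {}, {}
try:
    write_two_tables(t1, t2, "row1",
                     {("CF1", "a"): 1}, {("CF2", "b"): 2},
                     fail_between=True)
except SimulatedFailure:
    pass
half_written = "row1" in t1 and "row1" not in t2

# Single-table layout: a failed write leaves the row untouched.
single = {}
try:
    write_single_row(single, "row1",
                     {("CF1", "a"): 1, ("CF2", "b"): 2},
                     fail=True)
except SimulatedFailure:
    pass
untouched = "row1" not in single
```

If you do split the families into two tables, the reader (or a repair job) has to tolerate the half-written state, e.g. a row key present in table1 but missing in table2.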

So, putting the lower-cardinality data in another table with the same row key might be a performance win, or it might not, depending on your read & write patterns. Try it both ways and compare, and let us know what you find.

Ian

On Aug 20, 2012, at 7:25 AM, Pranav Modi wrote:

This might be useful -
http://java.dzone.com/videos/hbase-schema-design-things-you

On Mon, Aug 20, 2012 at 5:17 PM, Christian Schäfer <[EMAIL PROTECTED]> wrote:

Currently I'm about to design HBase tables.

In my case there is table1 with CF1 holding millions/billions of rows and
CF2 with hundreds of rows.
Read use cases include reading both CF data by key or reading only one CF.

Referring to http://hbase.apache.org/book/number.of.cfs.html

Due to the cardinality difference I would change the schema design by
putting CF2 in an extra table (table 2), right?
So after that there are table1 and table2 each with one CF with the same
row key.
Any doubts about that?

Can anyone recommend resources about HBase schema design where it is
explained for different use cases,
beyond "HBase: The Definitive Guide" and the HBase online reference?

regards,
Christian