Suggested and max number of CFs per table

My question is about the suggested or maximum number of CFs per table (see
http://hbase.apache.org/book/schema.html#number.of.cfs ).

Consider the following use-case.
* A multi-tenant system.
* All tenants write data to the same table.
* Tenants have different data retention policies.

For the above use case I thought one could simply have different CFs with
different TTLs, because Stack suggested relying on HBase's ability to purge old
rows by applying CF-specific TTLs: http://search-hadoop.com/m/VAeb52cvWHV.
These CFs would have the same set of columns, just different TTLs.  Tenants who
want to keep only the last month's worth of data would write to the CF with
TTL=1 month, tenants who want to keep the last 6 months of data would write to
the CF with TTL=6 months, and so on.  However, tenants will not be evenly
distributed - there will be more tenants with short data retention periods, so
the CFs holding those tenants' data will grow faster than the others.
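The CF-per-retention-tier idea could be sketched in the HBase shell like this
(the table and CF names are made-up placeholders, and TTL is given in seconds):

```
# One table, one CF per retention tier; TTL values are in seconds.
# 'metrics' and the CF names here are hypothetical.
create 'metrics',
  { NAME => 'ttl_1m', TTL => 2592000  },   # ~1 month
  { NAME => 'ttl_6m', TTL => 15552000 }    # ~6 months
```

Tenants would then be routed to the CF matching their retention policy at write
time.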

If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs correctly,
the advice is not to have more than 2-3 CFs per table.
So what happens if I have, say, 6 CFs per table?

Again, if I read the above page correctly, the problem is that flushing is done
per region, so with uneven data distribution, whenever 1 of my CFs needs to be
flushed, the remaining 5 CFs will also get flushed at the same time even though
they hold little data.  That produces lots of small files in the sparsely
written CFs, and this may (or will?) trigger compaction for all CFs' files,
creating a sudden IO hit?

Is there a good solution for this problem?
Should one then have 6 different tables, each with just 1 CF, instead of having
1 table with 6 CFs?
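For comparison, the table-per-tier alternative would look roughly like this
(again, all names are placeholders); each table then flushes and compacts on its
own schedule, independent of the others:

```
# One table per retention tier, each with a single CF; TTL in seconds.
create 'metrics_1m', { NAME => 'd', TTL => 2592000  }
create 'metrics_6m', { NAME => 'd', TTL => 15552000 }
```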

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/