HBase >> mail # user >> Suggested and max number of CFs per table


Suggested and max number of CFs per table
Hi,

My question is about the suggested or maximum number of CFs per table (see
http://hbase.apache.org/book/schema.html#number.of.cfs ).

Consider the following use-case.
* A multi-tenant system.
* All tenants write data to the same table.
* Tenants have different data retention policies.

For this use case I thought one could simply have several CFs with
different TTLs, because Stack suggested relying on HBase's ability to purge old
rows by applying CF-specific TTLs: http://search-hadoop.com/m/VAeb52cvWHV.
These CFs would have the same set of columns, just different TTLs.  Tenants
who want to keep only the last month's worth of data would go to the CF where
TTL=1 month, tenants who want to keep the last 6 months of data would go to the
CF where TTL=6 months, and so on.  However, tenants are not going to be evenly
distributed: there will be more tenants with shorter data retention periods,
which means the CFs holding those tenants' data will grow faster.
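
To make the routing idea concrete, here is a minimal sketch of how a tenant's
retention policy could be mapped to a CF.  Everything in it is an assumption
made for illustration: the CF names (m1, m6, m12), the 12-month tier, the
30-day month, and the helper names.  It does not call any HBase API; it only
shows the mapping and the TTL arithmetic (HBase TTLs are given in seconds).

```python
# Illustrative sketch only: CF names, tiers, and helpers are hypothetical.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: treat a "month" as 30 days

# Column family name -> retention in months (made-up tiers).
CF_RETENTION_MONTHS = {"m1": 1, "m6": 6, "m12": 12}


def ttl_seconds(months):
    """Convert a retention period in months to a TTL in seconds."""
    return months * DAYS_PER_MONTH * SECONDS_PER_DAY


def cf_for_retention(months):
    """Pick the CF with the shortest retention that still covers `months`."""
    candidates = [(m, cf) for cf, m in CF_RETENTION_MONTHS.items() if m >= months]
    if not candidates:
        raise ValueError("no CF retains data for %d months" % months)
    return min(candidates)[1]


if __name__ == "__main__":
    for wanted in (1, 3, 6):
        cf = cf_for_retention(wanted)
        print(wanted, "->", cf, "TTL", ttl_seconds(CF_RETENTION_MONTHS[cf]))
```

A tenant asking for 3 months lands in the 6-month CF under this scheme, so
rounding up to the next tier is one more source of uneven CF sizes.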

If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs correctly,
the advice is not to have more than 2-3 CFs per table?
And what happens if I have, say, 6 CFs per table?

Again, if I read the above page correctly, the problem is that uneven data
distribution means that whenever one of my CFs needs to be flushed, the
remaining 5 CFs also get flushed at the same time, and this may (or will?)
trigger compaction of all CFs' files, creating a sudden IO hit?
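
The flush concern above can be illustrated with a toy model.  This is not
HBase code and says nothing about when compactions actually fire; it only
assumes, as I read the book page, that a flush is triggered per region, so
when one CF's memstore reaches the threshold every CF's memstore is written
out.  The write rates, the 64 MB threshold, and the function name are made up.

```python
# Toy model, not HBase code: assumes a flush of one CF flushes all CFs.
def simulate_flushes(write_rates_mb, flush_threshold_mb, ticks):
    """Return, per CF, the sizes (MB) of the files written by each flush.

    write_rates_mb: CF name -> MB written to that CF per tick (hypothetical).
    """
    memstores = {cf: 0 for cf in write_rates_mb}
    flushed_files = {cf: [] for cf in write_rates_mb}
    for _ in range(ticks):
        for cf, rate in write_rates_mb.items():
            memstores[cf] += rate
        # One full memstore forces every non-empty memstore to flush.
        if any(size >= flush_threshold_mb for size in memstores.values()):
            for cf in memstores:
                if memstores[cf] > 0:
                    flushed_files[cf].append(memstores[cf])
                    memstores[cf] = 0
    return flushed_files


if __name__ == "__main__":
    # The busy CF gets 16 MB per tick, the quiet one 1 MB per tick.
    print(simulate_flushes({"m1": 16, "m6": 1}, flush_threshold_mb=64, ticks=8))
```

In this model the quiet CF ends up with as many files as the busy one, each
a small fraction of the size, which is the pattern I am worried about.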

Is there a good solution for this problem?
Should one then have 6 different tables, each with just 1 CF instead of having 1
table with 6 CFs?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/