
Re: HBase Column Family Limit Reasoning
There's also some good discussion here: https://issues.apache.org/jira/browse/HBASE-3149
This mostly discusses the small HFiles created, since all CFs have to be flushed together, but it's still worth a read.
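To make the small-HFile point concrete, here is a toy Python sketch of the region-wide flush behavior (no HBase involved; the threshold and the 99:1 write mix are invented for illustration). One hot CF repeatedly fills its memstore; every region flush also writes out the nearly empty cold CF, producing a pile of tiny HFiles:

```python
# Toy model of HBase's region-wide flush (the behavior HBASE-3149 targets).
# Numbers are invented for illustration, not HBase defaults.
FLUSH_THRESHOLD = 100  # flush the whole region when any memstore reaches this

memstores = {"hot": 0, "cold": 0}   # bytes buffered per column family
hfiles = {"hot": [], "cold": []}    # sizes of HFiles written per CF

for i in range(1000):
    # 99% of writes go to the "hot" CF, 1% to the "cold" CF.
    cf = "hot" if i % 100 else "cold"
    memstores[cf] += 1
    if max(memstores.values()) >= FLUSH_THRESHOLD:
        # A region flush writes out *every* CF's memstore, even tiny ones.
        for name, size in memstores.items():
            if size:
                hfiles[name].append(size)
            memstores[name] = 0

print(len(hfiles["cold"]), max(hfiles["cold"]))  # → 9 2
```

The cold CF ends up with nine HFiles of one or two cells each, which compaction then has to clean up; flushing only the big CF would avoid that churn.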

-- Lars
To: HBase Dev List <[EMAIL PROTECTED]>
Sent: Thursday, June 20, 2013 8:30 AM
Subject: Re: HBase Column Family Limit Reasoning
On Wed, Jun 19, 2013 at 6:01 PM, Namkyu Chang <[EMAIL PROTECTED]> wrote:

> Hi everyone,
> I'm a newcomer to HBase, and as I was reading the documentation I wanted to
> learn more about the reasoning behind the limit on the number of column
> family that HBase supports.
> I understand that currently HBase can only support at most 2-3 column
> families due to the flushing and compaction issues, and the excessive i/o
> loading it may cause for some smaller column families. Since flushing and
> compaction is done on a per region basis, and each region contains most of
> the column families, 1 filled column family can trigger a flushing, but the
> other non-filled column families will also have to participate when really
> they could wait.
> However, is this the only reason? I see that this is "To be addressed by
> changing flushing and compaction to work on a per column family basis", and
> would this mean we can have as many CFs as we'd like after this fix? In
> Google's Bigtable paper, they also limit the number of their CFs to around
> 100 at most. As such, are there any other factors to this limitation?
Some folks have more than 2-3 CFs; IIRC, FB Messages has 10-15 CFs.  In
Messages, reads and writes are carefully managed per CF so they avoid the
issues 'normal' users run into when they have many CFs.

The main issue, as you cite above, is our flushing all CFs on a region flush
rather than just the big CF.  Fixing this will get us to a new upper bound.
We'll have to see what it is (my guess is that it will be well below 100).

Other factors are that each CF consumes resources; each has its own
memstore, for example.

Be careful: CFs in HBase are not the same as BigTable CFs.  CFs in HBase
are more like the LocalityGroups the BigTable paper talks of.  If HBase CFs
were like BigTable CFs, that would enable us to have more CFs too
(depending upon how they were implemented).
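The locality-group analogy can be sketched in a few lines of toy Python (all names invented; this models the idea, not HBase's actual storage code): each CF keeps its own store files, so a scan restricted to one CF never opens the others' data.

```python
# Toy model of CFs as locality groups: one set of store files per CF.
# Names and contents are invented for illustration.
stores = {
    "meta":    [{"row1": "m1"}, {"row2": "m2"}],   # small, frequently read CF
    "content": [{"row1": "x" * 10}],               # large, rarely read CF
}

def scan_cf(cf):
    """Scan only the store files belonging to one column family."""
    files_read = len(stores[cf])
    rows = {}
    for store_file in stores[cf]:
        rows.update(store_file)
    return rows, files_read

rows, files_read = scan_cf("meta")
print(sorted(rows), files_read)  # the "content" files are never opened
```

This physical separation is what makes extra CFs useful for isolating access patterns, and also what ties each CF to its own flush and compaction costs.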