HBase, mail # user - How many column families in one table ?

Re: How many column families in one table ?
lars hofhansl 2013-08-04, 16:03
BigTable has one more level of abstraction: Locality Groups.
A Column Family in HBase is both a Column Family and a Locality Group: it is a group of columns *and* it defines storage parameters (compression, versions, TTL, etc).

As to how many make sense: it depends.
If you can group your columns such that a scan is often limited to a single Column Family, you'll get a huge benefit from using more Column Families.
The main consideration against many Column Families is that each has its own store files, and hence scanning involves more seeking for each Column Family included in a scan.
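
To make the trade-off concrete, here is a rough sketch in Python. It is a toy model, not HBase code: the family names and store file counts are made up for illustration, and it only counts the store files a scan has to open under the assumption that a scan touches every store file of every Column Family it includes.

```python
def store_files_opened(store_files, scanned_families):
    """Count the store files a scan must open.

    store_files: dict mapping Column Family name -> number of store files.
    scanned_families: the Column Families included in the scan.
    Assumption: the scan opens every store file of each included family.
    """
    return sum(store_files[cf] for cf in scanned_families)


# Hypothetical table: 4 Column Families with 3 store files each.
store_files = {"meta": 3, "blob": 3, "stats": 3, "audit": 3}

# A scan limited to one family only touches that family's files.
one_family = store_files_opened(store_files, ["meta"])

# A scan over all families touches every store file in the region.
all_families = store_files_opened(store_files, list(store_files))

print(one_family, all_families)
```

In this model the single-family scan opens a quarter of the files, which is the benefit described above; a scan that needs all families pays for every one of them.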

They are also flushed together: when one memstore (which is per Column Family) needs to be flushed, all other Column Families are also flushed, leading to many small files until they are compacted. If all your Column Families are roughly the same size, this is less of a problem. It's also possible to mitigate this by tweaking the compaction policies.
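
The flush behaviour can be sketched with another toy model in Python. This is a simplification of what is described above, not the actual HBase flush logic: the function name, the threshold and the write sizes are all hypothetical, and the only rule modelled is "when one family's memstore reaches the threshold, every family's memstore is written out as a new file".

```python
def simulate_flushes(writes, flush_threshold):
    """Simulate coupled memstore flushes.

    writes: iterable of (family, nbytes) pairs, in arrival order.
    flush_threshold: memstore size at which a flush is triggered.
    Returns a dict mapping family -> list of flushed file sizes.
    Assumption: a flush of one family flushes all families.
    """
    memstore = {}
    files = {}
    for family, nbytes in writes:
        memstore[family] = memstore.get(family, 0) + nbytes
        files.setdefault(family, [])
        if memstore[family] >= flush_threshold:
            # One memstore is full, so every non-empty memstore is flushed.
            for cf, size in memstore.items():
                if size > 0:
                    files[cf].append(size)
                memstore[cf] = 0
    return files


# Hypothetical workload: "big" receives 100 bytes per write, "small" 1 byte.
writes = [("big", 100), ("small", 1)] * 6
files = simulate_flushes(writes, flush_threshold=300)
print(files)
```

With these made-up numbers the "small" family ends up with just as many files as "big", each only a few bytes long. If both families received similar amounts of data, the flushed files would be similar in size, which is why evenly sized Column Families suffer less from this.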
-- Lars

 From: Vimal Jain <[EMAIL PROTECTED]>
Sent: Saturday, August 3, 2013 11:28 PM
Subject: Re: How many column families in one table ?

I have tested read performance after reducing the number of column families
from 14 to 3, and yes, there is an improvement.
Meanwhile I was going through the paper published by Google on BigTable.
It says:

"It is our intent that the number of distinct column
families in a table be small (in the hundreds at most), and
that families rarely change during operation."

So is that a theoretical value (100 CFs), or is it possible but not with the
current version of HBase?

On Tue, Jul 2, 2013 at 12:48 AM, Viral Bajaria <[EMAIL PROTECTED]> wrote:

> On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain <[EMAIL PROTECTED]> wrote:
> > Sorry for the typo. Please ignore the previous mail. Here is the corrected
> > one.
> > 1) I have around 140 columns for each row. Of those, around 100 columns
> > hold Java primitive data types; the remaining 40 columns contain serialized
> > Java objects as byte arrays (inside each object is an ArrayList). Yes, I do
> > delete data, but the frequency is very low (1 out of 5K operations). I
> > don't run any compaction.
> >
> This answers the type of data in each cell, not the size of the data. Can you
> figure out the average size of the data that you insert? For
> example, what is the length of the byte array? Also, for the Java primitives, is
> it an 8-byte long? A 4-byte int?
> In addition to that, what is in the row key? How long is that in bytes?
> Same for the column families: can you share the names of the column families? How
> about qualifiers?
> If you have disabled major compactions, you should run one every few days
> (if not once a day) to consolidate the number of files that each scan will have
> to open.
> > 2) I ran the scan while keeping an eye on CPU, IO and other system-related
> > parameters. I found them to be normal, with system load being 0.1-0.3.
> >
> How many disks do you have in your box ? Have you ever benchmarked the
> hardware ?
> Thanks,
> Viral

Thanks and Regards,
Vimal Jain