HBase >> mail # user >> schema design: rows vs wide columns


Re: schema design: rows vs wide columns
I think this whole idea of not going over a certain number of column
families is a 2+ year old story. I remember hearing numbers like 5 or 6
(not 3) come up when talking at Hadoop conferences with engineers who were
at companies that were heavy HBase users. I agree with Andrew's suggestion
that we should remove that text and replace it with benchmarks. Obviously
we need to provide disclaimers that these benchmarks are based on a
specific schema design, so YMMV.

I have run a cluster where some tables had upwards of 5 CFs, but the data
was evenly spread across them. I don't think I saw any performance issues
as such, or maybe they were masked, but 5 CFs was not a problem at all.

Stack puts out an interesting stat, i.e. ~15 CFs at FB. Do they run their
own HBase version? I feel they do, so they might have some enhancements
that are not available to the community. Or is that no longer the case?

Thanks,
Viral
On Sun, Apr 7, 2013 at 3:52 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> Is there a pointer to evidence/experiment backed analysis of this question?
> I'm sure there is some basis for this text in the book but I recommend we
> strike it. We could replace it with YCSB or LoadTestTool driven latency
> graphs for different workloads maybe. Although that would also be a big
> simplification of 'schema design' considerations, it would not be so
> starkly lacking background.
>
> On Sunday, April 7, 2013, Ted Yu wrote:
>
> > From http://hbase.apache.org/book.html#number.of.cfs :
> >
> > HBase currently does not do well with anything above two or three column
> > families so keep the number of column families in your schema low.
> >
> > Cheers
> >
> > On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> > > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
> > >
> > > > With regard to number of column families, 3 is the recommended maximum.
> > > >
> > >
> > > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > > depend?  If the latter, on what does it depend?
> > > Thanks,
> > > St.Ack
> > >
> >
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>