HBase >> mail # user >> schema design: rows vs wide columns


Re: schema design: rows vs wide columns
I agree with Andrew here, and Stack's comment about FB running 15 CFs
is interesting.
Whenever people read that line in the doc, they ask why it is so. I also
think the restriction of at most 3 CFs is one factor that sometimes makes
schema design a bit challenging.

Regards
Ram
On Mon, Apr 8, 2013 at 5:21 AM, Viral Bajaria <[EMAIL PROTECTED]> wrote:

> I think this whole idea of don't go over a certain number of column
> families was a 2+ year old story. I remember hearing numbers like 5 or 6
> (not 3) come up when talking at Hadoop conferences with engineers who were
> at companies that were heavy HBase users. I agree with Andrew's suggestion
> that we should remove that text and replace it with benchmarks. Obviously
> we need to provide disclaimers that these are benchmarks based on a
> specific schema design and so YMMV.
>
> I have run a cluster with some tables having upwards of 5 CFs but the data
> was evenly spread across them. I don't think I saw any performance issues
> as such or maybe it got masked but 5 CFs was not a problem at all.
>
> Stack puts out an interesting stat, i.e. ~15 CFs at FB. Do they run their
> own HBase version? I feel they do, so they might have some enhancements
> that are not available to the community. Or is that no longer the case?
>
> Thanks,
> Viral
>
>
> On Sun, Apr 7, 2013 at 3:52 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
> > Is there a pointer to evidence/experiment backed analysis of this
> > question?
> > I'm sure there is some basis for this text in the book but I recommend we
> > strike it. We could replace it with YCSB or LoadTestTool driven latency
> > graphs for different workloads maybe. Although that would also be a big
> > simplification of 'schema design' considerations, it would not be so
> > starkly lacking background.
> >
> > On Sunday, April 7, 2013, Ted Yu wrote:
> >
> > > From http://hbase.apache.org/book.html#number.of.cfs :
> > >
> > > HBase currently does not do well with anything above two or three
> > > column families so keep the number of column families in your schema low.
> > >
> > > Cheers
> > >
> > > On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > With regard to number of column families, 3 is the recommended
> > > > > maximum.
> > > > >
> > > >
> > > > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > > > depend?  If the latter, on what does it depend?
> > > > Thanks,
> > > > St.Ack
> > > >
> > >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>