HBase >> mail # user >> schema design: rows vs wide columns


Re: schema design: rows vs wide columns
"So things are fine as long as all CFs have roughly the same size. But if
you have one that gets a lot of data and many others that are smaller, we'd
end up with a lot of unnecessary and small store files from the smaller
CFs."

This is true.  I am not very sure of other reasons.  We anyway ensure
cross-CF atomicity within a single row.
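
To make the quoted point concrete, here is a toy model of the flush behavior described in this thread. It is only a sketch under stated assumptions, not HBase code: the threshold value, the write sizes, and the names `simulate`, `memstores`, and `store_files` are all hypothetical, and the trigger is simplified to "any one CF's memstore reaches the threshold, so every CF in the region is flushed".

```python
from collections import defaultdict

# Hypothetical flush threshold in arbitrary units (not a real HBase default).
FLUSH_THRESHOLD = 128


def simulate(writes, threshold=FLUSH_THRESHOLD):
    """Toy model: writes is an iterable of (column_family, size) pairs.

    Returns a dict mapping each column family to the list of store file
    sizes produced by flushes. Data still buffered at the end is ignored.
    """
    memstores = defaultdict(int)     # bytes buffered per CF
    store_files = defaultdict(list)  # flushed file sizes per CF
    for cf, size in writes:
        memstores[cf] += size
        if memstores[cf] >= threshold:
            # Simplifying assumption from the thread: one CF needing a
            # flush forces every CF in the region to flush as well.
            for name in list(memstores):
                if memstores[name] > 0:
                    store_files[name].append(memstores[name])
                    memstores[name] = 0
    return dict(store_files)


# One CF receives 16 units per round, the other only 1 unit per round.
writes = []
for _ in range(16):
    writes.append(("big", 16))
    writes.append(("small", 1))

result = simulate(writes)
print(result)  # {'big': [128, 128], 'small': [7, 8]}
```

In this model the "small" CF ends up with as many store files as the "big" one, each a small fraction of the threshold, which is the skew problem described above. With evenly sized CFs the files would come out at similar sizes.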
Regards
Ram
On Mon, Apr 8, 2013 at 10:09 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I think the main problem is that all CFs have to be flushed if one gets
> large enough to require a flush.
> (Does anyone remember why exactly that is? And do we still need that now
> that the memstoreTS is stored in the HFiles?)
>
>
> So things are fine as long as all CFs have roughly the same size. But if
> you have one that gets a lot of data and many others that are smaller, we'd
> end up with a lot of unnecessary and small store files from the smaller CFs.
>
> Anything else known that is bad about many column families?
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Andrew Purtell <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Sunday, April 7, 2013 3:52 PM
> Subject: Re: schema design: rows vs wide columns
>
> Is there a pointer to evidence/experiment backed analysis of this question?
> I'm sure there is some basis for this text in the book but I recommend we
> strike it. We could replace it with YCSB or LoadTestTool driven latency
> graphs for different workloads maybe. Although that would also be a big
> simplification of 'schema design' considerations, it would not be so
> starkly lacking background.
>
> On Sunday, April 7, 2013, Ted Yu wrote:
>
> > From http://hbase.apache.org/book.html#number.of.cfs :
> >
> > HBase currently does not do well with anything above two or three column
> > families so keep the number of column families in your schema low.
> >
> > Cheers
> >
> > On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> > > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
> > >
> > > > With regard to number of column families, 3 is the recommended maximum.
> > > >
> > >
> > > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > > depend?  If the latter, on what does it depend?
> > > Thanks,
> > > St.Ack
> > >
> >
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>