HBase, mail # user - schema design: rows vs wide columns


Re: schema design: rows vs wide columns
ramkrishna vasudevan 2013-04-08, 04:51
"So things are fine as long as all CFs have roughly the same size. But if
you have one that gets a lot of data and many others that are smaller, we'd
end up with a lot of unnecessary and small store files from the smaller
CFs."

This is true.  I am not sure of any other reasons.  We ensure cross-CF
atomicity within a single row anyway.
Regards
Ram
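
To illustrate the quoted point about flushes, here is a toy simulation. It is
only a sketch and not HBase code: the threshold value, the family names
"big" and "small", the write sizes, and the function simulate_flushes are all
made up for illustration, and the trigger rule (any one family's memstore
reaching the threshold flushes every family) simply follows the description in
this thread rather than the actual HBase implementation.

```python
FLUSH_THRESHOLD = 128  # hypothetical flush threshold, in arbitrary units


def simulate_flushes(writes, families, threshold=FLUSH_THRESHOLD):
    """Toy model: when any family's memstore reaches the threshold,
    every family with buffered data is flushed to its own store file.

    Returns a dict mapping family name -> list of store file sizes.
    """
    memstore = {cf: 0 for cf in families}
    store_files = {cf: [] for cf in families}
    for cf, size in writes:
        memstore[cf] += size
        if memstore[cf] >= threshold:
            # One family hit the limit, so all families are flushed together.
            for name in families:
                if memstore[name] > 0:
                    store_files[name].append(memstore[name])
                    memstore[name] = 0
    return store_files


# Skewed workload: "big" receives 16 units per round, "small" receives 1.
writes = []
for _ in range(80):
    writes.append(("big", 16))
    writes.append(("small", 1))

files = simulate_flushes(writes, ["big", "small"])
for cf, sizes in files.items():
    print(cf, len(sizes), "files, sizes:", sizes)
```

In this run both families end up with the same number of store files, but the
files written for "small" are a small fraction of the size of those written
for "big", which is the "unnecessary and small store files" effect described
in the quote.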
On Mon, Apr 8, 2013 at 10:09 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I think the main problem is that all CFs have to be flushed if one gets
> large enough to require a flush.
> (Does anyone remember why exactly that is? And do we still need that now
> that the memstoreTS is stored in the HFiles?)
>
>
> So things are fine as long as all CFs have roughly the same size. But if
> you have one that gets a lot of data and many others that are smaller, we'd
> end up with a lot of unnecessary and small store files from the smaller CFs.
>
> Anything else known that is bad about many column families?
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Andrew Purtell <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Sunday, April 7, 2013 3:52 PM
> Subject: Re: schema design: rows vs wide columns
>
> Is there a pointer to evidence/experiment backed analysis of this question?
> I'm sure there is some basis for this text in the book but I recommend we
> strike it. We could replace it with YCSB or LoadTestTool driven latency
> graphs for different workloads maybe. Although that would also be a big
> simplification of 'schema design' considerations, it would not be so
> starkly lacking background.
>
> On Sunday, April 7, 2013, Ted Yu wrote:
>
> > From http://hbase.apache.org/book.html#number.of.cfs :
> >
> > HBase currently does not do well with anything above two or three column
> > families so keep the number of column families in your schema low.
> >
> > Cheers
> >
> > On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> > > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
> > >
> > > > With regard to number of column families, 3 is the recommended
> > > > maximum.
> > > >
> > >
> > > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > > depend?  If the latter, on what does it depend?
> > > Thanks,
> > > St.Ack
> > >
> >
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>