HBase, mail # user - schema design: rows vs wide columns


shawn du 2013-04-07, 08:03
Ted 2013-04-07, 18:58
Stack 2013-04-07, 22:04
Ted Yu 2013-04-07, 22:27
Andrew Purtell 2013-04-07, 22:52
Viral Bajaria 2013-04-07, 23:51
ramkrishna vasudevan 2013-04-08, 03:59
lars hofhansl 2013-04-08, 04:39
ramkrishna vasudevan 2013-04-08, 04:51
Doug Meil 2013-04-08, 14:21
Ted Yu 2013-04-16, 14:02
Jean-Marc Spaggiari 2013-04-16, 14:04
Ted Yu 2013-04-16, 14:08
Michael Segel 2013-04-16, 14:35
Re: schema design: rows vs wide columns
Adrien Mogenet 2013-04-28, 15:23
Wide area :-)

I agree with Michael; perhaps the best explanation would be to spell out
*WHEN* adding an extra CF makes perfect sense.
On Tue, Apr 16, 2013 at 4:35 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> I think the important thing about Column Families is understanding
> how to use them properly in a design.
>
> Sparse data may make sense. It depends on the use case and an
> understanding of the trade-offs.
>
> It all depends on how the data breaks down into specific use cases.
>
> Keeping CFs to a minimum makes sense. However, what that minimum is
> remains to be seen.
>
> It depends....
>
>
> On Apr 16, 2013, at 9:08 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. Maybe we can explain why there are some impacts, or what to consider?
> >
> > The above would be covered in the JIRA.
> >
> > Thanks
> >
> > On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >
> >> Can we add more details than just changing the maximum CF number? Maybe we
> >> can explain why there are some impacts, or what to consider?
> >>
> >> JM
> >>
> >> 2013/4/16 Ted Yu <[EMAIL PROTECTED]>
> >>
> >>> If there is no objection, I will create a JIRA to increase the maximum
> >>> number of column families described here:
> >>>
> >>> http://hbase.apache.org/book.html#number.of.cfs
> >>>
> >>> Cheers
> >>>
> >>> On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>
> >>>>
> >>>> For the record, the refGuide mentions potential issues of CF lumpiness
> >>>> that you mentioned:
> >>>>
> >>>> http://hbase.apache.org/book.html#number.of.cfs
> >>>>
> >>>>
> >>>> 6.2.1. Cardinality of ColumnFamilies
> >>>>
> >>>> Where multiple ColumnFamilies exist in a single table, be aware of the
> >>>> cardinality (i.e., number of rows).
> >>>> If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
> >>>> rows, ColumnFamilyA's data will likely be spread across many, many
> >>>> regions (and RegionServers). This makes mass scans for ColumnFamilyA
> >>>> less efficient.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ... anything that needs to be updated/added for this?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 4/8/13 12:39 AM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> I think the main problem is that all CFs have to be flushed if one gets
> >>>>> large enough to require a flush.
> >>>>> (Does anyone remember why exactly that is? And do we still need that now
> >>>>> that the memstoreTS is stored in the HFiles?)
> >>>>>
> >>>>>
> >>>>> So things are fine as long as all CFs have roughly the same size. But if
> >>>>> you have one that gets a lot of data and many others that are smaller,
> >>>>> we'd end up with a lot of unnecessary and small store files from the
> >>>>> smaller CFs.
> >>>>>
> >>>>> Anything else known that is bad about many column families?
> >>>>>
> >>>>>
> >>>>> -- Lars
> >>>>>
> >>>>>
> >>>>>
> >>>>> ________________________________
> >>>>> From: Andrew Purtell <[EMAIL PROTECTED]>
> >>>>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >>>>> Sent: Sunday, April 7, 2013 3:52 PM
> >>>>> Subject: Re: schema design: rows vs wide columns
> >>>>>
> >>>>> Is there a pointer to evidence/experiment backed analysis of this
> >>>>> question?
> >>>>> I'm sure there is some basis for this text in the book but I recommend we
> >>>>> strike it. We could replace it with YCSB or LoadTestTool driven latency
> >>>>> graphs for different workloads maybe. Although that would also be a big
> >>>>> simplification of 'schema design' considerations, it would not be so
> >>>>> starkly lacking background.
> >>>>>
> >>>>> On Sunday, April 7, 2013, Ted Yu wrote:
> >>>>>
> >>>>>> From http://hbase.apache.org/book.html#number.of.cfs :
> >>>>>>
> >>>>>> HBase currently does not do well with anything above two or three column
> >>>>>> families so keep the number of column families in your schema low.
Adrien Mogenet
http://www.borntosegfault.com
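
As a rough illustration of the flush behaviour Lars describes in the quoted message (all CFs flush together, so the smaller CFs end up with many tiny store files), here is a minimal Python sketch. It is a toy model and not HBase code: the `Region` class, the `simulate` function, the combined-size flush trigger, and every size used are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy model of one region: one in-memory buffer (memstore) per column family."""
    families: list
    flush_threshold: int  # hypothetical trigger on the combined memstore size, in bytes
    memstores: dict = field(default_factory=dict)    # family -> buffered bytes
    store_files: dict = field(default_factory=dict)  # family -> list of flushed file sizes

    def __post_init__(self):
        for cf in self.families:
            self.memstores[cf] = 0
            self.store_files[cf] = []

    def put(self, cf, nbytes):
        """Buffer a write, then flush everything if the region is over the threshold."""
        self.memstores[cf] += nbytes
        if sum(self.memstores.values()) >= self.flush_threshold:
            self.flush_all()

    def flush_all(self):
        # Assumption of this model: every family is flushed together,
        # so nearly empty families still produce a (tiny) store file.
        for cf, size in self.memstores.items():
            if size > 0:
                self.store_files[cf].append(size)
            self.memstores[cf] = 0


def simulate(writes_per_round, rounds, flush_threshold):
    """Apply the same per-family write sizes for a number of rounds."""
    region = Region(list(writes_per_round), flush_threshold)
    for _ in range(rounds):
        for cf, nbytes in writes_per_round.items():
            region.put(cf, nbytes)
    return region


if __name__ == "__main__":
    MB = 1024 * 1024
    # Made-up workload: one busy family and two quiet ones.
    region = simulate(
        {"big": 1 * MB, "small_a": 10 * 1024, "small_b": 10 * 1024},
        rounds=1000,
        flush_threshold=128 * MB,
    )
    for cf, files in region.store_files.items():
        avg = sum(files) / len(files) if files else 0
        print(f"{cf}: {len(files)} store files, average {avg / MB:.2f} MB")
```

In this model the quiet families get as many store files as the busy one, but each of their files is only a small fraction of the size. If all families receive similar write volumes, the files come out comparable in size, which is the "fine as long as all CFs have roughly the same size" case from the thread.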
Stack 2013-04-07, 22:45
Michael Segel 2013-04-08, 11:17