HBase, mail # user - schema design: rows vs wide columns


shawn du 2013-04-07, 08:03
Ted 2013-04-07, 18:58
Stack 2013-04-07, 22:04
Ted Yu 2013-04-07, 22:27
Andrew Purtell 2013-04-07, 22:52
Viral Bajaria 2013-04-07, 23:51
ramkrishna vasudevan 2013-04-08, 03:59
lars hofhansl 2013-04-08, 04:39
ramkrishna vasudevan 2013-04-08, 04:51
Doug Meil 2013-04-08, 14:21
Ted Yu 2013-04-16, 14:02
Jean-Marc Spaggiari 2013-04-16, 14:04
Re: schema design: rows vs wide columns
Ted Yu 2013-04-16, 14:08
bq. Maybe we can explain why there is some impacts, or what to consider?

The above would be covered in the JIRA.

Thanks

On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Can we add more details than just changing the maximum CF number? Maybe we
> can explain why there is some impacts, or what to consider?
>
> JM
>
> 2013/4/16 Ted Yu <[EMAIL PROTECTED]>
>
> > If there is no objection, I will create a JIRA to increase the maximum
> > number of column families described here:
> >
> > http://hbase.apache.org/book.html#number.of.cfs
> >
> > Cheers
> >
> > On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> >
> > >
> > >
> > > For the record, the refGuide mentions potential issues of CF lumpiness
> > > that you mentioned:
> > >
> > > http://hbase.apache.org/book.html#number.of.cfs
> > >
> > >
> > > 6.2.1. Cardinality of ColumnFamilies
> > >
> > > Where multiple ColumnFamilies exist in a single table, be aware of the
> > > cardinality (i.e., number of rows).
> > > If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
> > > rows, ColumnFamilyA's data will likely be spread across many, many
> > > regions (and RegionServers). This makes mass scans for ColumnFamilyA
> > > less efficient.
> > >
> > >
> > >
> > >
> > >
> > > ... anything that needs to be updated/added for this?
> > >
> > >
> > >
> > >
> > >
> > > On 4/8/13 12:39 AM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> > >
> > > >I think the main problem is that all CFs have to be flushed if one gets
> > > >large enough to require a flush.
> > > >(Does anyone remember why exactly that is? And do we still need that now
> > > >that the memstoreTS is stored in the HFiles?)
> > > >
> > > >
> > > >So things are fine as long as all CFs have roughly the same size. But if
> > > >you have one that gets a lot of data and many others that are smaller,
> > > >we'd end up with a lot of unnecessary and small store files from the
> > > >smaller CFs.
> > > >
> > > >Anything else known that is bad about many column families?
> > > >
> > > >
> > > >-- Lars
> > > >
> > > >
> > > >
> > > >________________________________
> > > > From: Andrew Purtell <[EMAIL PROTECTED]>
> > > >To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > > >Sent: Sunday, April 7, 2013 3:52 PM
> > > >Subject: Re: schema design: rows vs wide columns
> > > >
> > > >Is there a pointer to evidence/experiment backed analysis of this
> > > >question?
> > > >I'm sure there is some basis for this text in the book but I recommend we
> > > >strike it. We could replace it with YCSB or LoadTestTool driven latency
> > > >graphs for different workloads maybe. Although that would also be a big
> > > >simplification of 'schema design' considerations, it would not be so
> > > >starkly lacking background.
> > > >
> > > >On Sunday, April 7, 2013, Ted Yu wrote:
> > > >
> > > >> From http://hbase.apache.org/book.html#number.of.cfs :
> > > >>
> > > >> HBase currently does not do well with anything above two or three column
> > > >> families so keep the number of column families in your schema low.
> > > >>
> > > >> Cheers
> > > >>
> > > >> On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
> > > >> >
> > > >> > > With regard to number of column families, 3 is the recommended maximum.
> > > >> > >
> > > >> >
> > > >> > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > > >> > depend?  If the latter, on what does it depend?
> > > >> > Thanks,
> > > >> > St.Ack
> > > >> >
> > > >>
> > > >
> > > >
> > > >--
> > > >Best regards,
> > > >
> > > >   - Andy
> > > >
> > > >Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > >(via Tom White)
> > >
> > >
> > >
> > >
> >
>
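
Editor's sketch of the flush behaviour Lars describes in the quoted text (all CFs in a region are flushed when one of them grows large enough to require a flush). This is a toy model, not HBase code: the function name, the column family names, the write rates and the threshold are all made up for illustration, and sizes are in arbitrary units. It only shows why a skewed write load leaves the smaller CFs with many small store files.

```python
def simulate_flushes(write_rates, flush_threshold, ticks):
    """Toy model of region-wide flushing (hypothetical, not the HBase API).

    write_rates: dict mapping CF name -> units written per tick (assumed values).
    flush_threshold: memstore size at which a single CF forces a flush.
    Returns a dict mapping CF name -> list of store file sizes produced.
    """
    memstores = {cf: 0 for cf in write_rates}
    store_files = {cf: [] for cf in write_rates}
    for _ in range(ticks):
        # Every CF accumulates writes in its own memstore.
        for cf, rate in write_rates.items():
            memstores[cf] += rate
        # If any one memstore is large enough, all non-empty memstores flush.
        if any(size >= flush_threshold for size in memstores.values()):
            for cf, size in memstores.items():
                if size > 0:
                    store_files[cf].append(size)
                    memstores[cf] = 0
    return store_files


if __name__ == "__main__":
    rates = {"big": 16, "small_a": 1, "small_b": 1}
    result = simulate_flushes(rates, flush_threshold=128, ticks=80)
    for cf, sizes in result.items():
        print(cf, "files:", len(sizes), "sizes:", sizes)
```

With these assumed rates, each flush forced by the big CF also writes out one small file for every other CF, so the small CFs end up with as many store files as the big one, each a fraction of its size. With equal write rates the files come out the same size, which matches the "fine as long as all CFs have roughly the same size" observation.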
Michael Segel 2013-04-16, 14:35
Adrien Mogenet 2013-04-28, 15:23
Stack 2013-04-07, 22:45
Michael Segel 2013-04-08, 11:17