HBase >> mail # user >> schema design: rows vs wide columns


Ted Yu 2013-04-16, 14:02
Jean-Marc Spaggiari 2013-04-16, 14:04
Re: schema design: rows vs wide columns
bq. Maybe we can explain why there are some impacts, or what to consider?

The above would be covered in the JIRA.

Thanks

On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Can we add more details than just changing the maximum CF number? Maybe we
> can explain why there are some impacts, or what to consider?
>
> JM
>
> 2013/4/16 Ted Yu <[EMAIL PROTECTED]>
>
> > If there is no objection, I will create a JIRA to increase the maximum
> > number of column families described here:
> >
> > http://hbase.apache.org/book.html#number.of.cfs
> >
> > Cheers
> >
> > On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> >
> > >
> > >
> > > For the record, the refGuide mentions potential issues of CF lumpiness
> > > that you mentioned:
> > >
> > > http://hbase.apache.org/book.html#number.of.cfs
> > >
> > >
> > > 6.2.1. Cardinality of ColumnFamilies
> > >
> > > Where multiple ColumnFamilies exist in a single table, be aware of the
> > > cardinality (i.e., number of rows).
> > > If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
> > > rows, ColumnFamilyA's data will likely be spread across many, many
> > > regions (and RegionServers). This makes mass scans for ColumnFamilyA
> > > less efficient.
> > >
> > > ... anything that needs to be updated/added for this?
> > >
> > > On 4/8/13 12:39 AM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> > >
> > > >I think the main problem is that all CFs have to be flushed if one gets
> > > >large enough to require a flush.
> > > >(Does anyone remember why exactly that is? And do we still need that now
> > > >that the memstoreTS is stored in the HFiles?)
> > > >
> > > >
> > > >So things are fine as long as all CFs have roughly the same size. But if
> > > >you have one that gets a lot of data and many others that are smaller,
> > > >we'd end up with a lot of unnecessary and small store files from the
> > > >smaller CFs.
> > > >
> > > >Anything else known that is bad about many column families?
> > > >
> > > >
> > > >-- Lars
> > > >
> > > >
> > > >
> > > >________________________________
> > > > From: Andrew Purtell <[EMAIL PROTECTED]>
> > > >To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > > >Sent: Sunday, April 7, 2013 3:52 PM
> > > >Subject: Re: schema design: rows vs wide columns
> > > >
> > > >Is there a pointer to evidence/experiment backed analysis of this
> > > >question?
> > > >I'm sure there is some basis for this text in the book but I recommend we
> > > >strike it. We could replace it with YCSB or LoadTestTool driven latency
> > > >graphs for different workloads maybe. Although that would also be a big
> > > >simplification of 'schema design' considerations, it would not be so
> > > >starkly lacking background.
> > > >
> > > >On Sunday, April 7, 2013, Ted Yu wrote:
> > > >
> > > >> From http://hbase.apache.org/book.html#number.of.cfs :
> > > >>
> > > >> HBase currently does not do well with anything above two or three column
> > > >> families so keep the number of column families in your schema low.
> > > >>
> > > >> Cheers
> > > >>
> > > >> On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
> > > >> >
> > > >> > > With regard to number of column families, 3 is the recommended maximum.
> > > >> > >
> > > >> >
> > > >> > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > > >> > depend?  If the latter, on what does it depend?
> > > >> > Thanks,
> > > >> > St.Ack
> > > >> >
> > > >>
> > > >
> > > >
> > > >--
> > > >Best regards,
> > > >
> > > >   - Andy
> > > >
> > > >Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > >(via Tom White)
> > >
> > >
> > >
> > >
> >
>
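
For readers following the thread: the effect Lars describes above (every CF is flushed whenever one gets large enough, so the smaller CFs end up with many small store files) can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: the class FlushSketch, the method simulate, the per-CF write rates and the flush threshold are all hypothetical names and numbers picked for illustration, and real flush triggering in HBase is more involved than this.

```java
// Toy model of "all CFs flush together". NOT HBase code; every name and
// number here is an assumption made for illustration only.
class FlushSketch {

    /**
     * bytesPerRound[i] is the number of bytes added to CF i's in-memory
     * store on each write round. Whenever any CF reaches flushThreshold,
     * every non-empty CF is flushed to its own new store file.
     *
     * Returns result[i] = { storeFilesWritten, totalBytesFlushed } for CF i.
     */
    static long[][] simulate(long[] bytesPerRound, long flushThreshold, int rounds) {
        int n = bytesPerRound.length;
        long[] memstore = new long[n];
        long[][] result = new long[n][2];

        for (int r = 0; r < rounds; r++) {
            boolean flushNeeded = false;
            for (int i = 0; i < n; i++) {
                memstore[i] += bytesPerRound[i];
                if (memstore[i] >= flushThreshold) {
                    flushNeeded = true; // one large CF is enough to trigger
                }
            }
            if (flushNeeded) {
                // Flush every CF, including the ones that are nearly empty.
                for (int i = 0; i < n; i++) {
                    if (memstore[i] > 0) {
                        result[i][0]++;              // one more store file
                        result[i][1] += memstore[i]; // bytes in that file
                        memstore[i] = 0;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // CF0 receives 100x more data per round than CF1 (made-up rates).
        long[][] r = simulate(new long[] {1000, 10}, 128_000, 10_000);
        for (int i = 0; i < r.length; i++) {
            long files = r[i][0];
            long avg = files == 0 ? 0 : r[i][1] / files;
            System.out.println("CF" + i + ": files=" + files + ", avgBytesPerFile=" + avg);
        }
    }
}
```

In this model both CFs end up with the same number of store files, but the files of the low-volume CF are 100x smaller, which is the "lot of unnecessary and small store files" case. With equal write rates the files come out the same size, matching the remark that things are fine when all CFs are roughly the same size.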