HBase, mail # user - schema design: rows vs wide columns


Re: schema design: rows vs wide columns
Doug Meil 2013-04-08, 14:21


For the record, the refGuide mentions potential issues of CF lumpiness
that you mentioned:

http://hbase.apache.org/book.html#number.of.cfs
 

6.2.1. Cardinality of ColumnFamilies

Where multiple ColumnFamilies exist in a single table, be aware of the
cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and
ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
across many, many regions (and RegionServers). This makes mass scans for
ColumnFamilyA less efficient.

... anything that needs to be updated or added for this?
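
As a rough illustration of the cardinality note quoted above, here is a back-of-the-envelope sketch. It assumes, as a simplification, that region count is driven by the size of the largest column family and that rows are spread evenly across regions. The function name, the 1,000 bytes per row, and the 10 GiB region size are hypothetical values picked for the example, not figures from the refGuide.

```python
import math


def spread_of_small_family(rows_small, rows_large, bytes_per_row, region_max_bytes):
    """Estimate how thinly the small family's rows are spread across regions.

    Assumption (simplified model): the table keeps splitting until the largest
    family fits under region_max_bytes per region, and both families share the
    same region boundaries with rows distributed uniformly.
    """
    largest_family_bytes = max(rows_small, rows_large) * bytes_per_row
    regions = max(1, math.ceil(largest_family_bytes / region_max_bytes))
    return regions, rows_small / regions


# Hypothetical numbers: 1 million vs 1 billion rows, 1,000 bytes per row,
# 10 GiB maximum region size.
regions, small_rows_per_region = spread_of_small_family(
    rows_small=1_000_000,
    rows_large=1_000_000_000,
    bytes_per_row=1_000,
    region_max_bytes=10 * 1024**3,
)
print(f"regions: {regions}")
print(f"ColumnFamilyA rows per region: {small_rows_per_region:.0f}")
```

Under these assumed numbers, a full scan of ColumnFamilyA has to visit every region even though each one holds only a small slice of its rows.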

On 4/8/13 12:39 AM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:

>I think the main problem is that all CFs have to be flushed if one gets
>large enough to require a flush.
>(Does anyone remember why exactly that is? And do we still need that now
>that the memstoreTS is stored in the HFiles?)
>
>
>So things are fine as long as all CFs have roughly the same size. But if
>you have one that gets a lot of data and many others that are smaller,
>we'd end up with a lot of unnecessary and small store files from the
>smaller CFs.
>
>Anything else known that is bad about many column families?
>
>
>-- Lars
>
>
>
>________________________________
> From: Andrew Purtell <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>Sent: Sunday, April 7, 2013 3:52 PM
>Subject: Re: schema design: rows vs wide columns
>
>Is there a pointer to evidence/experiment backed analysis of this
>question?
>I'm sure there is some basis for this text in the book but I recommend we
>strike it. We could replace it with YCSB or LoadTestTool driven latency
>graphs for different workloads maybe. Although that would also be a big
>simplification of 'schema design' considerations, it would not be so
>starkly lacking background.
>
>On Sunday, April 7, 2013, Ted Yu wrote:
>
>> From http://hbase.apache.org/book.html#number.of.cfs :
>>
>> HBase currently does not do well with anything above two or three column
>> families so keep the number of column families in your schema low.
>>
>> Cheers
>>
>> On Sun, Apr 7, 2013 at 3:04 PM, Stack <[EMAIL PROTECTED]> wrote:
>>
>> > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[EMAIL PROTECTED]> wrote:
>> >
>> > > With regard to number of column families, 3 is the recommended maximum.
>> > >
>> >
>> > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
>> > depend?  If the latter, on what does it depend?
>> > Thanks,
>> > St.Ack
>> >
>>
>
>
>--
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>(via Tom White)
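
The flush behaviour Lars describes in the quoted message (all CFs are flushed once one of them grows large enough) can be illustrated with a toy model. This is only a sketch of that description, not HBase's actual flush logic: the function name, the write rates, and the 128-unit threshold are hypothetical, and the trigger condition is an assumption taken from the wording of the message.

```python
def simulate_flushes(write_rates, flush_threshold, ticks):
    """Toy model: when any family's memstore reaches the threshold, flush all.

    write_rates maps a column family name to the amount written per tick
    (hypothetical units, e.g. MB). Returns the sizes of the store files
    each family ends up with.
    """
    memstores = {cf: 0 for cf in write_rates}
    store_files = {cf: [] for cf in write_rates}
    for _ in range(ticks):
        for cf, rate in write_rates.items():
            memstores[cf] += rate
        # Assumed trigger: one family crossing the threshold flushes every family.
        if any(size >= flush_threshold for size in memstores.values()):
            for cf, size in memstores.items():
                if size > 0:
                    store_files[cf].append(size)
                memstores[cf] = 0
    return store_files


# One busy family and two quiet ones (hypothetical rates).
files = simulate_flushes(
    write_rates={"big": 16, "small1": 1, "small2": 1},
    flush_threshold=128,
    ticks=80,
)
for cf, sizes in files.items():
    print(cf, len(sizes), "store files, sizes:", sizes)
```

In this model the quiet families produce as many store files as the busy one, each a small fraction of the threshold, which is the "lot of unnecessary and small store files" effect mentioned in the thread.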