HBase, mail # user - How many column families in one table ?


Re: How many column families in one table ?
Kevin O'Dell 2013-08-05, 14:44
Pablo,

  That is correct.
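
  To make the flush behaviour quoted below concrete, here is a minimal
  sketch in plain Java. It is a toy model, not HBase code: the class
  RegionFlushSketch and its methods are hypothetical, the byte counts are
  made up, and triggering on the combined memstore size is a simplifying
  assumption (the real trigger depends on your HBase version and settings).
  It only shows how a busy family can force a quiet family to write many
  tiny files.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one region with several column families. Not HBase code. */
final class RegionFlushSketch {
    private final long flushThresholdBytes;     // assumed region-wide flush trigger
    private final long[] memstoreBytes;         // one memstore per column family
    private final List<List<Long>> storeFiles;  // sizes of flushed files, per family

    RegionFlushSketch(int families, long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
        this.memstoreBytes = new long[families];
        this.storeFiles = new ArrayList<>();
        for (int i = 0; i < families; i++) {
            storeFiles.add(new ArrayList<>());
        }
    }

    /** Buffer a write in one family's memstore, then check the flush trigger. */
    void write(int family, long bytes) {
        memstoreBytes[family] += bytes;
        long total = 0;
        for (long b : memstoreBytes) {
            total += b;
        }
        if (total >= flushThresholdBytes) {
            flushAll();
        }
    }

    /** Flush every non-empty memstore, producing one new file per family. */
    private void flushAll() {
        for (int f = 0; f < memstoreBytes.length; f++) {
            if (memstoreBytes[f] > 0) {
                storeFiles.get(f).add(memstoreBytes[f]);
                memstoreBytes[f] = 0;
            }
        }
    }

    List<Long> filesOf(int family) {
        return storeFiles.get(family);
    }

    public static void main(String[] args) {
        RegionFlushSketch region = new RegionFlushSketch(2, 1000);
        for (int i = 0; i < 100; i++) {
            region.write(0, 99); // busy family
            region.write(1, 1);  // quiet family
        }
        System.out.println("busy family files:  " + region.filesOf(0));
        System.out.println("quiet family files: " + region.filesOf(1));
    }
}
```

  In this run both families end up with ten files each, but the quiet
  family's files hold only 10 bytes apiece against 990 for the busy one.
  That is the "many small files until they are compacted" effect Lars
  describes when families differ a lot in size.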
On Mon, Aug 5, 2013 at 10:00 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:

> Lars,
>
> When you say 'when one memstore needs to be flushed all other column
> families are flushed', you are referring to the other column families of
> the same table, right?
>
>
>
>
> 2013/8/4 Rohit Kelkar <[EMAIL PROTECTED]>
>
> > Regarding the slow scan: only fetch the columns/qualifiers that you
> > need. It may be that you are fetching a whole lot of data that you don't
> > need. Try scan.addColumn() and let us know.
> >
> > - R
> >
> > On Sunday, August 4, 2013, lars hofhansl wrote:
> >
> > > BigTable has one more level of abstraction: Locality Groups.
> > > A Column Family in HBase is both a Column Family and a Locality Group:
> > > it is a group of columns *and* it defines storage parameters
> > > (compression, versions, TTL, etc).
> > >
> > > As to how many make sense: it depends.
> > > If you can group your columns such that a scan is often limited to a
> > > single Column Family, you'll get a huge benefit by using more Column
> > > Families.
> > > The main consideration with many Column Families is that each has its
> > > own store files, and hence scanning involves more seeking for each
> > > Column Family included in a scan.
> > >
> > > They are also flushed together: when one memstore (which is per Column
> > > Family) needs to be flushed, all other Column Families are also flushed,
> > > leading to many small files until they are compacted. If all your Column
> > > Families are roughly the same size this is less of a problem. It's also
> > > possible to mitigate this by tweaking the compaction policies.
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > > From: Vimal Jain <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Sent: Saturday, August 3, 2013 11:28 PM
> > > Subject: Re: How many column families in one table ?
> > >
> > >
> > > Hi,
> > > I have tested read performance after reducing the number of column
> > > families from 14 to 3, and yes, there is an improvement.
> > > Meanwhile I was going through the paper published by Google on BigTable.
> > > It says:
> > >
> > > "It is our intent that the number of distinct column
> > > families in a table be small (in the hundreds at most), and
> > > that families rarely change during operation."
> > >
> > > So is that a theoretical value (100 CFs), or is it possible but not
> > > with the current version of HBase?
> > >
> > >
> > > On Tue, Jul 2, 2013 at 12:48 AM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Sorry for the typo. Please ignore the previous mail. Here is the
> > > > > corrected one.
> > > > > 1) I have around 140 columns for each row. Of these, around 100
> > > > > columns hold Java primitive data types; the remaining 40 columns
> > > > > contain serialized Java objects as byte arrays (inside each object
> > > > > is an ArrayList). Yes, I do delete data, but the frequency is very
> > > > > low (1 out of 5K operations). I don't run any compaction.
> > > > >
> > > >
> > > > This answers the type of data in each cell, not the size of the data.
> > > > Can you figure out the average size of the data that you insert in
> > > > each cell? For example, what is the length of the byte array? Also,
> > > > for the Java primitives, is it an 8-byte long? A 4-byte int?
> > > > In addition to that, what is in the row key? How long is that in
> > > > bytes? Same for the column families: can you share the names of the
> > > > column families? How about the qualifiers?
> > > >
> > > > If you have disabled major compactions, you should run one once every
> > > > few days (if not once a day) to consolidate the number of files that
> > > > each scan will have to open.
> > > >
> > > > 2) I had run the scan keeping in mind the CPU, IO and other system related

Kevin O'Dell
Systems Engineer, Cloudera