HBase >> mail # user >> How many column families in one table ?


Re: How many column families in one table ?
Pablo,

  That is correct.
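
A toy model of that flush behavior may help make the "many small files" effect concrete. It is plain Java, not HBase code: the class `FlushModel`, the 1000-byte threshold, and the family names `big` and `small` are all assumptions made for illustration, and real HBase flush triggers are more involved than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: one memstore per column family, and a flush of any one
// memstore flushes every family, as described in the quoted thread.
class FlushModel {
    // Hypothetical threshold chosen for the example, not an HBase default.
    static final long FLUSH_THRESHOLD_BYTES = 1000;

    final Map<String, Long> memstoreBytes = new LinkedHashMap<>();
    final Map<String, List<Long>> storeFiles = new LinkedHashMap<>();

    FlushModel(String... families) {
        for (String family : families) {
            memstoreBytes.put(family, 0L);
            storeFiles.put(family, new ArrayList<>());
        }
    }

    // Buffer a write; if this family's memstore reaches the threshold,
    // flush all families together.
    void put(String family, long bytes) {
        if (!memstoreBytes.containsKey(family)) {
            throw new IllegalArgumentException("unknown family: " + family);
        }
        memstoreBytes.merge(family, bytes, Long::sum);
        if (memstoreBytes.get(family) >= FLUSH_THRESHOLD_BYTES) {
            flushAll();
        }
    }

    // Every non-empty memstore becomes one store file, however small.
    void flushAll() {
        for (Map.Entry<String, Long> entry : memstoreBytes.entrySet()) {
            if (entry.getValue() > 0) {
                storeFiles.get(entry.getKey()).add(entry.getValue());
                entry.setValue(0L);
            }
        }
    }

    public static void main(String[] args) {
        FlushModel region = new FlushModel("big", "small");
        for (int i = 0; i < 50; i++) {
            region.put("big", 100);  // fills its memstore quickly
            region.put("small", 1);  // barely accumulates anything
        }
        // Prints the size in bytes of each store file, per family.
        System.out.println(region.storeFiles);
    }
}
```

In this model the `small` family ends up with as many store files as `big`, each only a few bytes long, which is the unevenly sized case the thread warns about.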
On Mon, Aug 5, 2013 at 10:00 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:

> Lars,
>
> When you say 'when one memstore needs to be flushed all other column
> families are flushed', you are referring to other column families of the
> same table, right?
>
> 2013/8/4 Rohit Kelkar <[EMAIL PROTECTED]>
>
> > Regarding the slow scan: only fetch the columns/qualifiers that you need.
> > It may be that you are fetching a whole lot of data that you don't need.
> > Try scan.addColumn() and let us know.
> >
> > - R
> >
> > On Sunday, August 4, 2013, lars hofhansl wrote:
> >
> > > BigTable has one more level of abstraction: Locality Groups.
> > > A Column Family in HBase is both a Column Family and a Locality Group:
> > > it is a group of columns *and* it defines storage parameters
> > > (compression, versions, TTL, etc.).
> > >
> > > As to how many make sense: it depends.
> > > If you can group your columns such that a scan is often limited to a
> > > single Column Family, you'll get a huge benefit from using more Column
> > > Families.
> > > The main consideration with many Column Families is that each has its
> > > own store files, and hence scanning involves more seeking for each
> > > Column Family included in a scan.
> > >
> > > They are also flushed together: when one memstore (which is per Column
> > > Family) needs to be flushed, all other Column Families are also flushed,
> > > leading to many small files until they are compacted. If all your
> > > Column Families are roughly the same size, this is less of a problem.
> > > It's also possible to mitigate this by tweaking the compaction policies.
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > >  From: Vimal Jain <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Sent: Saturday, August 3, 2013 11:28 PM
> > > Subject: Re: How many column families in one table ?
> > >
> > >
> > > Hi,
> > > I have tested read performance after reducing the number of column
> > > families from 14 to 3, and yes, there is an improvement.
> > > Meanwhile, I was going through the paper published by Google on
> > > BigTable. It says:
> > >
> > > "It is our intent that the number of distinct column
> > > families in a table be small (in the hundreds at most), and
> > > that families rarely change during operation."
> > >
> > > So is that a theoretical value (100 CFs), or is it possible but not
> > > with the current version of HBase?
> > >
> > >
> > > On Tue, Jul 2, 2013 at 12:48 AM, Viral Bajaria <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Sorry for the typo; please ignore the previous mail. Here is the
> > > > > corrected one.
> > > > > 1) I have around 140 columns for each row. Of those 140, around 100
> > > > > columns hold Java primitive data types; the remaining 40 columns
> > > > > contain serialized Java objects as byte arrays (inside each object
> > > > > is an ArrayList). Yes, I do delete data, but the frequency is very
> > > > > low (1 out of 5K operations). I don't run any compaction.
> > > > >
> > > >
> > > > This answers the type of data in each cell, not the size of the data.
> > > > Can you figure out the average size of the data that you insert? For
> > > > example, what is the length of the byte array? Also, for the Java
> > > > primitives, is it an 8-byte long? A 4-byte int?
> > > > In addition to that, what is in the row key? How long is that in
> > > > bytes? Same for the column families: can you share their names? How
> > > > about the qualifiers?
> > > >
> > > > If you have disabled major compactions, you should run one every few
> > > > days (if not once a day) to consolidate the # of files that each scan
> > > > will have to open.
> > > >
> > > > 2) I had ran scan keeping in mind the CPU,IO and other system related

Kevin O'Dell
Systems Engineer, Cloudera
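
Two points from the quoted thread, that a scan limited to one column family avoids the store files of the others, and that major compaction reduces the number of files each scan has to open, can be sketched with a small counting model. This is plain Java under stated assumptions, not the HBase client API: the class `ScanCostModel`, its methods, and the file counts are hypothetical, and "files opened" is only a rough stand-in for seek cost. In a real client the restriction is what calls such as the `scan.addColumn()` mentioned above express.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: each column family has its own store files, and a scan
// opens every store file of every family it includes.
class ScanCostModel {
    // Family name -> number of store files on disk (made-up numbers).
    final Map<String, Integer> storeFileCount = new LinkedHashMap<>();

    void setFiles(String family, int files) {
        storeFileCount.put(family, files);
    }

    // Number of store files a scan over the given families must open.
    int filesOpened(Collection<String> families) {
        int total = 0;
        for (String family : families) {
            Integer files = storeFileCount.get(family);
            if (files == null) {
                throw new IllegalArgumentException("unknown family: " + family);
            }
            total += files;
        }
        return total;
    }

    // Major compaction in this model: each family's files become one file.
    void majorCompact() {
        storeFileCount.replaceAll((family, files) -> files == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        ScanCostModel table = new ScanCostModel();
        table.setFiles("a", 6);
        table.setFiles("b", 6);
        table.setFiles("c", 6);

        // Scan across all three families versus a scan limited to one.
        System.out.println(table.filesOpened(Arrays.asList("a", "b", "c")));
        System.out.println(table.filesOpened(Arrays.asList("a")));

        // After a major compaction the single-family scan opens one file.
        table.majorCompact();
        System.out.println(table.filesOpened(Arrays.asList("a")));
    }
}
```

The model ignores block caching, bloom filters, and file sizes, so it only shows the direction of the effect, not its magnitude.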