HBase >> mail # user >> Question on the number of column families

RE: Question on the number of column families
Thank you all.

Facts learned:

- Having 130 column families is too much. Don't do that.
- While scanning, the entire row is read for filtering, unless the HBASE-5416 technique is applied, which causes only the relevant column family to be loaded. (However, it seems one still can't load just the needed column while scanning.)
- Very large rows are probably not a good idea.
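To check my understanding of the HBASE-5416 point: the filter is evaluated against a small "essential" column family first, and a row's other families are read only if the row passes. The toy model below is plain Python, not HBase code. `ToyTable`, `scan_on_demand` and the `meta`/`data` family names are all made up for illustration. It records which (row, family) pairs get loaded, so you can see that the large family of a non-matching row is never touched. In the real client this is, if I read the JIRA correctly, enabled with `Scan.setLoadColumnFamiliesOnDemand(true)` together with a filter that reports its family as essential.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

# Toy model, not HBase code: row key -> {family -> {qualifier -> value}}
Family = Dict[str, bytes]
Row = Dict[str, Family]


@dataclass
class ToyTable:
    rows: Dict[bytes, Row]
    # Log of (row key, family) pairs actually "read from disk", for inspection.
    reads: List[Tuple[bytes, str]] = field(default_factory=list)

    def load_family(self, key: bytes, family: str) -> Family:
        self.reads.append((key, family))
        return self.rows[key].get(family, {})


def scan_on_demand(
    table: ToyTable,
    essential_family: str,
    predicate: Callable[[Family], bool],
) -> Iterator[Tuple[bytes, Row]]:
    """Evaluate the filter on the small family first; load the rest only for matches."""
    for key in sorted(table.rows):
        small = table.load_family(key, essential_family)
        if not predicate(small):
            continue  # the other (large) families of this row are never loaded
        result: Row = {essential_family: small}
        for fam in table.rows[key]:
            if fam != essential_family:
                result[fam] = table.load_family(key, fam)
        yield key, result


# Usage: "meta" is the small filtered family, "data" holds the big values.
table = ToyTable(rows={
    b"r1": {"meta": {"type": b"a"}, "data": {"blob": b"x" * 5}},
    b"r2": {"meta": {"type": b"b"}, "data": {"blob": b"y" * 5}},
})
hits = list(scan_on_demand(table, "meta", lambda f: f.get("type") == b"b"))
```

Only `r2` has its `data` family loaded; for `r1` just the small `meta` family is read.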

For now, it seems appropriate to follow the one-column solution that Alok Singh suggested, partly because there is currently no reasonable way to group the fields.

Here is my current thinking:

- One column family, one column. The field name will be included in the row key.
- Eliminate filtering altogether (in most cases) by ordering the row key components properly.
- If filtering is absolutely needed, add a 'dummy' column family and apply the HBASE-5416 technique to minimize disk reads, since a field value can be large (~5 MB). (I'm not sure this dummy-column idea is right, since I haven't yet read the filtering section of the book I'm reading.)
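A rough sketch of the first two points, under assumptions of my own: each (entity, field) pair becomes its own row, keyed as entity id + 0x00 + field name. The layout, the separator, and the names `make_row_key`, `SortedKeySpace` and `fields_of` are hypothetical. A sorted list stands in for the table, since HBase keeps rows sorted by row-key bytes. With that ordering, fetching all fields of one entity is a bounded range scan (like a Scan with start and stop rows) and a single field is a direct get, so neither needs a filter.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical key layout: <entity id> 0x00 <field name>.
# Assumes neither part ever contains a 0x00 byte.
SEP = b"\x00"


def make_row_key(entity_id: str, field_name: str) -> bytes:
    return entity_id.encode("utf-8") + SEP + field_name.encode("utf-8")


class SortedKeySpace:
    """Toy stand-in for one table: rows kept sorted by raw row-key bytes."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._values.get(key)

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield rows with start <= key < stop; no per-row filter is evaluated."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


def fields_of(space: SortedKeySpace, entity_id: str) -> List[Tuple[bytes, bytes]]:
    # Every key of this entity lies in [id + 0x00, id + 0x01), so a plain range
    # scan returns exactly its fields.
    prefix = entity_id.encode("utf-8")
    return list(space.scan(prefix + SEP, prefix + b"\x01"))


space = SortedKeySpace()
space.put(make_row_key("user1", "name"), b"Ann")
space.put(make_row_key("user1", "email"), b"ann@example.com")
space.put(make_row_key("user10", "name"), b"Bob")
space.put(make_row_key("user2", "name"), b"Cid")
```

The stop row is just the start row with its last byte incremented, which is why `user10` rows stay out of a scan for `user1`.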

I hope I'm not missing or misunderstanding something...
(I'm a total newbie; I started reading an HBase book last week...)