Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Question on the number of column families


Copy link to this message
-
RE: Question on the number of column families
Thank you all.

Facts learned:

- Having 130 column families is too much. Don't do that.
- While scanning, an entire row will be read for filtering, unless HBASE-5416 technique is applied which makes only relevant column family is loaded. (But it seems that still one can't load just a column needed while scanning)
- Big row size is maybe not good.

Currently it seems appropriate to follow the one-column solution that Alok Singh suggested, in part since currently there is no reasonable grouping of the fields.

Here is my current thinking:

- One column family, one column. Field name will be included in rowkey.
- Eliminate filtering altogether (in most case) by properly ordering rowkey components.
- If a filtering is absolutely needed, add a 'dummy' column family and apply HBASE-5416 technique to minimize disk read, since the field value can be large(~5MB). (This dummy column thing may not be right, I'm not sure, since I have not read the filtering section of the book I'm reading yet)

Hope that I am not missing or misunderstanding something...
(I'm a total newbie. I've started to read a HBase book since last week...)