According to http://hbase.apache.org/book/number.of.cfs.html, having more
than 2~3 column families is strongly discouraged.
In my case, however, the records in the table have the following characteristics:
- The table is read-only. It is bulk-loaded once. When new data is ready,
a new table is created and the old table is deleted.
- The size of the source data can be hundreds of gigabytes.
- A record has about 130 fields.
- The number of fields in a record is fixed.
- The names of the fields are also fixed. (it's like a table in RDBMS)
- About 40 fields (the number varies) are mostly populated, while the other
fields are mostly empty (NULL in RDBMS terms).
- It is not known in advance which fields will be dense. It depends on the
source data.
- Fields are accessed independently. Normally a user requests just one
field, but a user can request several.
- The range of a range query is the same for all fields (no wider, no
narrower, regardless of the data density).
To me, it seems it would be more efficient to have one column family per
field, since that would cost less disk I/O: only the data of the requested
column would be read.
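
To make that reasoning concrete, here is a rough back-of-the-envelope sketch in Python. It is not HBase code and not a measurement. The 300 GiB table size and the 1% range are made-up figures, the function name is mine, and the model assumes that the ~40 dense fields contribute equally to the table size, that the sparse fields contribute nothing, and that a scan over a single column family has to read the data of every field stored for the rows in the range.

```python
def estimate_bytes_read(total_bytes, dense_fields, range_fraction, family_per_field):
    """Rough estimate of bytes read to fetch ONE dense field over a row range.

    Simplifying assumptions (mine, for illustration only):
    - all dense fields are the same size, sparse fields take no space;
    - with a single column family, all fields of a row sit in the same
      files, so the scan reads every field for the rows in the range;
    - with one family per field, only that field's files are read.
    """
    bytes_in_range = total_bytes * range_fraction
    if family_per_field:
        return bytes_in_range / dense_fields
    return bytes_in_range


TOTAL_BYTES = 300 * 1024**3   # hypothetical table size: 300 GiB
DENSE_FIELDS = 40             # "about 40" dense fields, from the description
RANGE_FRACTION = 0.01         # hypothetical: the query covers 1% of the rows

single_family = estimate_bytes_read(TOTAL_BYTES, DENSE_FIELDS, RANGE_FRACTION, False)
per_field_family = estimate_bytes_read(TOTAL_BYTES, DENSE_FIELDS, RANGE_FRACTION, True)

print("single family   :", single_family / 1024**2, "MiB")
print("family per field:", per_field_family / 1024**2, "MiB")
print("ratio           :", single_family / per_field_family)
```

Under these assumptions the saving is simply a factor equal to the number of dense fields. The model ignores any per-family overhead, which is what the linked page warns about.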
Can the table have 130 column families in this case?
Or must all the columns be in one column family?