Subject: RE: Question on the number of column families
Now I finished reading the filtering section and the source code of TestJoinedScanners(0.94).
- While scanning, an entire row will be read even for a rowkey filtering. (Since a rowkey is not a physically separate entity and stored in KeyValue object, it's natural. Am I right?)
- The key API for the essential column family support is setLoadColumnFamiliesOnDemand().
So, now I have questions:
On rowkey filtering, which column family's KeyValue object is read?
If HBase just reads a KeyValue from a randomly selected (or just the first) column family, how is setLoadColumnFamiliesOnDemand() affected? Can HBase select a smaller column family intelligently?
If setLoadColumnFamiliesOnDemand() can be applied to a rowkey filtering, a 'dummy' column family can be used to minimize the scan cost.