HBase, mail # user - RE: Question on the number of column families - 2014-08-06, 11:01
RE: Question on the number of column families
Hi Ted,

I have now finished reading the filtering section and the source code of TestJoinedScanners (0.94).

Facts learned:

- While scanning, the entire row is read even when filtering only on the rowkey. (This seems natural, since a rowkey is not a physically separate entity but is stored inside each KeyValue object. Am I right?)
- The key API for the essential column family support is setLoadColumnFamiliesOnDemand().
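
To make the second point concrete, here is a toy model in plain Java of what I understand on-demand loading to do. It is not HBase code and does not call the HBase API. The class, field, and method names (JoinedScanSketch, Row, cellsRead, sampleRows) are made up for illustration, and it assumes that the filter only needs the small family to decide whether a row passes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT HBase code. All names here are hypothetical.
final class JoinedScanSketch {

    /** One row with a small family the filter inspects and a large family it does not. */
    static final class Row {
        final String key;
        final int flagCells;   // cells in the small ("essential") family
        final int dataCells;   // cells in the large (non-essential) family
        final boolean matches; // whether the filter accepts this row

        Row(String key, int flagCells, int dataCells, boolean matches) {
            this.key = key;
            this.flagCells = flagCells;
            this.dataCells = dataCells;
            this.matches = matches;
        }
    }

    /** Counts the cells a scan would read under this simplified model. */
    static int cellsRead(List<Row> rows, boolean loadOnDemand) {
        int read = 0;
        for (Row r : rows) {
            // The essential family is always read so the filter can be evaluated.
            read += r.flagCells;
            // Without on-demand loading every family is read for every row;
            // with it, the large family is read only for rows the filter accepts.
            if (!loadOnDemand || r.matches) {
                read += r.dataCells;
            }
        }
        return read;
    }

    /** Three rows, one of which passes the filter. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("row-1", 1, 100, false));
        rows.add(new Row("row-2", 1, 100, true));
        rows.add(new Row("row-3", 1, 100, false));
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("all families loaded: " + cellsRead(rows, false));
        System.out.println("loaded on demand:    " + cellsRead(rows, true));
    }
}
```

In this model the saving comes entirely from rows that fail the filter, so it only helps when the filter is selective and the family it inspects is small compared with the others.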

So, now I have questions:

With rowkey filtering, which column family's KeyValue objects are read?
If HBase just reads a KeyValue from a randomly selected (or simply the first) column family, how does setLoadColumnFamiliesOnDemand() come into play? Can HBase intelligently pick the smaller column family?

If setLoadColumnFamiliesOnDemand() also applies to rowkey filtering, a small 'dummy' column family could be used to minimize the scan cost.

Thank you.