As per my understanding of the Scan code, in your scenario, where you want to scan only some CFs (not all), you should go with Scan#addFamily.
The FamilyFilter also does the same thing, but there is a difference in performance.
When one specifies the CFs in the scan, scanners will be created only for those Stores. For the other CFs there won't be any scanners, so those stores are not scanned (the HFile data is not fetched).
Instead, when one uses the FamilyFilter and does not restrict the families (using Scan#addFamily), all the stores will get scanned and data will get fetched from the HFiles. Only the KVs you need (as per your FamilyFilter) will then be included in the Result, and the others are simply dropped. So I feel there will be a performance difference. Correct me if I am wrong, please.
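
To make the difference concrete, here is a toy model in plain Java. It is only a sketch of the behaviour as I described it above, not HBase code: the class FamilyScanSketch, its methods, and the sample cells are all made up for illustration, and a "store" is just a list of strings. It counts how many cells are read from the stores in each case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a "region" maps each column family name to the cells in its store.
// None of these names are HBase classes; they are hypothetical and for illustration.
class FamilyScanSketch {
    // Number of cells read from the stores during the most recent scan.
    static int cellsRead = 0;

    // Two families: cf1 holds 2 cells, cf2 holds 3 cells.
    static Map<String, List<String>> sampleRegion() {
        Map<String, List<String>> stores = new TreeMap<>();
        stores.put("cf1", Arrays.asList("r1/cf1:a", "r2/cf1:a"));
        stores.put("cf2", Arrays.asList("r1/cf2:b", "r2/cf2:b", "r3/cf2:b"));
        return stores;
    }

    // Models Scan#addFamily: only the store of the requested family is opened.
    static List<String> scanWithAddFamily(Map<String, List<String>> stores, String family) {
        cellsRead = 0;
        List<String> result = new ArrayList<>();
        List<String> store = stores.get(family);
        if (store != null) {
            for (String cell : store) {
                cellsRead++;
                result.add(cell);
            }
        }
        return result;
    }

    // Models a family filter with no addFamily: every store is read,
    // and cells from other families are dropped afterwards.
    static List<String> scanWithFamilyFilter(Map<String, List<String>> stores, String family) {
        cellsRead = 0;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : stores.entrySet()) {
            for (String cell : entry.getValue()) {
                cellsRead++;
                if (entry.getKey().equals(family)) {
                    result.add(cell);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stores = sampleRegion();
        List<String> viaAddFamily = scanWithAddFamily(stores, "cf1");
        System.out.println("addFamily:    result=" + viaAddFamily + " cellsRead=" + cellsRead);
        List<String> viaFilter = scanWithFamilyFilter(stores, "cf1");
        System.out.println("FamilyFilter: result=" + viaFilter + " cellsRead=" + cellsRead);
    }
}
```

Both calls return the same two cf1 cells, but in this model the filter variant reads all five cells while the addFamily variant reads only two.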
>One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two methods overwrite each other.
The Scan#addColumn javadoc clearly mentions this overriding, so it seems to be intentional, correct?
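
Roughly, I picture the overriding like this. The sketch below is a simplified model of the family map that the Scan carries, not the real Scan class: FamilyMapSketch is a made-up name, it uses Strings instead of byte arrays, and it assumes that a null qualifier set stands for "all columns of the family".

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified, hypothetical model of a scan's family map; not the HBase Scan class.
// Assumption: a null qualifier set means "all columns of this family".
class FamilyMapSketch {
    final TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();

    // Select the whole family, discarding any qualifiers added earlier for it.
    FamilyMapSketch addFamily(String family) {
        familyMap.remove(family);
        familyMap.put(family, null);
        return this;
    }

    // Select one qualifier; this replaces an earlier "whole family" selection.
    FamilyMapSketch addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    public static void main(String[] args) {
        FamilyMapSketch scan = new FamilyMapSketch();
        // addColumn after addFamily narrows cf1 to the single qualifier q1.
        scan.addFamily("cf1").addColumn("cf1", "q1");
        System.out.println(scan.familyMap);
        // addFamily after addColumn widens cf1 back to all columns (null set).
        scan.addFamily("cf1");
        System.out.println(scan.familyMap);
    }
}
```

In this model whichever call comes last for a family wins, which matches the overwrite behaviour you ran into.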
From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack [[EMAIL PROTECTED]]
Sent: Wednesday, May 30, 2012 11:13 PM
To: [EMAIL PROTECTED]
Subject: Re: Scan addFamily vs FamilyFilter(EQUAL, ...)
On Wed, May 30, 2012 at 9:59 AM, Kevin <[EMAIL PROTECTED]> wrote:
> I am curious and trying to learn which method is best when wanting to limit
> a scan to a particular column or column family. The Scan class carries a
> Filter instance and a TreeMap of the family map and I am unsure how they
> get carried through to the server-side functionality. In terms of
> performance is there any difference between doing Scan.addFamily(x) and
> Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x))?
There is probably no noticeable difference in performance but
Scan#addFamily is the more natural way of expressing column family