HBase, mail # user - Scan addFamily vs FamilyFilter(EQUAL, ...)


RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
Ramkrishna.S.Vasudevan 2012-05-31, 11:38
Just to add on: the Javadoc for FamilyFilter clearly says

* If an already known column family is looked for, use {@link
org.apache.hadoop.hbase.client.Get#addFamily(byte[])}
* directly rather than a filter.

So addFamily should be better.

Regards
Ram
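
The difference Anoop describes in the quoted message below can be sketched with a toy model. This is plain Java, not HBase code: StoreScanModel, its methods and the cellsRead counter are hypothetical names invented for illustration, and the model assumes one in-memory list per column family standing in for a Store. It only shows why restricting the families up front reads less data than reading everything and filtering afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are HBase classes.
class StoreScanModel {
    // One "store" per column family, holding that family's cells.
    final Map<String, List<String>> stores = new LinkedHashMap<>();
    // Counts every cell that had to be read, whether or not it was returned.
    int cellsRead = 0;

    void put(String family, String cell) {
        stores.computeIfAbsent(family, f -> new ArrayList<>()).add(cell);
    }

    // Models Scan#addFamily: only the requested stores are opened.
    List<String> scanWithFamilies(String... families) {
        List<String> out = new ArrayList<>();
        for (String family : families) {
            List<String> cells = stores.getOrDefault(family, Collections.<String>emptyList());
            for (String cell : cells) {
                cellsRead++;
                out.add(cell);
            }
        }
        return out;
    }

    // Models FamilyFilter with no families set: every store is read,
    // and cells from other families are discarded afterwards.
    List<String> scanWithFilter(String wantedFamily) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : stores.entrySet()) {
            for (String cell : e.getValue()) {
                cellsRead++; // the read happens before the filter decides
                if (e.getKey().equals(wantedFamily)) {
                    out.add(cell);
                }
            }
        }
        return out;
    }

    // Two cells in cf1, three in cf2.
    static StoreScanModel sample() {
        StoreScanModel m = new StoreScanModel();
        m.put("cf1", "row1/cf1:a");
        m.put("cf1", "row2/cf1:a");
        m.put("cf2", "row1/cf2:x");
        m.put("cf2", "row2/cf2:x");
        m.put("cf2", "row3/cf2:x");
        return m;
    }

    public static void main(String[] args) {
        StoreScanModel byFamily = sample();
        byFamily.scanWithFamilies("cf1");
        StoreScanModel byFilter = sample();
        byFilter.scanWithFilter("cf1");
        System.out.println("cells read with family restriction: " + byFamily.cellsRead);
        System.out.println("cells read with filter only: " + byFilter.cellsRead);
    }
}
```

Both scans return the same cells; only the amount of data read differs. How large that gap is on a real cluster depends on how much data sits in the other families.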

> -----Original Message-----
> From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 31, 2012 11:49 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
>
> Hi,
>      As I understand the Scan code, if you want to scan only some CFs
> (not all), you should use Scan#addFamily.
> FamilyFilter gives the same result, but there is a difference in
> performance.
> When you specify the CFs in the Scan, scanners are created only for
> the corresponding Stores. No scanners are created for the other CFs,
> so those stores are not scanned (their HFile data is not fetched).
> When you instead use FamilyFilter and do not specify any column
> families (using Scan#addFamily), all the stores are scanned and their
> data is fetched from the HFiles. Only the KVs that match your
> FamilyFilter are then included in the Result; the others are dropped.
> So I expect a performance difference. Please correct me if I am wrong.
>
> @Stack
> > One thing I ran into when using the Scan.addFamily / Scan.addColumn is
> > that those two methods overwrite each other.
> The Scan#addColumn Javadoc states this overwrite behavior explicitly,
> so it seems to be intentional, correct?
>
>
> -Anoop-
> ________________________________________
> From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack
> [[EMAIL PROTECTED]]
> Sent: Wednesday, May 30, 2012 11:13 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scan addFamily vs FamilyFilter(EQUAL, ...)
>
> On Wed, May 30, 2012 at 9:59 AM, Kevin <[EMAIL PROTECTED]> wrote:
> > I am curious and trying to learn which method is best when wanting to
> > limit a scan to a particular column or column family. The Scan class
> > carries a Filter instance and a TreeMap of the family map and I am
> > unsure how they get carried through to the server-side functionality.
> > In terms of performance is there any difference between doing
> > Scan.addFamily(x) and
> > Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x))?
> >
>
> There is probably no noticeable difference in performance but
> Scan#addFamily is the more natural way of expressing column family
> scoping.
> St.Ack
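
The overwrite behavior of Scan.addFamily / Scan.addColumn mentioned in the thread can be sketched with a small model of the family map. This is not the HBase Scan class: FamilyMapModel is a hypothetical name, it uses String keys instead of byte[], and the semantics (a null qualifier set meaning "whole family") are an assumption based on the Scan Javadoc as discussed above. Check the Javadoc of your HBase version before relying on it.

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: a stand-in for the family map that a Scan carries.
class FamilyMapModel {
    // family -> selected qualifiers; a null value means "the whole family".
    final TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();

    // Assumed behavior: selecting a family discards qualifiers chosen earlier.
    FamilyMapModel addFamily(String family) {
        familyMap.put(family, null);
        return this;
    }

    // Assumed behavior: selecting a column replaces an earlier "whole family"
    // selection with an explicit set of qualifiers.
    FamilyMapModel addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    String describe(String family) {
        if (!familyMap.containsKey(family)) {
            return "not selected";
        }
        NavigableSet<String> qualifiers = familyMap.get(family);
        return qualifiers == null ? "all columns" : qualifiers.toString();
    }

    public static void main(String[] args) {
        // addFamily after addColumn: the column restriction is lost.
        System.out.println(new FamilyMapModel()
                .addColumn("cf", "q1").addFamily("cf").describe("cf"));
        // addColumn after addFamily: the scan is narrowed to that column.
        System.out.println(new FamilyMapModel()
                .addFamily("cf").addColumn("cf", "q1").describe("cf"));
    }
}
```

In this model the order of the calls matters: the last call for a given family decides whether the whole family or only specific columns are selected.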