
HBase, mail # user - Scan.addFamiliy reduces results


Re: Scan.addFamiliy reduces results
Daniel Gómez Ferro 2012-03-15, 19:25
As I told you in the other message, if you don't addColumn() the column you are filtering on, the filter treats that column as missing, and by default it lets through any row that doesn't contain it: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html#setFilterIfMissing(boolean)

So when you uncomment the addColumn(), the filter kicks in and actually compares values. When the addColumn() is commented out, all rows are returned.
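Daniel's point can be illustrated with a small toy model. This is a sketch under stated assumptions, not the HBase client API: rows are plain Java maps from "family:qualifier" to value, the scan exposes only the columns it selects (standing in for addColumn()), and rowPasses() is a simplified, hypothetical stand-in for SingleColumnValueFilter with an equality comparison plus the setFilterIfMissing() flag. FilterIfMissingModel, countScanned() and sampleTable() are illustrative names only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model, NOT the HBase client API: each row maps "family:qualifier" to a
 * value, and a scan only exposes the columns it selects.
 */
public class FilterIfMissingModel {

    /** Hypothetical, simplified stand-in for SingleColumnValueFilter (EQUAL only). */
    public static boolean rowPasses(Map<String, String> visibleCells, String column,
                                    String expected, boolean filterIfMissing) {
        String value = visibleCells.get(column);
        if (value == null) {
            // The filter never saw the column: with the default (false) the row is kept.
            return !filterIfMissing;
        }
        return value.equals(expected);
    }

    /** Counts the rows a scan returns when it selects only {@code selectedColumns}. */
    public static int countScanned(List<Map<String, String>> table, Set<String> selectedColumns,
                                   String column, String expected, boolean filterIfMissing) {
        int count = 0;
        for (Map<String, String> row : table) {
            Map<String, String> visible = new HashMap<>();
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (selectedColumns.contains(cell.getKey())) {
                    visible.put(cell.getKey(), cell.getValue());
                }
            }
            // A row with no selected cells produces no result at all.
            if (!visible.isEmpty() && rowPasses(visible, column, expected, filterIfMissing)) {
                count++;
            }
        }
        return count;
    }

    /** Hypothetical sample data: every row has both A:x and B:y. */
    public static List<Map<String, String>> sampleTable() {
        return List.of(
                Map.of("A:x", "1", "B:y", "p"),
                Map.of("A:x", "2", "B:y", "q"),
                Map.of("A:x", "1", "B:y", "r"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = sampleTable();
        // Filter on A:x == "1" but select only B:y: the filter treats A:x as missing.
        System.out.println(countScanned(table, Set.of("B:y"), "A:x", "1", false));        // 3
        // Also select A:x (the addColumn() case): the filter really compares values.
        System.out.println(countScanned(table, Set.of("A:x", "B:y"), "A:x", "1", false)); // 2
        // filterIfMissing = true drops rows in which the filter never saw A:x.
        System.out.println(countScanned(table, Set.of("B:y"), "A:x", "1", true));         // 0
    }
}
```

Under these assumptions, the first call mirrors the commented-out addColumn() case (every row comes back) and the second mirrors the uncommented one, which is why the SingleColumnValueFilter javadoc advises adding the tested column to the scan.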

On Mar 15, 2012, at 20:05, Peter Wolf wrote:

> Huh!  That's what I was afraid you'd say.  I'm still confused :-(
>
> If "it will give all rows that contain _any_ of these families", then
> why does adding a family give me *fewer* rows?
>
> Leaving my row start/stop and filtering code constant, and just
> un-commenting an addFamily() dramatically reduces the number of results
> returned from a scan.
>
> P
>
>
>
> On 3/15/12 2:42 PM, Himanshu Vashishtha wrote:
>> " Let's also say there are 1000 rows with A,B,C and 500 rows with only B and C.
>>
>> If I add families A, B and C and scan with no filter will I get 1500,
>> 1000 or 500 results?"
>>
>> In this case, you will get 1000 rows. In case you add only B, you will
>> get 500 rows.
>>
>> It's not like if you add families A, B and C, it will give you _only_
>> those rows that have _all_ three families; rather it will give all
>> rows that contain _any_ of these families.
>>
>> Hope this helps.
>>
>> Experts are welcome to chime in if I am missing something :)
>>
>> Thanks,
>> Himanshu
>>
>>
>> On Thu, Mar 15, 2012 at 11:48 AM, Peter Wolf<[EMAIL PROTECTED]>  wrote:
>>> Hi Lars, still confused...
>>>
>>> My table *should* have values for families A, B and C.  Let's say I have a
>>> bug, and some rows only have values for B and C.  Let's also say there are
>>> 1000 rows with A,B,C and 500 rows with only B and C.
>>>
>>> If I add families A, B and C and scan with no filter will I get 1500, 1000
>>> or 500 results?
>>>
>>> Many thanks
>>> P
>>>
>>>
>>>
>>>
>>> On 3/15/12 1:17 PM, lars hofhansl wrote:
>>>> Hi Haijia,
>>>>
>>>> In that case HBase will still return the data for columns in families B and
>>>> C. But if you only added family A, then HBase would only return "rows" for
>>>> which family A has any columns.
>>>>
>>>> -- Lars
>>>> ________________________________
>>>>
>>>> From: Haijia Zhou<[EMAIL PROTECTED]>
>>>> To: [EMAIL PROTECTED]; lars hofhansl<[EMAIL PROTECTED]>
>>>> Sent: Thursday, March 15, 2012 10:12 AM
>>>> Subject: Re: Scan.addFamiliy reduces results
>>>>
>>>>
>>>> I have the same confusion. Say I added three column families A, B and C
>>>> to the scan; if a row has data for column families B and C but no data for
>>>> A, then it won't be returned in the next() method?
>>>> What if the requirement is to get row data regardless of whether there's
>>>> data for a specific column family or not?
>>>>
>>>>
>>>> On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl<[EMAIL PROTECTED]>
>>>>  wrote:
>>>>
>>>> Hi Peter,
>>>>> for HBase you have to keep in mind that it is a sparse columnar (or
>>>>> KeyValue) store: (rowkey, columnfamily, column, TS) -> value
>>>>>
>>>>> A scan only returns those KeyValues that match the scan. So when you set
>>>>> families on your scan, you'll only get those rows for which the scan found
>>>>> any columns in those families.
>>>>>
>>>>> Makes sense?
>>>>>
>>>>> -- Lars
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>>  From: Peter Wolf<[EMAIL PROTECTED]>
>>>>> To: [EMAIL PROTECTED]
>>>>> Sent: Thursday, March 15, 2012 9:52 AM
>>>>> Subject: Re: Scan.addFamiliy reduces results
>>>>>
>>>>>
>>>>> Thanks Doug,
>>>>>
>>>>> I had read that, and I just read it again.  But I am missing something...
>>>>>
>>>>> Why does adding a family reduce the number of results?  Is there an
>>>>> implied filter of some form?  Does addFamily add some constraint on
>>>>> which rows are returned?
>>>>>
>>>>> Note that all my rows *ought* to have values in all the families.
>>>>>
>>>>> Thanks
>>>
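The sparse-store view that Lars and Himanshu describe can be sketched as a toy model as well (again hypothetical, not the HBase API): each row is reduced to the set of families it has cells in, and a row is returned if it has a cell in at least one selected family, with an empty selection standing for "no addFamily() calls". Using Peter's numbers (1000 rows with A, B and C; 500 rows with only B and C), this "any of these families" rule gives 1500 rows for A, B and C, 1500 for B alone and 1000 for A alone. FamilySelectionModel, petersTable() and countRows() are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy model, NOT the HBase client API: a row comes back from a scan only if at
 * least one of its cells falls in a family the scan selected. An empty selection
 * stands for "no addFamily() calls", i.e. every family is scanned.
 */
public class FamilySelectionModel {

    /** Peter's hypothetical table; each row is just the set of families it has cells in. */
    public static List<Set<String>> petersTable() {
        List<Set<String>> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(Set.of("A", "B", "C"));
        for (int i = 0; i < 500; i++) rows.add(Set.of("B", "C")); // the "buggy" rows
        return rows;
    }

    /** Counts rows with at least one cell in any selected family. */
    public static int countRows(List<Set<String>> table, Set<String> selectedFamilies) {
        int count = 0;
        for (Set<String> rowFamilies : table) {
            boolean returned = selectedFamilies.isEmpty()
                    || rowFamilies.stream().anyMatch(selectedFamilies::contains);
            if (returned) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Set<String>> table = petersTable();
        System.out.println(countRows(table, Set.of()));              // 1500: no family restriction
        System.out.println(countRows(table, Set.of("A", "B", "C"))); // 1500: any family matches
        System.out.println(countRows(table, Set.of("B")));           // 1500: every row has B
        System.out.println(countRows(table, Set.of("A")));           // 1000: only rows with A
    }
}
```

In this model, adding a family to the selection can only add rows, never remove them, and a row with B and C but no A is still returned when A, B and C are all selected. So when uncommenting an addFamily() or addColumn() shrinks the result, the likely cause is a filter, as Daniel explains, rather than the family selection itself.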