Re: Hbase Count Aggregate Function
The rowCount method accepts a Scan object, to which you can attach your custom filter.
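
A rough sketch of the idea in plain Java, in case it helps. This is a model, not HBase code: rows are just maps, and RowCountSketch, rowCount, columnContains and sampleRows are made-up names for illustration. It only shows why a count with no user filter (first-key-only style counting) returns every row, while a count that carries a value filter returns only the matching rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model; none of these names come from the HBase API.
public class RowCountSketch {

    // A row is modelled as a map from "family:qualifier" to a cell value.
    static Map<String, String> row(String diagnosis) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("info:diagnosis", diagnosis);
        return cells;
    }

    // Counts the rows that pass the filter. A null filter stands for
    // "no user filter": every row with at least one cell is counted,
    // which is the effect of counting with a first-key-only style filter.
    static long rowCount(List<Map<String, String>> rows,
                         Predicate<Map<String, String>> filter) {
        long count = 0;
        for (Map<String, String> r : rows) {
            if (r.isEmpty()) {
                continue; // a row with no cells is never returned
            }
            if (filter == null || filter.test(r)) {
                count++;
            }
        }
        return count;
    }

    // Stand-in for a single-column value filter with a substring comparator.
    // Simplification: rows that lack the column are dropped here. In HBase,
    // SingleColumnValueFilter keeps such rows unless setFilterIfMissing(true)
    // is called.
    static Predicate<Map<String, String>> columnContains(String column, String needle) {
        return r -> {
            String value = r.get(column);
            return value != null && value.contains(needle);
        };
    }

    // Small sample: three "cardiac" rows and two "renal" rows.
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            rows.add(row("cardiac"));
        }
        for (int i = 0; i < 2; i++) {
            rows.add(row("renal"));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        // No user filter: all rows are counted.
        System.out.println(rowCount(rows, null)); // 5
        // With the value filter attached: only matching rows are counted.
        System.out.println(rowCount(rows, columnContains("info:diagnosis", "cardiac"))); // 3
    }
}
```

With the real client, the equivalent step would be to set the filter on the Scan before passing that Scan to rowCount.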

Cheers

On Dec 25, 2012, at 8:42 AM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:

>
> Do you mean I should implement a new rowCount method in the AggregationClient class?
>
> I don't understand. Could you illustrate with a code sample, Ram?
>
>>> Date: Tue, 25 Dec 2012 00:21:14 +0530
>>> Subject: Re: Hbase Count Aggregate Function
>>> From: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>>
>>> Hi
>>> You could implement a custom filter similar to
>>> FirstKeyOnlyFilter.
>>> Implement its filterKeyValue method so that it matches your KeyValue
>>> (the specific qualifier that you are looking for).
>>>
>>> Deploy it in your cluster.  It should work.
>>>
>>> Regards
>>> Ram
>>>
>>> On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> So do you have a suggestion for how to make the filter work?
>>>>
>>>>> Date: Mon, 24 Dec 2012 22:22:49 +0530
>>>>> Subject: Re: Hbase Count Aggregate Function
>>>>> From: [EMAIL PROTECTED]
>>>>> To: [EMAIL PROTECTED]
>>>>>
>>>>> OK, having looked at the shell script and the code, it seems that when you
>>>>> use this counter, the user's filter is not taken into account.
>>>>> It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
>>>>>
>>>>> Regards
>>>>> Ram
>>>>>
>>>>> On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>>
>>>>>> Yes, scan gives the correct number of rows, while count returns the
>>>>>> total number of rows.
>>>>>>
>>>>>> Both use the same filter. I even tried it with the Java API, using the
>>>>>> rowCount method:
>>>>>>
>>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
>>>>>>
>>>>>> I get the total number of rows, not the number of rows that pass the filter.
>>>>>>
>>>>>> So, any ideas?
>>>>>>
>>>>>> Thanks Ram :)
>>>>>>
>>>>>>> Date: Mon, 24 Dec 2012 21:57:54 +0530
>>>>>>> Subject: Re: Hbase Count Aggregate Function
>>>>>>> From: [EMAIL PROTECTED]
>>>>>>> To: [EMAIL PROTECTED]
>>>>>>>
>>>>>>> So you find that a scan with a filter and a count with the same filter
>>>>>>> give you different results?
>>>>>>>
>>>>>>> Regards
>>>>>>> Ram
>>>>>>>
>>>>>>> On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I have 50,000 rows with diagnosis qualifier = "cardiac", and another
>>>>>>>> 50,000 rows with "renal".
>>>>>>>>
>>>>>>>> When I type this in the HBase shell,
>>>>>>>>
>>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
>>>>>>>> import org.apache.hadoop.hbase.util.Bytes
>>>>>>>>
>>>>>>>> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
>>>>>>>>         Bytes.toBytes('diagnosis'),
>>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
>>>>>>>>         SubstringComparator.new('cardiac'))}
>>>>>>>>
>>>>>>>> Output = 50,000 rows
>>>>>>>>
>>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
>>>>>>>> import org.apache.hadoop.hbase.util.Bytes
>>>>>>>>
>>>>>>>> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
>>>>>>>>         Bytes.toBytes('diagnosis'),
>>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
>>>>>>>>         SubstringComparator.new('cardiac'))}
>>>>>>>> Output = 100,000 rows
>>>>>>>>
>>>>>>>> I even tried it using the HBase Java API with an AggregationClient
>>>>>>>> instance, and I enabled the aggregation coprocessor for the table:
>>>>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)