HBase, mail # user - Hbase Count Aggregate Function


Dalia Sobhy 2012-12-24, 15:03
ramkrishna vasudevan 2012-12-24, 16:27
Dalia Sobhy 2012-12-24, 16:41

Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2012-12-24, 16:52
OK, looking at the shell script and the code, I think that when you use the
count command, your filter is not taken into account.
It sets a FirstKeyOnlyFilter on the scan and proceeds with it. :(

Regards
Ram
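To make this concrete, here is a stdlib-only Java model (not HBase code; CountModel and its methods are hypothetical names) of a table with 50,000 "cardiac" and 50,000 "renal" rows. It assumes, as described above, that the counter drops the user's filter and relies on a first-key-only filter, which trims each row to one cell but never excludes a row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of the 'patient' table (NOT HBase code).
// It contrasts a scan that applies the user's filter with a counter that
// replaces that filter with a first-key-only filter.
class CountModel {
    // One row: its key and its info:diagnosis value.
    static final class Row {
        final String key;
        final String diagnosis;

        Row(String key, String diagnosis) {
            this.key = key;
            this.diagnosis = diagnosis;
        }
    }

    // 50,000 "cardiac" rows plus 50,000 "renal" rows, matching the thread's data.
    static List<Row> buildTable() {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            rows.add(new Row("cardiac-" + i, "cardiac"));
            rows.add(new Row("renal-" + i, "renal"));
        }
        return rows;
    }

    // Scan-like count: only rows accepted by the user's filter are counted.
    static long scanCount(List<Row> table, Predicate<Row> userFilter) {
        return table.stream().filter(userFilter).count();
    }

    // Counter-like count: the user's filter is ignored. A first-key-only
    // filter keeps one cell per row but never drops a row, so every row
    // in the table is counted.
    static long counterIgnoringUserFilter(List<Row> table, Predicate<Row> userFilter) {
        Predicate<Row> firstKeyOnly = row -> true;
        return table.stream().filter(firstKeyOnly).count();
    }

    public static void main(String[] args) {
        List<Row> table = buildTable();
        Predicate<Row> cardiac = row -> row.diagnosis.contains("cardiac");
        System.out.println("scan:  " + scanCount(table, cardiac));
        System.out.println("count: " + counterIgnoringUserFilter(table, cardiac));
    }
}
```

Running main prints 50000 for the scan and 100000 for the counter, the same gap reported in this thread.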

On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:

>
> Yes, the scan gives the correct number of rows, while count returns the
> total number of rows.
>
> Both use the same filter. I also tried the Java API, using the rowCount
> method:
>
> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
>
> I get the total number of rows, not the number of filtered rows.
>
> Any ideas?
>
> Thanks, Ram :)
>
> > Date: Mon, 24 Dec 2012 21:57:54 +0530
> > Subject: Re: Hbase Count Aggregate Function
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> >
> > So a scan with a filter and a count with the same filter give you
> > different results?
> >
> > Regards
> > Ram
> >
> > On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > Dear all,
> > >
> > > I have 50,000 rows whose diagnosis value is "cardiac" and another 50,000
> > > rows whose value is "renal".
> > >
> > > When I type this in the HBase shell:
> > >
> > > import org.apache.hadoop.hbase.filter.CompareFilter
> > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > import org.apache.hadoop.hbase.filter.SubstringComparator
> > > import org.apache.hadoop.hbase.util.Bytes
> > >
> > > scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > >     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > >          Bytes.toBytes('diagnosis'),
> > >          CompareFilter::CompareOp.valueOf('EQUAL'),
> > >          SubstringComparator.new('cardiac'))}
> > >
> > > Output: 50,000 rows
> > >
> > > import org.apache.hadoop.hbase.filter.CompareFilter
> > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > import org.apache.hadoop.hbase.filter.SubstringComparator
> > > import org.apache.hadoop.hbase.util.Bytes
> > >
> > > count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > >     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > >          Bytes.toBytes('diagnosis'),
> > >          CompareFilter::CompareOp.valueOf('EQUAL'),
> > >          SubstringComparator.new('cardiac'))}
> > > Output: 100,000 rows
> > >
> > > I also tried it with the HBase Java API, using an AggregationClient
> > > instance with the aggregation coprocessor enabled for the table, and got
> > > the same result:
> > > rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
> > >
> > > Also, when I measure performance after adding more nodes, the operation
> > > takes the same time.
> > >
> > > Any advice, please?
> > >
> > > I have been struggling with this for a couple of weeks.
> > >
> > > Thanks,
>
>
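The FILTER in both shell commands above keeps a row when its info:diagnosis value contains "cardiac". HBase's SubstringComparator matches case-insensitively, and SingleColumnValueFilter keeps rows that lack the tested column unless setFilterIfMissing(true) is called, which is worth knowing when comparing counts. The sketch below is a stdlib-only Java model of that row test, not the HBase implementation; FilterModel, keepRow, and the sample values (such as "Cardiac arrest") are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical model (NOT the HBase implementation) of the row test applied by
//   SingleColumnValueFilter(info, diagnosis, EQUAL, SubstringComparator("cardiac"))
// A row is represented as a map from "family:qualifier" to its value.
class FilterModel {
    static boolean keepRow(Map<String, String> row, String column,
                           String substring, boolean filterIfMissing) {
        String value = row.get(column);
        if (value == null) {
            // Column absent: the row is kept unless filterIfMissing is true.
            return !filterIfMissing;
        }
        // EQUAL with a substring comparator: keep the row when the value
        // contains the substring, compared case-insensitively.
        return value.toLowerCase(Locale.ROOT)
                    .contains(substring.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> cardiac = Map.of("info:diagnosis", "Cardiac arrest");
        Map<String, String> renal = Map.of("info:diagnosis", "renal");
        Map<String, String> noDiagnosis = Map.of();

        System.out.println(keepRow(cardiac, "info:diagnosis", "cardiac", false));     // true
        System.out.println(keepRow(renal, "info:diagnosis", "cardiac", false));       // false
        System.out.println(keepRow(noDiagnosis, "info:diagnosis", "cardiac", false)); // true
        System.out.println(keepRow(noDiagnosis, "info:diagnosis", "cardiac", true));  // false
    }
}
```

In the thread's data every row has a diagnosis, so the missing-column case does not explain the 100,000 count; it only matters for tables where some rows lack the tested column.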
Dalia Sobhy 2012-12-24, 17:05
ramkrishna vasudevan 2012-12-24, 18:51
Dalia Sobhy 2012-12-24, 19:20
Dalia Sobhy 2012-12-25, 16:42
yuzhihong@... 2012-12-25, 16:57
ramkrishna vasudevan 2012-12-25, 17:14
Dalia Sobhy 2012-12-25, 17:55
ramkrishna vasudevan 2012-12-26, 15:41
Dalia Sobhy 2013-01-01, 21:44
ramkrishna vasudevan 2013-01-02, 04:09
Dalia Sobhy 2012-12-25, 17:45
Dalia Sobhy 2012-12-24, 19:25
Jean-Marc Spaggiari 2012-12-24, 15:51