-
Hbase Count Aggregate Function
Dalia Sobhy 2012-12-24, 15:03

Dear all,

I have 50,000 rows with diagnosis qualifier = "cardiac", and another 50,000 rows with "renal".

When I type this in the HBase shell:

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes

    scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
      SingleColumnValueFilter.new(Bytes.toBytes('info'),
        Bytes.toBytes('diagnosis'),
        CompareFilter::CompareOp.valueOf('EQUAL'),
        SubstringComparator.new('cardiac'))}

Output = 50,000 rows

With the same imports:

    count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
      SingleColumnValueFilter.new(Bytes.toBytes('info'),
        Bytes.toBytes('diagnosis'),
        CompareFilter::CompareOp.valueOf('EQUAL'),
        SubstringComparator.new('cardiac'))}

Output = 100,000 rows

I get the same result even when I use the HBase Java API with an AggregationClient instance, having enabled the aggregation coprocessor for the table:

    rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

Also, when I add more nodes to measure the performance improvement, the operation takes the same time.

So any advice, please? I have been struggling with this for a couple of weeks.

Thanks,
-
Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2012-12-24, 16:27

So you find that a scan with a filter and a count with the same filter are giving you different results?

Regards
Ram
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-24, 16:41

Yes, scan gives the correct number of rows, while count returns the total number of rows.

Both use the same filter. I even tried it with the Java API, using the rowCount method:

    rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

I get the total number of rows, not the number of rows that pass the filter.

So any ideas?

Thanks Ram :)
-
Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2012-12-24, 16:52

Okay, looking at the shell script and the code, I think that when you use this counter, the user's filter is not taken into account. It adds a FirstKeyOnlyFilter and proceeds with the scan. :(

Regards
Ram
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-24, 17:05

So do you have a suggestion for how to enable the filter or make it work?
-
Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2012-12-24, 18:51

Hi,

You could implement a custom filter similar to FirstKeyOnlyFilter. Implement the filterKeyValue method so that it matches your KeyValue (the specific qualifier that you are looking for).

Deploy it in your cluster. It should work.

Regards
Ram
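The idea in Ram's suggestion can be sketched without a cluster. What follows is a plain-Java model only, not HBase code: `Cell`, `Decision`, `filterCell`, `countRows` and `sampleRows` are hypothetical names invented for illustration, and a real filter would extend HBase's own filter classes and be deployed on the region servers. The sketch assumes the custom filter behaves like a first-key-only filter that accepts a row only at the cell whose qualifier and value match.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a per-cell filter; none of these types are HBase classes.
class QualifierValueFilterSketch {

    // Hypothetical stand-in for an HBase KeyValue: one cell of one row.
    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Hypothetical per-cell answers, loosely modelled on what a filter can return.
    enum Decision { INCLUDE_AND_NEXT_ROW, SKIP }

    // Accepts only the cell with the wanted qualifier whose value contains the substring.
    static Decision filterCell(Cell cell, String qualifier, String substring) {
        boolean matches = cell.qualifier.equals(qualifier) && cell.value.contains(substring);
        return matches ? Decision.INCLUDE_AND_NEXT_ROW : Decision.SKIP;
    }

    // A row is counted once, at the first cell the filter includes.
    static long countRows(List<List<Cell>> rows, String qualifier, String substring) {
        long count = 0;
        for (List<Cell> row : rows) {
            for (Cell cell : row) {
                if (filterCell(cell, qualifier, substring) == Decision.INCLUDE_AND_NEXT_ROW) {
                    count++;
                    break; // move on to the next row
                }
            }
        }
        return count;
    }

    // Builds a small table: the first `cardiac` rows hold "cardiac", the rest hold "renal".
    static List<List<Cell>> sampleRows(int cardiac, int renal) {
        List<List<Cell>> rows = new ArrayList<>();
        for (int i = 0; i < cardiac + renal; i++) {
            List<Cell> row = new ArrayList<>();
            row.add(new Cell("name", "patient-" + i));
            row.add(new Cell("diagnosis", i < cardiac ? "cardiac" : "renal"));
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Cell>> rows = sampleRows(5, 5);
        System.out.println(countRows(rows, "diagnosis", "cardiac")); // prints 5
    }
}
```

In this model the count follows the filter: ten rows go in, and only the five with a matching diagnosis cell are counted.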
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-24, 19:20

Do you mean that I should implement a new rowCount method in the AggregationClient class?

I don't understand. Could you illustrate with a code sample, Ram?

Thanks,
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-25, 16:42

Do you mean that I should implement a new rowCount method in the AggregationClient class?

I don't understand. Could you illustrate with a code sample, Ram?
-
Re: Hbase Count Aggregate Function
yuzhihong@... 2012-12-25, 16:57

The rowCount method accepts a Scan object, to which you can attach your custom filter.

Cheers
-
Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2012-12-25, 17:14

@Dalia

I think the aggregation client should work with what you have passed. What I meant in the previous mail concerned table.count(), whereas now we are looking at AggregationClient.
{code}
if (scan.getFilter() == null && qualifier == null)
  scan.setFilter(new FirstKeyOnlyFilter());
{code}

So, since you have passed a filter, it should behave the way the SCVF (SingleColumnValueFilter) should. I can check this when I have free time (maybe tomorrow). If it does not, you can raise a bug. If it turns out to be fine, we can close it; otherwise it is better that we fix it. I understand your urgency.

Regards
Ram
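The guard Ram quotes can be read as "install the default filter only when the caller supplied none". Here is a minimal plain-Java model of that reading, offered as a sketch and not as AggregationClient's actual implementation: `ScanModel`, `FIRST_KEY_ONLY`, `rowCount` and `sampleDiagnoses` are hypothetical stand-ins, and each row is reduced to its diagnosis string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of the quoted guard; these are not HBase classes.
class RowCountGuardSketch {

    // Hypothetical stand-in for a Scan: it only carries an optional row filter.
    static final class ScanModel {
        Predicate<String> filter; // null means "no user filter"
    }

    // Stand-in for FirstKeyOnlyFilter: it accepts every row.
    static final Predicate<String> FIRST_KEY_ONLY = diagnosis -> true;

    // Mirrors the guard: the default is installed only when no filter and no qualifier are given.
    static long rowCount(List<String> diagnoses, String qualifier, ScanModel scan) {
        if (scan.filter == null && qualifier == null) {
            scan.filter = FIRST_KEY_ONLY;
        }
        // With a qualifier but no filter, this model simply accepts every row.
        Predicate<String> effective = scan.filter != null ? scan.filter : FIRST_KEY_ONLY;
        return diagnoses.stream().filter(effective).count();
    }

    // Builds `cardiac` rows of "cardiac" followed by `renal` rows of "renal".
    static List<String> sampleDiagnoses(int cardiac, int renal) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < cardiac; i++) rows.add("cardiac");
        for (int i = 0; i < renal; i++) rows.add("renal");
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = sampleDiagnoses(5, 5);

        ScanModel unfiltered = new ScanModel();
        System.out.println(rowCount(rows, null, unfiltered)); // prints 10

        ScanModel filtered = new ScanModel();
        filtered.filter = d -> d.contains("cardiac");
        System.out.println(rowCount(rows, null, filtered)); // prints 5
    }
}
```

If the real client followed this model, a scan that already carries a filter would return the filtered count, which is the behaviour Ram expects and why he suggests raising a bug if it does not.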
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-25, 17:55
Is there a problem with the ID (rowkey) being an "int" value?
-
Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2012-12-26, 15:41
Dalia,
I tried out this example:

{code}
private static final byte[] TEST_TABLE = Bytes.toBytes("TestTable");
private static final byte[] TEST_FAMILY = Bytes.toBytes("TestFamily");
private static final byte[] TEST_QUALIFIER = Bytes.toBytes("TestQualifier");
private static final byte[] TEST_MULTI_CQ = Bytes.toBytes("TestMultiCQ");

private static byte[] ROW = Bytes.toBytes("testRow");
private static final int ROWSIZE = 20;
private static final int rowSeperator1 = 5;
private static final int rowSeperator2 = 12;
private static byte[][] ROWS = makeN(ROW, ROWSIZE);

// Load ROWSIZE rows; row i stores the long value i under TEST_QUALIFIER.
for (int i = 0; i < ROWSIZE; i++) {
  Put put = new Put(ROWS[i]);
  put.setWriteToWAL(false);
  Long l = new Long(i);
  put.add(TEST_FAMILY, TEST_QUALIFIER, Bytes.toBytes(l));
  table.put(put);
  Put p2 = new Put(ROWS[i]);
  p2.setWriteToWAL(false);
  p2.add(TEST_FAMILY, Bytes.add(TEST_MULTI_CQ, Bytes.toBytes(l)), Bytes.toBytes(l * 10));
  table.put(p2);
}

// Count rows through the aggregation client with an SCVF attached to the scan.
AggregationClient aClient = new AggregationClient(conf);
Scan scan = new Scan();
scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
SingleColumnValueFilter scvf = new SingleColumnValueFilter(TEST_FAMILY,
    TEST_QUALIFIER, CompareOp.EQUAL, Bytes.toBytes(4L));
scan.setFilter(scvf);
long rowCount = aClient.rowCount(TEST_TABLE, ci, scan);
assertEquals(ROWSIZE, rowCount);
{code}

This assertion fails, which means the filter is being applied and rowCount is working as expected: the filtered count is no longer ROWSIZE. If you want to try it yourself, check out the test case TestAggregateProtocol.testRowCountAllTable() and modify it so that you pass a SingleColumnValueFilter. It is working fine for me.

Please check and let me know. Maybe I am making some mistake.

Regards
Ram
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2013-01-01, 21:44
Thanks Ram,

The issue is resolved. I forgot to add
scan.addFilter(fliterlist);

That's why it was not filtering!
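The symptom in this thread (a filtered count equal to the total row count) is what a scan with no filter attached produces. A minimal sketch in plain Java, assuming a toy in-memory table rather than HBase: `ToyScan`, `rowCount`, `sampleTable`, and the small cardiac/renal data set are illustrative stand-ins modelled on the original question, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy in-memory model (not HBase code): a count only honours a filter
// that was actually attached to the scan.
public class MissingFilterSketch {

    // Hypothetical stand-in for a scan; the filter is optional.
    static final class ToyScan {
        private Predicate<String> filter; // null means "match every row"

        void setFilter(Predicate<String> filter) { this.filter = filter; }
        Predicate<String> getFilter() { return filter; }
    }

    // Counts the rows whose diagnosis value passes the scan's filter, if any.
    static long rowCount(List<String> diagnoses, ToyScan scan) {
        Predicate<String> f = scan.getFilter();
        long count = 0;
        for (String d : diagnoses) {
            if (f == null || f.test(d)) {
                count++;
            }
        }
        return count;
    }

    // Builds a small table: half "cardiac", half "renal".
    static List<String> sampleTable(int rowsPerDiagnosis) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowsPerDiagnosis; i++) {
            rows.add("cardiac");
            rows.add("renal");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> table = sampleTable(5);
        Predicate<String> isCardiac = d -> d.equals("cardiac");

        ToyScan forgotten = new ToyScan(); // filter built but never attached
        ToyScan attached = new ToyScan();
        attached.setFilter(isCardiac);

        System.out.println("without filter: " + rowCount(table, forgotten)); // counts all 10 rows
        System.out.println("with filter:    " + rowCount(table, attached));  // counts the 5 cardiac rows
    }
}
```

In the HBase code posted elsewhere in this thread, the filter is attached with scan.setFilter(...) before rowCount is called; the sketch only shows why skipping that step yields the unfiltered total.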
-
Re: Hbase Count Aggregate Function
ramkrishna vasudevan 2013-01-02, 04:09
Oh...Oops..

Regards
Ram
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-25, 17:45
Thanks Ram,

I have tried it a lot. I even tried it in the HBase shell, scanning with filters. Using scan, it returns the right number.

But the AggregationClient rowCount method still returns the wrong number, as if it cannot see the filter. Even when I passed it values that match no rows, so that it should return zero, it returned the total number of rows in the table.

So what do you think?
-
RE: Hbase Count Aggregate Function
Dalia Sobhy 2012-12-24, 19:25
This is my function:

public long CountByDiagnosis(String diagnosis) throws IOException {
    customConf.setStrings("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
    customConf.setLong("hbase.rpc.timeout", 600000);
    customConf.setLong("hbase.client.scanner.caching", 1000);
    configuration = HBaseConfiguration.create(customConf);
    aggregationClient = new AggregationClient(configuration);

    scan.addFamily(CF);

    //Filter by a particular Diagnosis
    SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
        CF, Column, CompareOp.EQUAL, Bytes.toBytes(diagnosis));
    scan.setFilter(filter1);

    long rowCount = -1;
    //Count the number of patients suffering from cardiac diagnosis
    try {
        rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
    } catch (Throwable e) {
        e.printStackTrace();
    }
    return rowCount;
}
-
Re: Hbase Count Aggregate Function
Jean-Marc Spaggiari 2012-12-24, 15:51
Hi Dalia,
You already sent the same question yesterday ;) Just give people some time to look at it.

JM