HBase, mail # user - Hbase Count Aggregate Function


Hbase Count Aggregate Function
Dalia Sobhy 2012-12-24, 15:03

Dear all,
 
I have 50,000 rows with the diagnosis qualifier set to "cardiac", and another 50,000 rows with "renal".
 
When I run this in the HBase shell,
 
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
 
scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}
 
Output = 50,000 rows
 
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
 
count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}
Output = 100,000 rows
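
To make the mismatch concrete, the sketch below models the table in plain Ruby, with no HBase involved. The row keys, the hash layout, and the helper names are hypothetical and chosen only for illustration. The substring match is case-insensitive, which is an assumption about how SubstringComparator behaves. The filtered count is the number expected from the count command; the unfiltered count is the number it actually reports, which equals the total number of rows in the table.

```ruby
# Plain-Ruby model of the 'patient' table; no HBase involved.
# Row keys and the hash layout are hypothetical, for illustration only.
def build_rows
  rows = {}
  50_000.times { |i| rows["cardiac-#{i}"] = { 'info:diagnosis' => 'cardiac' } }
  50_000.times { |i| rows["renal-#{i}"]   = { 'info:diagnosis' => 'renal' } }
  rows
end

# Counts rows whose column value contains the substring (case-insensitive).
# This is a simplified stand-in for EQUAL + SubstringComparator; rows that
# lack the column are treated as non-matching here.
def filtered_count(rows, column, substring)
  needle = substring.downcase
  rows.count { |_key, cells| cells.fetch(column, '').downcase.include?(needle) }
end

# Counts every row, as if no filter were applied.
def unfiltered_count(rows)
  rows.size
end

ROWS = build_rows
puts filtered_count(ROWS, 'info:diagnosis', 'cardiac')  # prints 50000
puts unfiltered_count(ROWS)                             # prints 100000
```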
 
I also tried it with the HBase Java API, using an AggregationClient instance, and I enabled the aggregation coprocessor for the table:
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
 
Also, when I add more nodes to measure the performance improvement, the operation takes the same amount of time.
 
So any advice please?
 
I have been stuck on this for a couple of weeks.
 
Thanks,