HBase >> mail # user >> Hbase Question


Hbase Question

Dear all,

I have 50,000 rows where the diagnosis qualifier = "cardiac", and another 50,000 rows where it is "renal".

When I run this scan in the HBase shell,

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}

Output = 50,000 rows

But when I run count with the same filter,

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
    SingleColumnValueFilter.new(Bytes.toBytes('info'),
         Bytes.toBytes('diagnosis'),
         CompareFilter::CompareOp.valueOf('EQUAL'),
         SubstringComparator.new('cardiac'))}
Output = 100,000 rows

I see the same problem even when I use the HBase Java API with an AggregationClient instance, with the aggregation coprocessor enabled for the table:
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

Also, when I add more nodes to measure the performance improvement, the operation takes the same amount of time.

Any advice, please?

I have been struggling with this for a couple of weeks.

Thanks,