HBase >> mail # user >> Hbase Question


Dalia Sobhy 2012-12-23, 23:26
周梦想 2012-12-25, 02:31
Re: Hbase Question
I think you should take a look at your row-key design and make sure
your data is evenly distributed across the cluster. As you mentioned,
adding more nodes did not improve performance. Maybe one node is a
hot spot while the other nodes have no work to do.
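
To illustrate what an evenly distributed row key could look like, here
is a minimal sketch in plain Ruby (no HBase calls). It prefixes each key
with a hash-derived bucket so that sequential keys do not all land in
the same region. The helper names salted_key and bucket_histogram, the
bucket count of 4, and the sample keys are assumptions for illustration
only, not taken from the 'patient' table.

```ruby
require 'digest'

# Hypothetical helper: prefix a row key with a stable, hash-derived
# bucket number so that sequential keys spread over several buckets.
def salted_key(row_key, buckets = 4)
  # Interpret the first 4 bytes of the MD5 digest as an unsigned integer.
  hash = Digest::MD5.digest(row_key).unpack('N').first
  format('%02d-%s', hash % buckets, row_key)
end

# Count how many of the given keys fall into each bucket prefix.
def bucket_histogram(keys, buckets = 4)
  keys.each_with_object(Hash.new(0)) do |key, counts|
    counts[salted_key(key, buckets)[0, 2]] += 1
  end
end

# Sample sequential keys (made up for this sketch).
keys = (1..1000).map { |i| format('patient%06d', i) }
bucket_histogram(keys).sort.each do |bucket, count|
  puts "bucket #{bucket}: #{count} keys"
end
```

The trade-off is that a range scan over salted keys has to be issued
once per bucket prefix, so this only pays off if the hot spot really
comes from sequential keys.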

regards!

Yong

On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <[EMAIL PROTECTED]> wrote:
> Hi Dalia,
>
> I think you can take a small sample of the table and run the test on
> it. Then you will see the difference between scan and count, because
> the sample is small enough to count by hand.
>
> Best regards,
> Andy
>
> 2012/12/24 Dalia Sobhy <[EMAIL PROTECTED]>
>
>>
>> Dear all,
>>
>> I have 50,000 rows with the diagnosis qualifier = "cardiac", and another
>> 50,000 rows with "renal".
>>
>> When I type this in the HBase shell,
>>
>> import org.apache.hadoop.hbase.filter.CompareFilter
>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
>> import org.apache.hadoop.hbase.filter.SubstringComparator
>> import org.apache.hadoop.hbase.util.Bytes
>>
>> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
>>          Bytes.toBytes('diagnosis'),
>>          CompareFilter::CompareOp.valueOf('EQUAL'),
>>          SubstringComparator.new('cardiac'))}
>>
>> Output = 50,000 rows
>>
>> import org.apache.hadoop.hbase.filter.CompareFilter
>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
>> import org.apache.hadoop.hbase.filter.SubstringComparator
>> import org.apache.hadoop.hbase.util.Bytes
>>
>> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
>>          Bytes.toBytes('diagnosis'),
>>          CompareFilter::CompareOp.valueOf('EQUAL'),
>>          SubstringComparator.new('cardiac'))}
>> Output = 100,000 rows
>>
>> I also tried it using the HBase Java API with an AggregationClient
>> instance, and I enabled the aggregation coprocessor for the table:
>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
>>
>> Also, when I measure performance after adding more nodes, the operation
>> takes the same amount of time.
>>
>> So any advice please?
>>
>> I have been struggling with this problem for a couple of weeks.
>>
>> Thanks,
>>
>>
>>
>>
Dalia Sobhy 2013-01-01, 21:40
Rami Mankevich 2013-04-09, 17:51