Dalia Sobhy 2012-12-23, 23:26
Dear all,
I have 50,000 rows whose diagnosis qualifier is "cardiac", and another 50,000 rows with "renal".
When I type this in the HBase shell,
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'patient', { COLUMNS => "info:diagnosis",
  FILTER => SingleColumnValueFilter.new(
    Bytes.toBytes('info'),
    Bytes.toBytes('diagnosis'),
    CompareFilter::CompareOp.valueOf('EQUAL'),
    SubstringComparator.new('cardiac'))}

Output = 50,000 rows

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

count 'patient', { COLUMNS => "info:diagnosis",
  FILTER => SingleColumnValueFilter.new(
    Bytes.toBytes('info'),
    Bytes.toBytes('diagnosis'),
    CompareFilter::CompareOp.valueOf('EQUAL'),
    SubstringComparator.new('cardiac'))}

Output = 100,000 rows
I also tried it using the HBase Java API (an AggregationClient instance), and I enabled the aggregation coprocessor for the table:

rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)

Also, when I measure the performance after adding more nodes, the operation takes the same time.
So, any advice please?
I have been stuck in this mess for a couple of weeks.
Thanks,
Hi Dalia,
I think you can make a small sample of the table and run the test on it; then you'll see the difference between scan and count, because you can verify the count by hand.
Best regards,
Andy
I think you should take a look at your row-key design and distribute your data evenly across the cluster. As you mentioned, adding more nodes did not improve performance. Maybe one node is a hot spot while the other nodes have no work to do.
regards!
Yong
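One common way to spread rows more evenly is to "salt" the row key with a short prefix derived from the key itself. The sketch below is only an illustration of that idea in plain Java: the class SaltedRowKey, its method names, the two-digit prefix format, and the example key are all hypothetical and are not part of HBase. Whether hot spotting is actually the cause here has not been confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper (not an HBase class): prefixes each row key with a
// salt bucket so that sequential keys do not all land in the same region.
final class SaltedRowKey {
    private SaltedRowKey() {}

    // Returns a bucket in [0, buckets) derived from the original key.
    // The same key always maps to the same bucket.
    static int bucketFor(String key, int buckets) {
        if (buckets <= 0 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        return Math.floorMod(key.hashCode(), buckets);
    }

    // Builds "<two-digit bucket>-<original key>" encoded as UTF-8 bytes.
    static byte[] salt(String key, int buckets) {
        String salted = String.format(Locale.ROOT, "%02d-%s", bucketFor(key, buckets), key);
        return salted.getBytes(StandardCharsets.UTF_8);
    }
}
```

A caller would pass the result of SaltedRowKey.salt("patient-000123", 8) as the row key when writing. The trade-off is that a range scan over the original keys then has to be issued once per bucket, and the table would presumably need to be pre-split on the salt prefixes for the load to actually spread.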
On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <[EMAIL PROTECTED]> wrote:
> Hi Dalia,
>
> I think you can make a small sample of the table to do the test, then
> you'll find what's the difference of scan and count.
> because you can count it by human.
>
> Best regards,
> Andy
Dalia Sobhy 2013-01-01, 21:40
Dear Yong,
How do I distribute my data across the cluster? Note that I am using Cloudera Manager 4.1.
Thanks in advance :D
> Date: Fri, 28 Dec 2012 20:38:22 +0100
> Subject: Re: Hbase Question
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> I think you can take a look at your row-key design and evenly
> distribute your data in your cluster, as you mentioned even if you
> added more nodes, there was no improvement of performance. Maybe you
> have a node who is a hot spot, and the other nodes have no work to do.
>
> regards!
>
> Yong
Rami Mankevich 2013-04-09, 17:51
First of all, thanks for the quick response.

Basically, the threads I want to open are for updating my own internal structures, and I guess they have no relation to HBase's internal structures. All I want is to initiate some asynchronous structure updates as part of coprocessor execution, so as not to block the user response. The only reason I was asking is to be sure HBase will not kill those threads. As I understand it, there shouldn't be any issue with that. Am I correct?

In addition, is there any HBase thread pool I can use?

Thanks

From: Andrew Purtell [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 09, 2013 6:53 PM
To: Rami Mankevich
Cc: [EMAIL PROTECTED]
Subject: Re: Hbase question

Hi Rami,

As a generic answer, it is no problem to create threads in a coprocessor. More specifically, there could be issues depending on exactly what you want to do, since coprocessor code changes HBase internals. Perhaps you could say a bit more.

I also encourage you to ask this question on [EMAIL PROTECTED] so other contributors can chime in too.

On Tuesday, April 9, 2013, Rami Mankevich wrote:

Hey,

According to the HBase documentation, you are one of the contributors to the HBase project. I would like to raise a question on which nobody has been able to advise me: in the context of coprocessors, I want to start some threads. Do you see any problems with that?

Thanks

--
Best regards,
   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
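The asynchronous updates described above could be handed to a small, privately owned thread pool built from the standard java.util.concurrent classes. The following is a minimal sketch under stated assumptions: AsyncUpdater and its methods are hypothetical names and not an HBase API, and the thread does not say whether HBase offers a pool of its own for this purpose. Presumably such a helper would be created in the coprocessor's start(CoprocessorEnvironment) hook and shut down in stop(CoprocessorEnvironment), so that the threads do not outlive the coprocessor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not an HBase class): owns a small pool for
// background structure updates so the request path is not blocked.
final class AsyncUpdater {
    private final ExecutorService pool;

    AsyncUpdater(int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "async-updater-" + counter.incrementAndGet());
            t.setDaemon(true); // daemon threads do not keep the JVM alive at exit
            return t;
        };
        this.pool = Executors.newFixedThreadPool(threads, factory);
    }

    // Queues the update and returns immediately to the caller.
    void submit(Runnable update) {
        pool.execute(update);
    }

    // Stops accepting new work and waits up to timeoutMillis for queued
    // tasks. Returns true if everything finished in time.
    boolean shutdown(long timeoutMillis) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        pool.shutdownNow(); // give up on whatever is still queued or running
        return false;
    }
}
```

Note that Executors.newFixedThreadPool uses an unbounded work queue, so if updates arrive faster than they are processed the backlog grows in memory; a bounded queue with a rejection policy may be a better fit under heavy load.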