HBase, mail # user - Hbase Question


Dalia Sobhy 2012-12-23, 23:26
周梦想 2012-12-25, 02:31
yonghu 2012-12-28, 19:38
RE: Hbase Question
Dalia Sobhy 2013-01-01, 21:40

Dear Yong,

How do I distribute my data evenly across the cluster? Note that I am using Cloudera Manager 4.1.

Thanks in advance :D
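
One common way to spread rows across region servers is to "salt" the row key with a hash-derived prefix, usually combined with pre-splitting the table on those prefixes. The sketch below is plain Ruby and does not talk to HBase; the bucket count, the `salted_key` helper, and the `patient000001` key format are all hypothetical and only illustrate the idea.

```ruby
require 'digest'

# Hypothetical bucket count; in practice it is chosen relative to the
# number of region servers and the table's pre-split points.
NUM_BUCKETS = 8

# Prefix a row key with a stable, hash-derived bucket so that sequential
# keys are spread over all buckets instead of piling up in one region.
def salted_key(row_key, buckets = NUM_BUCKETS)
  bucket = Digest::MD5.hexdigest(row_key)[0, 8].to_i(16) % buckets
  format('%02d-%s', bucket, row_key)
end

# Sequential patient IDs (hypothetical key format).
keys = (1..1000).map { |i| format('patient%06d', i) }
salted = keys.map { |k| salted_key(k) }

# Count how many keys fall into each bucket.
distribution = Hash.new(0)
salted.each { |k| distribution[k[0, 2]] += 1 }
distribution.sort.each { |bucket, n| puts "bucket #{bucket}: #{n} keys" }
```

The trade-off is that a range scan over the original key order then has to fan out over every bucket prefix, so salting suits write-heavy or full-table workloads better than ordered range reads.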

> Date: Fri, 28 Dec 2012 20:38:22 +0100
> Subject: Re: Hbase Question
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> I think you should look at your row-key design and make sure your data
> is evenly distributed across the cluster. As you mentioned, adding more
> nodes did not improve performance. Maybe one node is a hot spot while
> the other nodes have no work to do.
>
> regards!
>
> Yong
>
> On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <[EMAIL PROTECTED]> wrote:
> > Hi Dalia,
> >
> > I think you can take a small sample of the table and run the test on it.
> > Then you'll see the difference between scan and count,
> > because you can count the rows by hand.
> >
> > Best regards,
> > Andy
> >
> > 2012/12/24 Dalia Sobhy <[EMAIL PROTECTED]>
> >
> >>
> >> Dear all,
> >>
> >> I have 50,000 rows with diagnosis qualifier = "cardiac", and another 50,000
> >> rows with "renal".
> >>
> >> When I type this in the HBase shell,
> >>
> >> import org.apache.hadoop.hbase.filter.CompareFilter
> >> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> >> import org.apache.hadoop.hbase.filter.SubstringComparator
> >> import org.apache.hadoop.hbase.util.Bytes
> >>
> >> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> >>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> >>          Bytes.toBytes('diagnosis'),
> >>          CompareFilter::CompareOp.valueOf('EQUAL'),
> >>          SubstringComparator.new('cardiac'))}
> >>
> >> Output = 50,000 rows
> >>
> >> import org.apache.hadoop.hbase.filter.CompareFilter
> >> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> >> import org.apache.hadoop.hbase.filter.SubstringComparator
> >> import org.apache.hadoop.hbase.util.Bytes
> >>
> >> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> >>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> >>          Bytes.toBytes('diagnosis'),
> >>          CompareFilter::CompareOp.valueOf('EQUAL'),
> >>          SubstringComparator.new('cardiac'))}
> >> Output = 100,000 rows
> >>
> >> I get the same result even with the HBase Java API, using an
> >> AggregationClient instance with the aggregation coprocessor enabled for the table:
> >> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> >>
> >> Also, when I measure performance after adding more nodes,
> >> the operation takes the same time.
> >>
> >> So any advice please?
> >>
> >> I have been struggling with this mess for a couple of weeks.
> >>
> >> Thanks,
> >>
> >>
> >>
> >>
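
The small-sample test suggested above can be modelled without a cluster. The sketch below is an in-memory stand-in, not HBase code: `sample`, `filtered_scan`, and `count_all` are hypothetical names. It assumes that the shell's count command of that period read only its INTERVAL and CACHE options, so a FILTER passed to it would be ignored; that is a likely explanation for the 100,000 figure, not a confirmed diagnosis, and it does not by itself explain the AggregationClient result.

```ruby
# In-memory stand-in for a small sample of the 'patient' table:
# row key => value of info:diagnosis. No HBase connection is involved.
sample = {}
(1..5).each  { |i| sample["row#{i}"] = 'cardiac' }
(6..10).each { |i| sample["row#{i}"] = 'renal' }

# Models a scan with a SingleColumnValueFilter and SubstringComparator:
# only rows whose value contains the substring are returned.
def filtered_scan(table, substring)
  table.select { |_key, diagnosis| diagnosis.include?(substring) }
end

# Models a row count that applies no filter at all (assumed behaviour).
def count_all(table)
  table.size
end

puts "filtered scan:    #{filtered_scan(sample, 'cardiac').size} rows"
puts "unfiltered count: #{count_all(sample)} rows"
```

With 5 "cardiac" and 5 "renal" rows, the filtered scan reports 5 rows and the unfiltered count reports 10, the same 1:2 ratio as the 50,000 versus 100,000 seen on the full table.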
     
Rami Mankevich 2013-04-09, 17:51