Hbase Question (HBase user mailing list)


Dalia Sobhy 2012-12-23, 23:26
周梦想 2012-12-25, 02:31
yonghu 2012-12-28, 19:38
RE: Hbase Question

Dear Yong,

How do I distribute my data evenly across the cluster? Note that I am using Cloudera Manager 4.1.

Thanks in advance :D

> Date: Fri, 28 Dec 2012 20:38:22 +0100
> Subject: Re: Hbase Question
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> I think you can take a look at your row-key design and distribute your
> data evenly across your cluster. As you mentioned, even after you added
> more nodes there was no improvement in performance; maybe one node is a
> hot spot while the other nodes have no work to do.
>
> Regards,
>
> Yong
>
> On Tue, Dec 25, 2012 at 3:31 AM, 周梦想 <[EMAIL PROTECTED]> wrote:
> > Hi Dalia,
> >
> > I think you can run the test against a small sample of the table; then
> > you will see how scan and count differ, because you can verify the
> > counts by hand.
> >
> > Best regards,
> > Andy
> >
> > 2012/12/24 Dalia Sobhy <[EMAIL PROTECTED]>
> >
> >>
> >> Dear all,
> >>
> >> I have 50,000 rows with diagnosis qualifier = "cardiac", and another
> >> 50,000 rows with "renal".
> >>
> >> When I type this in the HBase shell:
> >>
> >> import org.apache.hadoop.hbase.filter.CompareFilter
> >> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> >> import org.apache.hadoop.hbase.filter.SubstringComparator
> >> import org.apache.hadoop.hbase.util.Bytes
> >>
> >> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> >>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> >>          Bytes.toBytes('diagnosis'),
> >>          CompareFilter::CompareOp.valueOf('EQUAL'),
> >>          SubstringComparator.new('cardiac'))}
> >>
> >> Output = 50,000 rows
> >>
> >> import org.apache.hadoop.hbase.filter.CompareFilter
> >> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> >> import org.apache.hadoop.hbase.filter.SubstringComparator
> >> import org.apache.hadoop.hbase.util.Bytes
> >>
> >> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> >>     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> >>          Bytes.toBytes('diagnosis'),
> >>          CompareFilter::CompareOp.valueOf('EQUAL'),
> >>          SubstringComparator.new('cardiac'))}
> >> Output = 100,000 rows
> >>
> >> I also tried it with the HBase Java API, using an AggregationClient
> >> instance after enabling the aggregation coprocessor for the table:
> >> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> >>
> >> Also, when measuring performance after adding more nodes, the
> >> operation takes the same amount of time.
> >>
> >> So any advice please?
> >>
> >> I have been struggling with this for a couple of weeks.
> >>
> >> Thanks,
> >>
> >>
> >>
> >>
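[Editor's note, not from the thread: a plausible explanation for the 100,000 result is that the shell's count command in that HBase version silently ignores the FILTER argument, and AggregationClient.rowCount only honors a filter if it is set on the Scan object passed to it. A minimal sketch of counting the filtered rows through the 0.94-era Java client, assuming a reachable cluster and the "patient" table from the thread:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "patient");

        // Attach the filter to the Scan itself; anything that consumes
        // this Scan (a plain scanner, or AggregationClient.rowCount)
        // will then only see the matching rows.
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("diagnosis"));
        scan.setFilter(new SingleColumnValueFilter(
                Bytes.toBytes("info"), Bytes.toBytes("diagnosis"),
                CompareOp.EQUAL, new SubstringComparator("cardiac")));

        // Client-side count over the filtered scan.
        long count = 0;
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                count++;
            }
        } finally {
            scanner.close();
        }
        System.out.println("matching rows: " + count);
        table.close();
    }
}
```

[The same Scan, with the filter set, can be passed to aggregationClient.rowCount to get a server-side count instead of scanning client-side.]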
     
Rami Mankevich 2013-04-09, 17:51
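[Editor's note on Yong's row-key advice: one common way to avoid a hot-spotted region is to salt row keys with a hash-derived prefix, so that otherwise sequential keys spread across regions. A minimal sketch in plain Java; the bucket count of 16 and the "NN-key" format are illustrative assumptions, not from the thread:]

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SaltedKey {
    // Hypothetical number of salt buckets; in practice this is chosen
    // to match (or exceed) the number of regions/region servers.
    static final int BUCKETS = 16;

    // Prefix the original row key with a stable hash-derived bucket so
    // that lexicographically adjacent keys land in different regions.
    public static String salt(String rowKey) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] h = md.digest(rowKey.getBytes(StandardCharsets.UTF_8));
        int bucket = (h[0] & 0xFF) % BUCKETS;
        return String.format("%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) throws Exception {
        // Sequential patient keys now scatter across salt buckets.
        System.out.println(salt("patient-00001"));
        System.out.println(salt("patient-00002"));
        System.out.println(salt("patient-00003"));
    }
}
```

[The cost of salting is that a range scan must now fan out over all buckets; a filtered count like the one in this thread is unaffected, since it already touches every region.]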