HBase user mailing list: Hbase Performance Issue


Re: Hbase Performance Issue
Could you send us a region server log captured while the job is running?
On Jan 4, 2014 4:35 PM, "Akhtar Muhammad Din" <[EMAIL PROTECTED]> wrote:

> Thanks, everyone, for your time.
> Vladimir, as Ted rightly said, I want to improve write performance for now
> (of course, I also want to read the data as fast as possible later on).
> Kevin, my current understanding of bulk load is that you generate
> StoreFiles and later load them through a command-line program. I don't want
> any manual step. Our system receives new data every 15 minutes, so the
> requirement is to automate the whole process through the client API.
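Bulk loading can be driven entirely from Java, with no manual command-line step. In HBase 0.94, `HFileOutputFormat.configureIncrementalLoad(job, table)` configures a MapReduce job to write HFiles partitioned by the table's current region boundaries, and the driver can then call `new LoadIncrementalHFiles(conf).doBulkLoad(outputPath, table)` once the job succeeds. The partitioning is what lets each HFile fit into a single region. The sketch below is a hypothetical, self-contained model of that routing (the class `RegionRouter` is illustrative, not HBase code): it picks the region whose start key is the greatest one less than or equal to a row key, using the unsigned byte ordering HBase applies to row keys.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model (not HBase code): route a row key to the region whose
// [startKey, nextStartKey) range contains it, the way a total-order
// partitioner gives each reducer one region's worth of sorted keys.
final class RegionRouter {
    private final byte[][] startKeys; // sorted; the first entry is the empty key

    RegionRouter(byte[][] startKeys) {
        this.startKeys = startKeys.clone();
    }

    // Unsigned lexicographic comparison, matching how HBase orders row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Binary search for the index of the last start key <= rowKey.
    int regionFor(byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle so the loop always shrinks
            if (compare(startKeys[mid], rowKey) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Convenience: UTF-8 bytes of a string key.
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, with start keys "", "g" and "p", the key "house" lands in region 1 and "zebra" in region 2, so all of a reducer's sorted output for one key range ends up in one HFile for one region.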
>
>
>
> On Sun, Jan 5, 2014 at 2:19 AM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
>
> > Have you tried writing out an HFile and then bulk loading the data?
> > On Jan 4, 2014 4:01 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
> >
> > > bq. Output is written to either Hbase
> > >
> > > Looks like Akhtar wants to boost write performance to HBase.
> > > MapReduce over snapshot files targets higher read throughput.
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Jan 4, 2014 at 12:55 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can try MapReduce over snapshot files
> > > > https://issues.apache.org/jira/browse/HBASE-8369
> > > >
> > > > but you will need to patch 0.94.
> > > >
> > > > Best regards,
> > > > Vladimir Rodionov
> > > > Principal Platform Engineer
> > > > Carrier IQ, www.carrieriq.com
> > > > e-mail: [EMAIL PROTECTED]
> > > >
> > > > ________________________________________
> > > > From: Akhtar Muhammad Din [[EMAIL PROTECTED]]
> > > > Sent: Saturday, January 04, 2014 12:44 PM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Re: Hbase Performance Issue
> > > >
> > > > I'm using CDH 4.5:
> > > > Hadoop:  2.0.0-cdh4.5.0
> > > > HBase:   0.94.6-cdh4.5.0
> > > >
> > > > Regards
> > > >
> > > >
> > > > On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > What version of HBase / HDFS are you running?
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi,
> > > > > > I have been running a MapReduce job that joins two datasets of 1.3 GB
> > > > > > and 4 GB. The join is done on the reduce side. Output is written to
> > > > > > either HBase or HDFS, depending on configuration. The problem is that
> > > > > > HBase takes about 60-80 minutes to write the processed data, whereas
> > > > > > HDFS takes only 3-5 minutes to write the same data. I would like to
> > > > > > improve HBase write speed and bring it down to 1-2 minutes.
> > > > > >
> > > > > > I am using Amazon EC2 instances: I launched a cluster of 3 nodes and
> > > > > > later 10, and have tried both c3.4xlarge and c3.8xlarge instances.
> > > > > >
> > > > > > I see a significant increase in HDFS write performance as I move to
> > > > > > clusters with more nodes and higher specifications, but HBase write
> > > > > > performance shows no significant change.
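One possible explanation for HBase not scaling with node count, offered here as an assumption since the thread does not confirm it, is write hot-spotting: a newly created table has a single region, and keys that arrive in sorted or increasing order keep sending writes to one region server, so extra nodes sit idle. A common mitigation is to pre-split the table, for example with `HBaseAdmin.createTable(descriptor, splitKeys)`, and to spread keys with a salt prefix. The hypothetical `PreSplit` helper below computes evenly spaced one-byte split keys and prefixes each row key with the matching salt byte derived from its hash.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not HBase code): evenly spaced one-byte split keys for a
// table whose row keys start with a salt byte, plus a helper that adds that salt.
final class PreSplit {
    // Returns numRegions - 1 split keys dividing the 0x00..0xFF prefix space.
    static byte[][] splitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Region i starts at prefix i * 256 / numRegions (compared as unsigned).
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    // Prepends a salt byte so that consecutive keys spread across numRegions buckets.
    static byte[] salted(String rowKey, int numRegions) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int bucket = Math.floorMod(rowKey.hashCode(), numRegions);
        byte[] out = new byte[key.length + 1];
        out[0] = (byte) (bucket * 256 / numRegions); // first byte of bucket's region
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }
}
```

The trade-off is on the read side: a range scan over the original keys becomes one scan per salt bucket, and the bucket count is effectively fixed once data has been written.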
> > > > > >
> > > > > > I have been going through various posts and articles and have read
> > > > > > the HBase book, but have not been able to solve the HBase
> > > > > > performance issue so far.
> > > > > > Here are a few things I have tried so far:
> > > > > >
> > > > > > *Client Side*
> > > > > > - Turned off writing to WAL
> > > > > > - Experimented with write buffer size
> > > > > > - Turned off auto flush on table
> > > > > > - Used cache, experimented with different sizes
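The client-side settings above work together. In the 0.94 client, `HTable.setAutoFlush(false)` makes `put` accumulate edits locally, and they are sent as a batch once the buffered size exceeds the value set with `HTable.setWriteBufferSize(long)` (2 MB by default, via `hbase.client.write.buffer`). `Put.setWriteToWAL(false)` additionally skips the write-ahead log, so edits not yet flushed to disk are lost if a region server fails. The hypothetical `WriteBufferModel` below only counts how many batches a stream of equally sized puts produces; it is not the HBase client, and a real batch can still fan out into one request per region server.

```java
// Hypothetical model of client-side write buffering (not the HBase client):
// it only counts how many batches a stream of puts would be sent in.
final class WriteBufferModel {
    private final long bufferBytes;  // analogous to the write buffer size
    private final boolean autoFlush; // analogous to the auto-flush setting
    private long buffered = 0;
    private int flushes = 0;

    WriteBufferModel(long bufferBytes, boolean autoFlush) {
        this.bufferBytes = bufferBytes;
        this.autoFlush = autoFlush;
    }

    // Buffer one put; send a batch immediately with auto-flush,
    // otherwise only once the buffered size exceeds the buffer.
    void put(long sizeBytes) {
        buffered += sizeBytes;
        if (autoFlush || buffered > bufferBytes) {
            flush();
        }
    }

    // Send whatever is still buffered, as closing the table would.
    void close() {
        if (buffered > 0) {
            flush();
        }
    }

    private void flush() {
        flushes++;
        buffered = 0;
    }

    int flushes() {
        return flushes;
    }
}
```

For instance, 10,000 puts of 1,000 bytes each produce 10,000 batches with auto-flush on, but only 5 with auto-flush off and a 2,097,152-byte buffer.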
> > > > > >
> > > > > >
> > > > > > *HBase Server Side*
> > > > > > - Increased region server heap size to 8 GB
> > > > > > - Experimented with handler count
> > > > > > - Increased memstore flush size to 512 MB
> > > > > > - Experimented with hbase.hregion.max.filesize, tried different
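A larger memstore flush size only helps if the region server's global memstore budget can hold it. Assuming the 0.94 default `hbase.regionserver.global.memstore.upperLimit` of 0.4 (the thread does not say whether it was changed), an 8 GB heap leaves about 3.2 GB for all memstores combined, so only about six regions can reach a 512 MB flush size before the server blocks updates and forces flushes. The small hypothetical helper below does that arithmetic.

```java
// Hypothetical helper: how many regions can each fill a memstore up to the
// flush size before the global memstore limit (heap * upperLimit) is reached.
final class MemstoreBudget {
    // Global memstore budget in MB, rounded down.
    static long globalLimitMb(long heapMb, double upperLimit) {
        return (long) Math.floor(heapMb * upperLimit);
    }

    // Number of full memstores that fit in the global budget.
    static long fullMemstores(long heapMb, double upperLimit, long flushSizeMb) {
        if (flushSizeMb <= 0) {
            throw new IllegalArgumentException("flushSizeMb must be positive");
        }
        return globalLimitMb(heapMb, upperLimit) / flushSizeMb;
    }
}
```

In other words, a 512 MB flush size pays off only when few regions per server are receiving writes at the same time.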