HBase >> mail # user >> Poor HBase map-reduce scan performance


Matt Corgan 2013-05-01, 06:52
Re: Poor HBase map-reduce scan performance
@Lars, how have you calculated the 35K/row size? I'm not able to find the
same number.

@Bryan, Matt's idea below is good. With the Hadoop test you always had data
locality. With your HBase test, maybe not. Can you take a look at the JMX
console and tell us your locality %? Also, over those 45 minutes, have you
monitored CPU WIO, GC activity, etc. to see if any of those might have
impacted performance?

JM

2013/5/1 Matt Corgan <[EMAIL PROTECTED]>

> Not that it's a long-term solution, but try major-compacting before running
> the benchmark.  If the LSM tree is CPU bound in merging HFiles/KeyValues
> through the PriorityQueue, then reducing to a single file per region should
> help.  The merging of HFiles during a scan is not heavily optimized yet.
>
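To make the merge cost concrete, here is a minimal sketch of a k-way merge driven by a PriorityQueue. It is not HBase's scanner code; the class name KWayMerge, the Cursor helper, and the string row keys are all made up for illustration. Every emitted key costs one poll and usually one re-insert on the heap, which is where the CPU goes when a region has many files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // One cursor per sorted input (standing in for one file); hypothetical helper.
    private static final class Cursor {
        final Iterator<String> it;
        String current;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.current = it.next();
        }

        // Moves to the next key; returns false when the input is exhausted.
        boolean advance() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            return false;
        }
    }

    // Merges already-sorted lists into one sorted list.
    public static List<String> merge(List<List<String>> sortedFiles) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.current));
        for (List<String> f : sortedFiles) {
            if (!f.isEmpty()) {
                heap.add(new Cursor(f.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();   // smallest current key across all inputs
            out.add(c.current);
            if (c.advance()) {
                heap.add(c);          // re-insert with its next key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("row1", "row4", "row7"),
            Arrays.asList("row2", "row5"),
            Arrays.asList("row3", "row6"));
        System.out.println(merge(files));
    }
}
```

With a single input the heap never holds more than one cursor, so the comparisons disappear. That is the effect a major compaction down to one file per region would be expected to have.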
>
> On Tue, Apr 30, 2013 at 11:21 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > If you can, try 0.94.4+; it should significantly reduce the amount of
> > bytes copied around in RAM during scanning, especially if you have wide
> > rows and/or large key portions. That in turn makes scans scale better
> > across cores, since RAM is a shared resource between cores (much like
> > disk).
> >
> >
> > It's not hard to build the latest HBase against Cloudera's version of
> > Hadoop. I can send along a simple patch to pom.xml to do that.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Bryan Keller <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, April 30, 2013 11:02 PM
> > Subject: Re: Poor HBase map-reduce scan performance
> >
> >
> > The table has hashed keys so rows are evenly distributed amongst the
> > regionservers, and load on each regionserver is pretty much the same. I
> > also have per-table balancing turned on. I get mostly data local mappers
> > with only a few rack local (maybe 10 of the 250 mappers).
> >
> > Currently the table is a wide table schema, with lists of data structures
> > stored as columns with column prefixes grouping the data structures (e.g.
> > 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of
> > moving those data structures to protobuf which would cut down on the number
> > of columns. The downside is I can't filter on one value with that, but it
> > is a tradeoff I would make for performance. I was also considering
> > restructuring the table into a tall table.
> >
> > Something interesting is that my old regionserver machines had five 15k
> > SCSI drives instead of 2 SSDs, and performance was about the same. Also, my
> > old network was 1gbit, now it is 10gbit. So neither network nor disk I/O
> > appears to be the bottleneck. The CPU is rather high for the regionserver so
> > it seems like the best candidate to investigate. I will try profiling it
> > tomorrow and will report back. I may revisit compression on vs off since
> > that is adding load to the CPU.
> >
> > I'll also come up with a sample program that generates data similar to my
> > table.
> >
> >
> > On Apr 30, 2013, at 10:01 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > Your average row is 35k so scanner caching would not make a huge
> > > difference, although I would have expected some improvements by setting it
> > > to 10 or 50 since you have a wide 10ge pipe.
> > >
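A back-of-the-envelope sketch of why caching matters less with rows this large. The assumptions are mine: "35k" is read as roughly 35 KB per row, and the one-million row count is invented for illustration. In the HBase client the value is set with Scan.setCaching(int); the code below only does the arithmetic and does not touch HBase.

```java
public class ScannerCachingEstimate {
    // Round trips needed to fetch `rows` rows when each RPC returns up to `caching` rows.
    public static long rpcCount(long rows, int caching) {
        return (rows + caching - 1) / caching; // ceiling division
    }

    // Approximate payload carried by one full RPC, in bytes.
    public static long bytesPerRpc(long avgRowBytes, int caching) {
        return avgRowBytes * caching;
    }

    public static void main(String[] args) {
        long avgRowBytes = 35L * 1024; // assumption: "35k" means about 35 KB per row
        long rows = 1_000_000L;        // hypothetical row count, for illustration only
        for (int caching : new int[] {1, 10, 50}) {
            System.out.printf("caching=%d rpcs=%d bytesPerRpc=%d%n",
                caching, rpcCount(rows, caching), bytesPerRpc(avgRowBytes, caching));
        }
    }
}
```

Even at caching=1 each round trip already carries tens of kilobytes, so per-RPC overhead is amortized much better than it would be with small rows. Going to 10 or 50 cuts the number of round trips but each one moves proportionally more data.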
> > > I assume your table is split sufficiently to touch all RegionServers...
> > > Do you see the same load/IO on all region servers?
> > >
> > > A bunch of scan improvements went into HBase since 0.94.2.
> > > I blogged about some of these changes here:
> > > http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
> > >
> > > In your case - since you have many columns, each of which carries the
> > > rowkey - you might benefit a lot from HBASE-7279.
> > >
> > > In the end HBase *is* slower than straight HDFS for full scans. How
> > > could it not be?
> > > So I would start by looking at HDFS first. Make sure Nagle's is disabled
> > > in both HBase and HDFS.
> > >
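For anyone unsure what "Nagle's disabled" means at the socket level: it is the TCP_NODELAY option. The sketch below uses only java.net on a loopback connection; the class name NoDelayDemo is made up. HBase and Hadoop expose the same switch through their own configuration properties, whose names depend on the version and are not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayDemo {
    // Opens a loopback connection, turns Nagle's algorithm off on the client
    // socket, and returns the flag as reported back by the socket.
    public static boolean connectWithNoDelay() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setTcpNoDelay(true); // TCP_NODELAY on means Nagle's algorithm is off
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay());
    }
}
```

With Nagle's algorithm on, small writes can be held back while waiting for outstanding ACKs, which adds latency to request/response traffic such as scanner RPCs.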
> > > And lastly SSDs are somewhat new territory for HBase. Maybe Andy
> > > Purtell