Re: Poor HBase map-reduce scan performance
You can try YourKit; they offer evaluation licenses. There is one gotcha:
some classes are excluded by default, including org.apache.*, so you need
to change the default config when using it with HBase.
On Thu, May 2, 2013 at 7:54 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:

> I ran one of my regionservers through VisualVM. It looks like the top hot
> spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate().
> It appears at first glance that memory allocations may be an issue.
> Decompression was next below that but less of an issue it seems.
>
> Would changing the block size, either HDFS or HBase, help here?
>
> Also, if anyone has tips on how else to profile, that would be
> appreciated. VisualVM can produce a lot of noise that is hard to sift
> through.
>
>
> On May 1, 2013, at 9:49 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>
> > I used exactly 0.94.4, pulled from the tag in subversion.
> >
> > On May 1, 2013, at 9:41 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >> Hmm... Did you actually use exactly version 0.94.4, or the latest
> >> 0.94.7?
> >> I would be very curious to see profiling data.
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: Bryan Keller <[EMAIL PROTECTED]>
> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >> Cc:
> >> Sent: Wednesday, May 1, 2013 6:01 PM
> >> Subject: Re: Poor HBase map-reduce scan performance
> >>
> >> I tried running my test with 0.94.4; unfortunately, performance was
> >> about the same. I'm planning on profiling the regionserver and trying
> >> some other things tonight and tomorrow and will report back.
> >>
> >> On May 1, 2013, at 8:00 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:
> >>
> >>> Yes, I would like to try this. If you can point me to the pom.xml
> >>> patch, that would save me some time.
> >>>
> >>> On Tuesday, April 30, 2013, lars hofhansl wrote:
> >>> If you can, try 0.94.4+; it should significantly reduce the number of
> >>> bytes copied around in RAM during scanning, especially if you have wide
> >>> rows and/or large key portions. That in turn makes scans scale better
> >>> across cores, since RAM is a shared resource between cores (much like
> >>> disk).
> >>>
> >>>
> >>> It's not hard to build the latest HBase against Cloudera's version of
> >>> Hadoop. I can send along a simple patch to pom.xml to do that.
> >>>
> >>> -- Lars
> >>>
> >>>
> >>>
> >>> ________________________________
> >>>  From: Bryan Keller <[EMAIL PROTECTED]>
> >>> To: [EMAIL PROTECTED]
> >>> Sent: Tuesday, April 30, 2013 11:02 PM
> >>> Subject: Re: Poor HBase map-reduce scan performance
> >>>
> >>>
> >>> The table has hashed keys, so rows are evenly distributed amongst the
> >>> regionservers, and load on each regionserver is pretty much the same. I
> >>> also have per-table balancing turned on. I get mostly data-local mappers
> >>> with only a few rack-local (maybe 10 of the 250 mappers).
> >>>
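For readers unfamiliar with the hashed-key approach mentioned above, here is a minimal sketch of the general idea. The thread does not say which scheme the table actually uses; the MD5 prefix, the 4-character prefix length, and the names `hashed_row_key` and `bucket_counts` are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def hashed_row_key(natural_key: str, prefix_len: int = 4) -> bytes:
    """Prefix the natural key with a short hash so that keys which sort
    next to each other end up spread across the key space.

    Hypothetical scheme: the original post does not describe its hashing.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return (digest[:prefix_len] + "_" + natural_key).encode("utf-8")


def bucket_counts(natural_keys) -> Counter:
    """Count how many keys fall under each leading hex character,
    a rough stand-in for 16 evenly pre-split regions."""
    counts = Counter()
    for key in natural_keys:
        first_char = hashed_row_key(key)[:1].decode("ascii")
        counts[first_char] += 1
    return counts


if __name__ == "__main__":
    # Sequential keys would all land in one region without the hash prefix.
    sample = [f"user{i}" for i in range(10000)]
    for bucket, count in sorted(bucket_counts(sample).items()):
        print(bucket, count)
```

The trade-off with this kind of key is that range scans over the natural key order are no longer possible, which does not matter for a full-table map-reduce scan like the one discussed here.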
> >>> Currently the table is a wide table schema, with lists of data
> >>> structures stored as columns with column prefixes grouping the data
> >>> structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city).
> >>> I was thinking of moving those data structures to protobuf which would
> >>> cut down on the number of columns. The downside is I can't filter on one
> >>> value with that, but it is a tradeoff I would make for performance. I
> >>> was also considering restructuring the table into a tall table.
> >>>
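To make the wide-schema trade-off above concrete, here is a small sketch that groups prefixed columns (1_name, 1_city, ...) into one record per prefix and packs each record into a single cell value. JSON is used only as a stand-in for protobuf so the example stays self-contained; the function names and sample values are hypothetical, not taken from the original table.

```python
import json
from collections import defaultdict


def group_columns(row: dict) -> dict:
    """Group qualifiers of the form '<index>_<field>' into one dict per index."""
    grouped = defaultdict(dict)
    for qualifier, value in row.items():
        # Split on the first underscore only, so fields may contain underscores.
        index, _, field = qualifier.partition("_")
        grouped[int(index)][field] = value
    return dict(grouped)


def pack_records(row: dict) -> dict:
    """Serialize each grouped record into one cell, keyed by its index.

    JSON stands in for protobuf here; the point is the column count,
    not the wire format.
    """
    return {
        str(index): json.dumps(record, sort_keys=True)
        for index, record in group_columns(row).items()
    }


if __name__ == "__main__":
    wide_row = {
        "1_name": "Ann", "1_address": "1 Main St", "1_city": "Austin",
        "2_name": "Bob", "2_address": "2 Oak Ave", "2_city": "Boston",
    }
    packed = pack_records(wide_row)
    print(len(wide_row), "columns ->", len(packed), "cells")
```

The scan now handles one cell per structure instead of one per field, but a single field such as the city can only be tested after deserializing the cell, which is the filtering downside mentioned above.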
> >>> Something interesting is that my old regionserver machines had five
> >>> 15k SCSI drives instead of 2 SSDs, and performance was about the same.
> >>> Also, my old network was 1 Gbit; now it is 10 Gbit. So neither network
> >>> nor disk I/O appears to be the bottleneck. The CPU is rather high for
> >>> the regionserver so it seems like the best candidate to investigate. I
> >>> will try profiling it tomorrow and will report back. I may revisit
> >>> compression on vs off since that is adding load to the CPU.
> >>>
> >>> I'll also come up with a sample program that generates data similar to
> >>> my table.
> >>>
>