HBase >> mail # user >> Poor HBase map-reduce scan performance


Thread:
- Bryan Keller 2013-05-01, 04:01
- Ted Yu 2013-05-01, 04:17
- Bryan Keller 2013-05-01, 04:31
- Ted Yu 2013-05-01, 04:56
- Bryan Keller 2013-05-01, 05:01
- lars hofhansl 2013-05-01, 05:01
- Bryan Keller 2013-05-01, 06:02
- Michael Segel 2013-05-01, 14:24
- lars hofhansl 2013-05-01, 06:21
- Bryan Keller 2013-05-01, 15:00
- Bryan Keller 2013-05-02, 01:01
- lars hofhansl 2013-05-02, 04:41
- Bryan Keller 2013-05-02, 04:49
- Bryan Keller 2013-05-02, 17:54
- Nicolas Liochon 2013-05-02, 18:00
- lars hofhansl 2013-05-03, 00:46
- Bryan Keller 2013-05-03, 07:17
- Bryan Keller 2013-05-03, 10:44
- lars hofhansl 2013-05-05, 01:33
- Bryan Keller 2013-05-08, 17:15
- Bryan Keller 2013-05-10, 15:46
- Sandy Pratt 2013-05-22, 20:29
- Ted Yu 2013-05-22, 20:39
- Sandy Pratt 2013-05-22, 22:33
- Ted Yu 2013-05-22, 22:57
- Bryan Keller 2013-05-23, 15:45
- Sandy Pratt 2013-05-23, 22:42
- Ted Yu 2013-05-23, 22:47
- Sandy Pratt 2013-06-05, 01:11
- Sandy Pratt 2013-06-05, 08:09
- yonghu 2013-06-05, 14:55
- Ted Yu 2013-06-05, 16:12
- yonghu 2013-06-05, 18:14
- Sandy Pratt 2013-06-05, 18:57
- Sandy Pratt 2013-06-05, 17:58
- lars hofhansl 2013-06-06, 01:03
- Bryan Keller 2013-06-25, 08:56
- lars hofhansl 2013-06-28, 17:56
- Bryan Keller 2013-07-01, 04:23
- Ted Yu 2013-07-01, 04:32
- lars hofhansl 2013-07-01, 10:59
- Enis Söztutar 2013-07-01, 21:23
- Bryan Keller 2013-07-01, 21:35
- lars hofhansl 2013-05-25, 05:50
- Enis Söztutar 2013-05-29, 20:29
- Bryan Keller 2013-06-04, 17:01
Re: Poor HBase map-reduce scan performance
You really don't want to mess around with the block size.

Sure larger blocks are better for sequential scans, but the minute you do a lot of random ad hoc fetches... you're kinda screwed.
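The block size being weighed here is a per-column-family HFile setting. As a sketch, changing it in the HBase shell might look like the following (the table name 'mytable' and family name 'cf' are hypothetical; in 0.94 the table must typically be disabled for schema changes, and the change takes effect as new HFiles are written):

```
disable 'mytable'
alter 'mytable', {NAME => 'cf', BLOCKSIZE => '16777216'}   # 16 MB, as discussed above
enable 'mytable'
```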
On May 3, 2013, at 2:17 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:

> I finally made some progress. I tried a very large HBase block size (16mb), and it significantly improved scan performance. I went from 45-50 min to 24 min. Not great but much better. Before I had it set to 128k. Scanning an equivalent sequence file takes 10 min. My random read performance will probably suffer with such a large block size (theoretically), so I probably can't keep it this big. I care about random read performance too. I've read having a block size this big is not recommended, is that correct?
>
> I haven't dug too deeply into the code, are the block buffers reused or is each new block read a new allocation? Perhaps a buffer pool could help here if there isn't one already. When doing a scan, HBase could reuse previously allocated block buffers instead of allocating a new one for each block. Then block size shouldn't affect scan performance much.
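The buffer-reuse idea suggested above could be sketched as a minimal pool in plain Java. This is not HBase's actual code, just an illustration of reusing equally-sized block buffers instead of allocating a fresh one per block read:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Minimal sketch of a pool of fixed-size block buffers. */
public class BlockBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<ByteBuffer>();
    private final int blockSize;

    public BlockBufferPool(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Reuse a previously released buffer if one is available, else allocate. */
    public synchronized ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) {
            b = ByteBuffer.allocate(blockSize);
        }
        b.clear(); // reset position/limit for the next block read
        return b;
    }

    /** Return a buffer to the pool so a later block read can reuse it. */
    public synchronized void release(ByteBuffer b) {
        if (b.capacity() == blockSize) {
            free.push(b);
        }
    }
}
```

With a pool like this, scan cost would depend less on block size, since steady-state scanning would recycle the same few buffers.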
>
> I'm not using a block encoder. Also, I'm still sifting through the profiler results, I'll see if I can make more sense of it and run some more experiments.
>
> On May 2, 2013, at 5:46 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Interesting. If you can, try 0.94.7 (but it'll probably not have changed that much from 0.94.4).
>>
>>
>> Do you have enabled one of the block encoders (FAST_DIFF, etc)? If so, try without. They currently need to reallocate a ByteBuffer for each single KV.
>> (Since you see ScannerV2 rather than EncodedScannerV2 you probably have not enabled encoding, just checking).
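For reference, checking and disabling block encoding as suggested above can be done from the HBase shell; a sketch, with hypothetical table/family names:

```
describe 'mytable'    # look for DATA_BLOCK_ENCODING in the column family attributes
disable 'mytable'
alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE'}
enable 'mytable'
```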
>>
>>
>> And do you have a stack trace for the ByteBuffer.allocate()? That is a strange one, since it never came up in my profiling (unless you enabled block encoding).
>> (You can get traces from VisualVM by creating a snapshot, but you'd have to drill in to find the allocate()).
>>
>>
>> During normal scanning (again, without encoding) there should be no allocation happening except for blocks read from disk (and they should all be the same size, thus allocation should be cheap).
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: Bryan Keller <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thursday, May 2, 2013 10:54 AM
>> Subject: Re: Poor HBase map-reduce scan performance
>>
>>
>> I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears at first glance that memory allocations may be an issue. Decompression was next below that but less of an issue it seems.
>>
>> Would changing the block size, either HDFS or HBase, help here?
>>
>> Also, if anyone has tips on how else to profile, that would be appreciated. VisualVM can produce a lot of noise that is hard to sift through.
>>
>>
>> On May 1, 2013, at 9:49 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>>
>>> I used exactly 0.94.4, pulled from the tag in subversion.
>>>
>>> On May 1, 2013, at 9:41 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hmm... Did you actually use exactly version 0.94.4, or the latest 0.94.7?
>>>> I would be very curious to see profiling data.
>>>>
>>>> -- Lars
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: Bryan Keller <[EMAIL PROTECTED]>
>>>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>>> Cc:
>>>> Sent: Wednesday, May 1, 2013 6:01 PM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>
>>>> I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some other things tonight and tomorrow and will report back.
>>>>
>>>> On May 1, 2013, at 8:00 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Yes I would like to try this, if you can point me to the pom.xml patch that would save me some time.

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
- Matt Corgan 2013-05-01, 06:52
- Jean-Marc Spaggiari 2013-05-01, 10:56
- Bryan Keller 2013-05-01, 16:39
- Naidu MS 2013-05-01, 07:25
- ramkrishna vasudevan 2013-05-01, 07:27
- ramkrishna vasudevan 2013-05-01, 07:29