HBase >> mail # user >> Poor HBase map-reduce scan performance


Thread:
Bryan Keller 2013-05-01, 04:01
Ted Yu 2013-05-01, 04:17
Bryan Keller 2013-05-01, 04:31
Ted Yu 2013-05-01, 04:56
Bryan Keller 2013-05-01, 05:01
lars hofhansl 2013-05-01, 05:01
Bryan Keller 2013-05-01, 06:02
Michael Segel 2013-05-01, 14:24
lars hofhansl 2013-05-01, 06:21
Bryan Keller 2013-05-01, 15:00
Bryan Keller 2013-05-02, 01:01
lars hofhansl 2013-05-02, 04:41
Bryan Keller 2013-05-02, 04:49
Bryan Keller 2013-05-02, 17:54
Nicolas Liochon 2013-05-02, 18:00
lars hofhansl 2013-05-03, 00:46
Bryan Keller 2013-05-03, 07:17
Bryan Keller 2013-05-03, 10:44
lars hofhansl 2013-05-05, 01:33
Bryan Keller 2013-05-08, 17:15
Bryan Keller 2013-05-10, 15:46
Sandy Pratt 2013-05-22, 20:29
Ted Yu 2013-05-22, 20:39
Sandy Pratt 2013-05-22, 22:33
Ted Yu 2013-05-22, 22:57
Bryan Keller 2013-05-23, 15:45
Sandy Pratt 2013-05-23, 22:42
Ted Yu 2013-05-23, 22:47
Sandy Pratt 2013-06-05, 01:11
Sandy Pratt 2013-06-05, 08:09
yonghu 2013-06-05, 14:55
Ted Yu 2013-06-05, 16:12
yonghu 2013-06-05, 18:14
Sandy Pratt 2013-06-05, 18:57
Sandy Pratt 2013-06-05, 17:58
lars hofhansl 2013-06-06, 01:03
Bryan Keller 2013-06-25, 08:56
lars hofhansl 2013-06-28, 17:56
Bryan Keller 2013-07-01, 04:23
Ted Yu 2013-07-01, 04:32
lars hofhansl 2013-07-01, 10:59
Enis Söztutar 2013-07-01, 21:23
Bryan Keller 2013-07-01, 21:35
lars hofhansl 2013-05-25, 05:50
Enis Söztutar 2013-05-29, 20:29
Re: Poor HBase map-reduce scan performance
Thanks Enis, I'll see if I can backport this patch - it is exactly what I was going to try. This should solve my scan performance problems if I can get it to work.

On May 29, 2013, at 1:29 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Regarding running raw scans on top of Hfiles, you can try a version of the
> patch attached at https://issues.apache.org/jira/browse/HBASE-8369, which
> enables exactly this. However, the patch is for trunk.
>
> In that, we open one region from snapshot files in each record reader, and
> run a scan through using an internal region scanner. Since this bypasses
> the client + rpc + server daemon layers, it should be able to give optimum
> scan performance.
>
> There is also a tool called HFilePerformanceBenchmark that intends to
> measure raw performance for HFiles. I had to make a lot of changes to get
> it working, but it might be worth taking a look to see whether there is
> any perf difference between scanning a sequence file from HDFS vs scanning
> an HFile.
>
> Enis
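
As a rough illustration of why the snapshot-based scan Enis describes helps, here is a toy per-row cost model of bypassing the client + RPC + server path (every number below is assumed, purely illustrative; this is not a measurement of HBase):

```python
# Toy cost model (all figures assumed) of a full scan with and without
# the client -> RPC -> region server path in the loop.
read_us = 2.0    # assumed per-row cost of reading from HFiles
rpc_us = 50.0    # assumed per-RPC round-trip cost
caching = 100    # rows fetched per scanner RPC (scanner caching)

# Normal scan: every row pays the HFile read plus an amortized RPC cost.
normal_us_per_row = read_us + rpc_us / caching

# Snapshot scan: the record reader opens the region files directly and
# drives an internal region scanner, so the RPC term disappears.
snapshot_us_per_row = read_us

speedup = normal_us_per_row / snapshot_us_per_row
print(speedup)
```

The model only makes the structural point: the RPC term can be amortized with larger scanner caching but never removed, whereas the snapshot approach drops it entirely.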
>
>
> On Fri, May 24, 2013 at 10:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Sorry. Haven't gotten to this, yet.
>>
>> Scanning in HBase being about 3x slower than straight HDFS is in the right
>> ballpark, though. It has to do a bit more work.
>>
>> Generally, HBase is great at homing in on a subset (some 10-100m rows) of
>> the data. Raw scan performance is not (yet) a strength of HBase.
>>
>> So with HDFS you get to 75% of the theoretical maximum read throughput;
>> hence with HBase you get to 25% of the theoretical cluster-wide maximum disk
>> throughput?
>>
>>
>> -- Lars
>>
>>
>>
>> ----- Original Message -----
>> From: Bryan Keller <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Friday, May 10, 2013 8:46 AM
>> Subject: Re: Poor HBase map-reduce scan performance
>>
>> FYI, I ran tests with compression on and off.
>>
>> With a plain HDFS sequence file and compression off, I am getting very
>> good I/O numbers, roughly 75% of theoretical max for reads. With snappy
>> compression on with a sequence file, I/O speed is about 3x slower. However
>> the file size is 3x smaller so it takes about the same time to scan.
>>
>> With HBase, the results are equivalent (just much slower than a sequence
>> file). Scanning a compressed table is about 3x slower I/O than an
>> uncompressed table, but the table is 3x smaller, so the time to scan is
>> about the same. Scanning an HBase table takes about 3x as long as scanning
>> the sequence file export of the table, either compressed or uncompressed.
>> The sequence file export file size ends up being just barely larger than
>> the table, either compressed or uncompressed.
>>
>> So in sum, compression slows down I/O 3x, but the file is 3x smaller so
>> the time to scan is about the same. Adding in HBase slows things down
>> another 3x. So I'm seeing 9x faster I/O scanning an uncompressed sequence
>> file vs scanning a compressed table.
>>
>>
>> On May 8, 2013, at 10:15 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks for the offer Lars! I haven't made much progress speeding things
>> up.
>>>
>>> I finally put together a test program that populates a table that is
>> similar to my production dataset. I have a readme that should describe
>> things, hopefully enough to make it usable. There is code to populate a
>> test table, code to scan the table, and code to scan sequence files from an
>> export (to compare HBase w/ raw HDFS). I use a gradle build script.
>>>
>>> You can find the code here:
>>>
>>> https://dl.dropboxusercontent.com/u/6880177/hbasetest.zip
>>>
>>>
>>> On May 4, 2013, at 6:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>>
>>>> The blockbuffers are not reused, but that by itself should not be a
>> problem as they are all the same size (at least I have never identified
>> that as one in my profiling sessions).
>>>>
>>>> My offer still stands to do some profiling myself if there is an easy
>> way to generate data of similar shape.
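
Bryan's 3x/9x bookkeeping above can be checked with a few lines of arithmetic. A sketch (the absolute size and throughput figures are assumed placeholders; only the 3x ratios come from the thread):

```python
# Sanity-check of the reported ratios. Sizes and throughput are assumed.
data_mb = 1024.0    # uncompressed data size (assumed)
raw_mb_s = 300.0    # raw HDFS read throughput (assumed)

# Snappy: ~3x smaller file but ~3x lower I/O rate -> same scan time.
seq_uncompressed_s = data_mb / raw_mb_s
seq_compressed_s = (data_mb / 3.0) / (raw_mb_s / 3.0)

# HBase adds roughly another 3x to the scan time over a sequence file.
hbase_compressed_s = seq_compressed_s * 3.0

# Effective I/O rate scanning the compressed table vs the raw rate
# scanning an uncompressed sequence file: the 9x difference Bryan reports.
table_mb_s = (data_mb / 3.0) / hbase_compressed_s
ratio = raw_mb_s / table_mb_s
print(ratio)
```

The two 3x factors multiply only in the I/O-rate comparison; in wall-clock time, compression is a wash and the HBase overhead is the whole 3x difference.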
Michael Segel 2013-05-06, 03:09
Matt Corgan 2013-05-01, 06:52
Jean-Marc Spaggiari 2013-05-01, 10:56
Bryan Keller 2013-05-01, 16:39
Naidu MS 2013-05-01, 07:25
ramkrishna vasudevan 2013-05-01, 07:27
ramkrishna vasudevan 2013-05-01, 07:29