HBase, mail # user - Poor HBase map-reduce scan performance


Re: Poor HBase map-reduce scan performance
Enis Söztutar 2013-07-01, 21:23
Bryan,

A 3.6x improvement is exciting. The difference between an HBase scan and an
HDFS scan is roughly of that order, so I guess it is expected.

I plan to get back to the trunk patch and add more tests, etc., next week. In
the meantime, if you have any changes to the patch, please attach the patch.

Enis
On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Absolutely.
>
>
>
> ----- Original Message -----
> From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Sunday, June 30, 2013 9:32 PM
> Subject: Re: Poor HBase map-reduce scan performance
>
> Looking at the tail of HBASE-8369, there were some comments which are yet
> to be addressed.
>
> I think the trunk patch should be finalized before backporting.
>
> Cheers
>
> On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>
> > I'll attach my patch to HBASE-8369 tomorrow.
> >
> > On Jun 28, 2013, at 10:56 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > If we can make a clean patch with minimal impact to existing code, I
> > > would be supportive of a backport to 0.94.
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Bryan Keller <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > > Cc:
> > > Sent: Tuesday, June 25, 2013 1:56 AM
> > > Subject: Re: Poor HBase map-reduce scan performance
> > >
> > > I tweaked Enis's snapshot input format and backported it to 0.94.6, and
> > > I have snapshot scanning functional on my system. Performance is
> > > dramatically better, as expected, I suppose. I'm seeing about 3.6x faster
> > > performance vs. TableInputFormat. Also, HBase doesn't get bogged down
> > > during a scan, as the regionserver is being bypassed. I'm very excited by
> > > this. There are some issues with file permissions and library
> > > dependencies, but nothing that can't be worked out.
> > >
> > > On Jun 5, 2013, at 6:03 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > >
> > >> That's exactly the kind of pre-fetching I was investigating a bit ago
> > >> (I made a patch, but ran out of time).
> > >> This pre-fetching is strictly client-only: the client keeps the server
> > >> busy while it is processing the previous batch by filling up a second
> > >> buffer.
> > >>
> > >>
> > >> -- Lars
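
The client-only pre-fetching described above can be sketched in Python with a background thread and a bounded queue. This is a minimal illustration of the idea, not the HBase patch mentioned in the thread: `prefetching_scan`, `fetch_batch`, `batches_of`, and `depth` are hypothetical names, and a real client would also need to propagate errors and handle a consumer that stops early.

```python
import itertools
import queue
import threading


def prefetching_scan(fetch_batch, depth=1):
    """Yield rows while later batches are fetched in the background.

    fetch_batch is a hypothetical callable that returns the next list of
    rows, or an empty list once the scan is exhausted. depth is how many
    fetched batches may wait in the queue ahead of the consumer.
    """
    done = object()                        # sentinel marking the end of the scan
    batches = queue.Queue(maxsize=depth)

    def producer():
        try:
            while True:
                batch = fetch_batch()
                if not batch:
                    break
                batches.put(batch)         # blocks while the queue is full
        finally:
            batches.put(done)              # always let the consumer finish

    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = batches.get()
        if batch is done:
            return
        yield from batch                   # consumer works; producer keeps fetching


def batches_of(rows, size):
    """Stand-in for a remote scanner: hands out `rows` in lists of `size`."""
    it = iter(rows)
    return lambda: list(itertools.islice(it, size))
```

For example, `list(prefetching_scan(batches_of(range(10), 4)))` returns the ten rows in order. The bounded queue is the design choice that matters here: it lets the fetch side run ahead of the consumer without letting it buffer an unbounded amount of data.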
> > >>
> > >>
> > >>
> > >> ________________________________
> > >> From: Sandy Pratt <[EMAIL PROTECTED]>
> > >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > >> Sent: Wednesday, June 5, 2013 10:58 AM
> > >> Subject: Re: Poor HBase map-reduce scan performance
> > >>
> > >>
> > >> Yong,
> > >>
> > >> As a thought experiment, imagine how it impacts the throughput of TCP
> > >> to keep the window size at 1.  That means there's only one packet in
> > >> flight at a time, and total throughput is a fraction of what it could
> > >> be.
> > >>
> > >> That's effectively what happens with RPC.  The server sends a batch,
> > >> then does nothing while it waits for the client to ask for more.
> > >> During that time, the pipe between them is empty.  Increasing the batch
> > >> size can help a bit, in essence creating a really huge packet, but the
> > >> problem remains.  There will always be stalls in the pipe.
> > >>
> > >> What you want is for the window size to be large enough that the pipe
> > >> is saturated.  A streaming API accomplishes that by stuffing data down
> > >> the network pipe as quickly as possible.
> > >>
> > >> Sandy
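
The stall described above can be made concrete with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions (every batch costs the same, and at most one batch is fetched ahead); `scan_millis` and its parameters are hypothetical names, and the numbers in the usage note are illustrative, not measurements from this thread.

```python
def scan_millis(batches, server_ms, transfer_ms, client_ms, prefetch=False):
    """Estimate the wall-clock time, in milliseconds, to scan `batches` batches.

    Without prefetch, the server, the network, and the client take turns, so
    every batch costs the sum of all three. With prefetch, fetching the next
    batch overlaps with the client processing the current one, so the steady
    state is bounded by the slower of the two sides.
    """
    fetch_ms = server_ms + transfer_ms
    if batches == 0:
        return 0
    if not prefetch:
        return batches * (fetch_ms + client_ms)
    # The first fetch and the last round of client work cannot be overlapped.
    return fetch_ms + (batches - 1) * max(fetch_ms, client_ms) + client_ms
```

With 100 batches, 2 ms of server work, 1 ms of transfer, and 3 ms of client work per batch, `scan_millis(100, 2, 1, 3)` gives 600, while `scan_millis(100, 2, 1, 3, prefetch=True)` gives 303. In this model, a larger batch size scales all three costs together, which is why it helps only a bit: the stall per batch remains.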
> > >>
> > >> On 6/5/13 7:55 AM, "yonghu" <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Can anyone explain why client + RPC + server will decrease the
> > >>> performance of scanning? I mean, the RegionServer and TaskTracker are
> > >>> on the same node when you use MapReduce to scan the HBase table. So,
> > >>> in my understanding, there will be no RPC cost.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Yong
> > >>>
> > >>>
> > >>> On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> https://issues.apache.org/jira/browse/HBASE-8691