Re: Poor HBase map-reduce scan performance
It seems to be in the ballpark of what I was getting at, but I haven't
fully digested the code yet, so I can't say for sure.

Here's what I'm getting at.  Looking at
o.a.h.h.client.ClientScanner.next() in the 0.94.2 source I have loaded, I
see there are three branches with respect to the cache:

public Result next() throws IOException {
  // If the scanner is closed and there's nothing left in the cache, next is a no-op.
  if (cache.size() == 0 && this.closed) {
    return null;
  }

  if (cache.size() == 0) {
    // Request more results from RS
    ...
  }

  if (cache.size() > 0) {
    return cache.poll();
  }

  ...
  return null;
}
I think that middle branch wants to change as follows (pseudo-code):

if the cache size is below a certain threshold then
  initiate an asynchronous action to refill it
  if the cache is empty, so there is no result to return until the refill completes, then
    block
  done
done

Or something along those lines.  I haven't grokked the patch well enough
yet to tell if that's what it does.  What I think is happening in the
0.94.2 code I've got is that it requests nothing until the cache is empty,
then blocks until it's non-empty.  We want to eagerly and asynchronously
refill the cache so that we ideally never have to block.
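
To make that concrete, here's a rough sketch of how that middle branch might
look.  This is just my sketch, not the HBASE-8420 patch: prefetchThreshold,
executor (an ExecutorService field), outstandingFetch (a Future<List<Result>>
field) and fetchMoreResults() are placeholder names, with fetchMoreResults()
standing in for the existing synchronous RPC that today runs inline when the
cache empties.

public Result next() throws IOException {
  if (cache.size() == 0 && this.closed) {
    return null;
  }

  // Kick off an asynchronous refill before the cache runs dry.
  if (!this.closed && cache.size() < prefetchThreshold && outstandingFetch == null) {
    outstandingFetch = executor.submit(new Callable<List<Result>>() {
      public List<Result> call() throws IOException {
        return fetchMoreResults();  // placeholder for the existing synchronous RPC
      }
    });
  }

  // Drain a finished refill; block on it only if the cache is empty.
  if (outstandingFetch != null && (cache.size() == 0 || outstandingFetch.isDone())) {
    try {
      cache.addAll(outstandingFetch.get());
    } catch (Exception e) {
      throw new IOException(e);
    } finally {
      outstandingFetch = null;
    }
  }

  if (cache.size() > 0) {
    return cache.poll();
  }
  return null;
}
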
Sandy
On 5/22/13 1:39 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:

>Sandy:
>Do you think the following JIRA would help with what you expect in this
>regard ?
>
>HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb
>
>Cheers
>
>On Wed, May 22, 2013 at 1:29 PM, Sandy Pratt <[EMAIL PROTECTED]> wrote:
>
>> I found this thread on search-hadoop.com just now because I've been
>> wrestling with the same issue for a while and have as yet been unable to
>> solve it.  However, I think I have an idea of the problem.  My theory is
>> based on assumptions about what's going on in HBase and HDFS internally,
>> so please correct me if I'm wrong.
>>
>> Briefly, I think the issue is that sequential reads from HDFS are
>> pipelined, whereas sequential reads from HBase are not.  Therefore,
>> sequential reads from HDFS tend to keep the IO subsystem saturated,
>> while sequential reads from HBase allow it to idle for a relatively
>> large proportion of time.
>>
>> To make this more concrete, suppose that I'm reading N bytes of data
>> from a file in HDFS.  I issue the calls to open the file and begin to
>> read (from an InputStream, for example).  As I'm reading byte 1 of the
>> stream at my client, the datanode is reading byte M where 1 < M <= N
>> from disk.  Thus, three activities tend to happen concurrently for the
>> most part (disregarding the beginning and end of the file): 1) processing
>> at the client; 2) streaming over the network from datanode to client;
>> and 3) reading data from disk at the datanode.  The proportion of time
>> these three activities overlap tends towards 100% as N -> infinity.
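
(For reference, the HDFS read pattern I have in mind here is just the plain
sequential stream below; the path, buffer size and process() helper are
placeholders.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SequentialHdfsRead {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    byte[] buf = new byte[64 * 1024];
    // While the client works on one buffer, the datanode can already be
    // reading and streaming the next blocks, so disk, network and client
    // work overlap.
    FSDataInputStream in = fs.open(new Path("/data/example"));  // placeholder path
    try {
      int n;
      while ((n = in.read(buf)) > 0) {
        process(buf, n);
      }
    } finally {
      in.close();
    }
  }

  private static void process(byte[] buf, int len) {
    // placeholder for per-record work
  }
}
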
>>
>> Now suppose I read a batch of R records from HBase (where R = whatever
>> scanner caching happens to be).  As I understand it, I issue my call to
>> ResultScanner.next(), and this causes the RegionServer to block as if on
>> a page fault while it loads enough HFile blocks from disk to cover the R
>> records I (implicitly) requested.  After the blocks are loaded into the
>> block cache on the RS, the RS returns R records to me over the network.
>> Then I process the R records locally.  When they are exhausted, this
>> cycle repeats.  The notable upshot is that while the RS is faulting
>> HFile blocks into the cache, my client is blocked.  Furthermore, while
>> my client is processing records, the RS is idle with respect to work on
>> behalf of my client.
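
(Again for reference, the client-side pattern I'm describing is the usual
synchronous scan loop below; the table name, caching value and process()
helper are placeholders.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BlockingScanExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // placeholder table name
    try {
      Scan scan = new Scan();
      scan.setCaching(1000);       // R: rows fetched per RPC
      scan.setCacheBlocks(false);  // typical for full-table MR scans
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // While this loop processes the current batch of R rows, the RS
          // does no work for this client; the next batch is only fetched
          // when the client-side cache empties, which is the stall above.
          process(r);
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }

  private static void process(Result r) {
    // placeholder for per-row work
  }
}
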
>>
>> That last point is really the killer, if I'm correct in my assumptions.
>> It means that Scanner caching and larger block sizes work only to
>> amortize the fixed overhead of disk IOs and RPCs -- they do nothing to
>> keep the IO subsystems saturated during sequential reads.  What *should*
>> happen is that the RS should treat the Scanner caching value (R above)
>> as a hint