HBase, mail # user - HBase cluster with heterogeneous resources


Re: HBase cluster with heterogeneous resources
Andrey Stepachev 2010-10-17, 07:59
https://issues.apache.org/jira/browse/HDFS-236

How bad is HDFS random access?

   - Random access in HDFS always seemed to have bad PR, though hardly anyone
   used the interface. Claims/rumours range from "transfers a lot of excess
   data" (not true) to "we noticed it is 10 times slower than our non-hdfs app"
   (hard to see how, if the app is I/O bound and/or is doing at least
   semi-random reads).
   - It was good to see HBase successfully use the interface for its speed-up.
   It cannot achieve competitive performance without reasonable random-access
   performance in HDFS (for HFile).
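
To make the bullet points concrete: HDFS exposes a positioned-read ("pread")
interface that lets a client ask for an arbitrary byte range of a file, and this
is the interface HBase relies on. Below is a minimal sketch; the path, offset,
and buffer size are made-up values for illustration, and error handling is kept
to a minimum.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PreadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical HFile path and offset, purely for illustration.
            Path path = new Path("/hbase/mytable/some-region/some-hfile");
            long offset = 40L * 1024 * 1024;   // somewhere inside a 64MB HDFS block
            byte[] buf = new byte[64 * 1024];  // roughly one 64KB HBase block

            FSDataInputStream in = fs.open(path);
            try {
                // Positioned read: asks the datanode for just this byte range;
                // it does not stream the file (or the enclosing block) from the start.
                int n = in.read(offset, buf, 0, buf.length);
                System.out.println("read " + n + " bytes at offset " + offset);
            } finally {
                in.close();
            }
        }
    }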
2010/10/16 William Kang <[EMAIL PROTECTED]>

> HDFS blocks are meant for streaming reads, which means you cannot randomly
> access data inside an HDFS block as quickly as on other file systems. So that
> means if your HBase block is in the middle of an HDFS block, you have to
> traverse inside it to get to the middle. Right?
>
> Can somebody explain how HBase manages to fetch a 64KB HBase block
> from a 64MB HDFS block quickly?
>
> On Sat, Oct 16, 2010 at 2:27 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > I could be wrong, but I don't think there's any performance benefit to
> > having a small HDFS block size.  If you are doing a random read fetching a
> > 1KB cell out of an HFile, it will not pull the entire 64MB HDFS block from
> > HDFS; it plucks only the small section of the HDFS file/block that contains
> > the HFile index and then the appropriate 64KB HBase block.  Maybe someone
> > more knowledgeable could elaborate on the exact number and size of HDFS
> > accesses.
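
Matt's description boils down to a two-step lookup: consult the HFile's block
index (loaded into memory when the file is opened) to find the offset and length
of the one data block that can contain the key, then issue a positioned read for
just those bytes. The sketch below is a simplified model of that idea, not
HBase's actual reader code; the class and method names (BlockIndexEntry,
readBlockFor) and the index layout are made up for illustration.

    import java.io.EOFException;
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;

    // Hypothetical entry in an HFile-style block index (kept in memory).
    class BlockIndexEntry {
        final byte[] firstKey;  // first key stored in the data block
        final long offset;      // byte offset of the block within the HFile
        final int length;       // on-disk length of the block (roughly the HBase block size)

        BlockIndexEntry(byte[] firstKey, long offset, int length) {
            this.firstKey = firstKey;
            this.offset = offset;
            this.length = length;
        }
    }

    class SimplifiedHFileReader {
        private final FSDataInputStream in;     // open stream on the HFile
        private final BlockIndexEntry[] index;  // sorted by firstKey, loaded once per file

        SimplifiedHFileReader(FSDataInputStream in, BlockIndexEntry[] index) {
            this.in = in;
            this.index = index;
        }

        // Binary-search the index for the last block whose first key is <= the
        // requested key, then positioned-read only that block's bytes.
        byte[] readBlockFor(byte[] key) throws IOException {
            int lo = 0, hi = index.length - 1, pos = 0;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (compare(index[mid].firstKey, key) <= 0) { pos = mid; lo = mid + 1; }
                else { hi = mid - 1; }
            }
            BlockIndexEntry e = index[pos];
            byte[] block = new byte[e.length];
            int done = 0;
            while (done < e.length) {  // positioned reads may return short counts
                int n = in.read(e.offset + done, block, done, e.length - done);
                if (n < 0) throw new EOFException();
                done += n;
            }
            return block;  // only this block travels over the wire, not the 64MB HDFS block
        }

        private static int compare(byte[] a, byte[] b) {
            int len = Math.min(a.length, b.length);
            for (int i = 0; i < len; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) return d;
            }
            return a.length - b.length;
        }
    }

In a real deployment the fetched block also lands in HBase's block cache, so
repeated reads of hot blocks never go back to HDFS at all.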
> >
> >
> > On Sat, Oct 16, 2010 at 2:10 PM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> >
> >> >
> >> >
> >> > If this is your setup, your HDFS' namenode is bound to OOM soon.
> >> > (Namenode's
> >> > memory consumption is proportional to the number of blocks on HDFS)
> >> >
> >> >
> >> NN runs on the master and we have 4GB for the NN, which is good for a long
> >> time given the number of blocks we have. The DN has 1GB, the TT 512MB, and
> >> the JT 1GB.
> >>
> >>
> >>
> >> > I guess you meant "hfile.min.blocksize.size" in ? That is a different
> >> > parameter from HDFS' block size, IMO. (need someone to confirm)
> >> >
> >> >
> >> yes, HBase and HDFS blocks are two different params. We are testing with
> >> 8KB HBase (default 64KB) and 64KB HDFS (default 64MB) block sizes. Both of
> >> these are much smaller than the defaults, but we have a random-read-heavy
> >> workload and smaller blocks should help, provided the smaller sizes are
> >> not exposing some other bottleneck.
> >>
> >> Smaller HBase blocks mean larger indices and better random-read
> >> performance, so it makes sense to trade some RAM for the block index as
> >> we have plenty of RAM on our machines.
> >>
> >
>
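
To summarize the two knobs discussed in the quoted thread: the HDFS block size
("dfs.block.size" in the 2010-era configs) controls how the file system splits
files, while the HBase/HFile block size is a per-column-family setting that
controls the unit HBase reads, indexes, and caches. A rough sketch of setting
each from client code follows; the table and family names are made up, and the
exact classes and property names can differ between versions, so treat this as
an assumption-laden illustration rather than a recipe.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class BlockSizeSketch {
        public static void main(String[] args) {
            // HDFS block size: applies to files written with this Configuration.
            // The default is 64MB; 64KB is the (unusually small) value tested in
            // this thread, and it multiplies the namenode's block count accordingly.
            Configuration conf = new Configuration();
            conf.setLong("dfs.block.size", 64L * 1024);

            // HBase/HFile block size: set per column family (default 64KB);
            // 8KB is the value being tested in this thread.
            HTableDescriptor table = new HTableDescriptor("mytable");
            HColumnDescriptor family = new HColumnDescriptor("cf");
            family.setBlocksize(8 * 1024);
            table.addFamily(family);
            // A table created from this descriptor (e.g. via HBaseAdmin.createTable)
            // would write HFiles with ~8KB blocks for family "cf".
        }
    }

The smaller HFile block size is what buys the better random-read latency;
shrinking the HDFS block size mainly costs namenode memory, which is the point
made earlier in the thread.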