HBase >> mail # user >> HBase cluster with heterogeneous resources


Re: HBase cluster with heterogeneous resources
https://issues.apache.org/jira/browse/HDFS-236

How bad is HDFS random access?

   - Random access in HDFS has always had bad PR, even though hardly anyone
   used the interface. Claims/rumours range from "transfers a lot of excess
   data" (not true) to "we noticed it is 10 times slower than our non-HDFS app"
   (hard to see how, if the app is I/O bound and/or is doing at least semi-
   random reads).
   - It was good to see HBase successfully use the interface for its speed-up.
   It cannot achieve competitive performance without reasonable random-access
   performance in HDFS (for HFile).
2010/10/16 William Kang <[EMAIL PROTECTED]>

> HDFS blocks are streamed files, which means you cannot randomly access
> them quickly the way you can on other file systems. So if your HBase
> block is in the middle of an HDFS block, you have to traverse the HDFS
> block to get to the middle. Right?
>
> Can somebody explain how HBase manages to fetch the 64k HBase block
> out of the 64M HDFS block quickly?
>
> On Sat, Oct 16, 2010 at 2:27 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > I could be wrong, but I don't think there's any performance benefit to
> > having a small HDFS block size.  If you are doing a random read fetching a
> > 1KB cell out of an HFile, it will not pull the entire 64MB HDFS block from
> > HDFS; it plucks only the small sections of the HDFS file/block that contain
> > the HFile index and then the appropriate 64KB HBase block.  Maybe someone
> > more knowledgeable could elaborate on the exact number and size of HDFS
> > accesses.
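[Editor's note: the "plucking" above maps onto HDFS's positioned-read API. FSDataInputStream implements PositionedReadable, so a client can read a byte range at an arbitrary offset without streaming the preceding data. The sketch below uses a local file and java.nio's FileChannel, whose positioned read has the same contract; the file name and sizes are illustrative assumptions, not HBase's actual access pattern.]

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionedReadSketch {
    // Read exactly `len` bytes starting at byte offset `pos` -- the same
    // contract as HDFS's FSDataInputStream.read(position, buffer, offset,
    // length). Nothing before `pos` is transferred.
    static byte[] pread(Path file, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos + buf.position());
                if (n < 0) break; // hit EOF
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a 64MB "block" and pluck a 64KB slice out of the middle,
        // roughly what a region server does for one HFile block.
        Path f = Files.createTempFile("hdfs-block", ".dat");
        byte[] data = new byte[64 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(f, data);
        byte[] slice = pread(f, 32L * 1024 * 1024, 64 * 1024);
        System.out.println(slice.length); // 65536
        Files.delete(f);
    }
}
```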
> >
> >
> > On Sat, Oct 16, 2010 at 2:10 PM, Abhijit Pol <[EMAIL PROTECTED]> wrote:
> >
> >> >
> >> >
> >> > If this is your setup, your HDFS' namenode is bound to OOM soon.
> >> > (Namenode's
> >> > memory consumption is proportional to the number of blocks on HDFS)
> >> >
> >> >
> >> The NN runs on the master and we have 4GB for the NN, which is good for a
> >> long time given the number of blocks we have. The DN has 1GB, the TT
> >> 512MB, and the JT 1GB.
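[Editor's note: a back-of-the-envelope check of the "memory proportional to block count" point above. A commonly cited rule of thumb is on the order of 150 bytes of NameNode heap per namespace object (file, directory, or block); both that constant and the workload numbers below are illustrative assumptions, not measurements.]

```java
public class NameNodeHeapEstimate {
    // Rough rule of thumb: ~150 bytes of NameNode heap per namespace object
    // (file, directory, or block). Illustrative assumption only.
    static final long BYTES_PER_OBJECT = 150;

    // One object per file plus one per block of that file.
    static long estimateHeapBytes(long files, long avgBlocksPerFile) {
        long objects = files + files * avgBlocksPerFile;
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // e.g. 1M files averaging 16 blocks each; note how a tiny HDFS
        // block size inflates the block count, and hence NN heap.
        long heap = estimateHeapBytes(1_000_000, 16);
        System.out.println(heap / (1024 * 1024) + " MB"); // ~2.4 GB
    }
}
```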
> >>
> >>
> >>
> >> > I guess you meant "hfile.min.blocksize.size" in ? That is a different
> >> > parameter from HDFS' block size, IMO. (need someone to confirm)
> >> >
> >> >
> >> yes, HBase and HDFS block sizes are two different params. We are testing
> >> with 8k HBase (default 64KB) and 64k HDFS (default 64MB) block sizes.
> >> Both of these are much smaller than the defaults, but we have a
> >> random-read-heavy workload and smaller blocks should help, provided the
> >> smaller sizes don't expose some other bottleneck.
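[Editor's note: the two knobs discussed above live in different config files. The fragment below mirrors the experiment described in this message (8KB HFile blocks, 64KB HDFS blocks); `hfile.min.blocksize.size` is the parameter named earlier in the thread, while `dfs.block.size` is assumed to be the HDFS block-size key of this era. Values are illustrative, not recommendations.]

```xml
<!-- hbase-site.xml: default HFile block size (8KB here; default 64KB) -->
<property>
  <name>hfile.min.blocksize.size</name>
  <value>8192</value>
</property>

<!-- hdfs-site.xml: HDFS block size (64KB here; default 64MB) -->
<property>
  <name>dfs.block.size</name>
  <value>65536</value>
</property>
```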
> >>
> >> Smaller HBase blocks mean larger indices and better random-read
> >> performance. So it makes sense to trade some RAM for the block index, as
> >> we have plenty of RAM on our machines.
> >>
> >
>