Re: Performance test results
J-D,
I'll try what you suggest, but it is worth pointing out that while my data
set has over 300M rows, my read test randomly reads from a subset that
contains only 0.5M rows (5,000 rows in each of the 100 key ranges in the
table).

-eran

On Tue, May 3, 2011 at 23:29, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On Tue, May 3, 2011 at 6:20 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > Flushing, at least when I try it now, long after I stopped writing,
> > doesn't seem to have any effect.
>
> Bummer.
>
> >
> > In my log I see this:
> > 2011-05-03 08:57:55,384 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB,
> > free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811,
> > hits=75769916, hitRatio=84.74%%, cachingAccesses=83656318,
> > cachingHits=75714473, cachingHitsRatio=90.50%%, evictions=1135,
> > evicted=7887205, evictedPerRun=6949.0791015625
> >
> > and every 30 seconds or so something like this:
> > 2011-05-03 08:58:07,900 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 436.92 MB of total=3.63 GB
> > 2011-05-03 08:58:07,947 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=436.95 MB, total=3.2 GB, single=931.65 MB,
> > multi=2.68 GB, memory=3.69 KB
> >
> > Now, if the entire working set I'm reading is 100MB in size, why would it
> > have to evict 436MB just to get it filled back in 30 seconds?
>
> I was about to ask the same question... from what I can tell from this
> log, it seems that your working dataset is much larger than 3GB (the
> fact that it's evicting means it could be a lot more), and that's only
> on that region server.
>
> The first reason that comes to mind for why it would be so much bigger
> is that you uploaded your dataset more than once; since HBase keeps
> versions of the data, they can accumulate. That doesn't explain how it
> would grow into GBs, since by default a family only keeps 3 versions...
> unless you set that higher than the default, or you uploaded the same
> data tens of times within 24 hours and the major compactions didn't
> kick in.
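
As an aside, the versions setting is easy to check from the HBase shell,
and a major compaction can be kicked off by hand; 'mytable' below is a
placeholder for the actual table name:

  hbase> describe 'mytable'        # VERSIONS is shown per column family
  hbase> major_compact 'mytable'   # rewrites the store files, dropping
                                   # excess versions and deleted cells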
>
> In any case, it would be interesting if you:
>
>  - truncate the table
>  - re-import the data
>  - force a flush
>  - wait a bit until the flushes are done (should take 2-3 seconds if
> your dataset is really 100MB)
>  - do a "hadoop dfs -dus" on the table's directory (should be under /hbase)
>  - if the number is way out of whack, review how you are inserting
> your data. Either way, please report back.
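
A rough shell sketch of the steps above, assuming the table is called
'mytable' and the HBase root directory is the default /hbase (both names
are placeholders, not from the original thread):

  hbase> truncate 'mytable'          # disable, drop and recreate the table
  ... re-run the import ...
  hbase> flush 'mytable'             # write the MemStores out as HFiles
  $ hadoop dfs -dus /hbase/mytable   # total on-disk size of the table

If the reported size is far beyond the expected ~100MB, the extra bytes
are coming from how the data is being written (duplicate rows, extra
versions, and so on).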
>
> >
> > Also, what is a good value for hfile.block.cache.size (I have it at
> > .35 now)? With 12.5GB of RAM available for the region servers it
> > seems I should be able to set it much higher.
>
> It depends; you also have to account for the MemStores, which by
> default can use up to 40% of the heap
> (hbase.regionserver.global.memstore.upperLimit), currently leaving you
> only 100-40-35=25% of the heap for things like serving requests,
> compacting, flushing, etc. It's hard to give a good number for how
> much should be left to the rest of HBase though...
>
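
For reference, the two settings discussed above live in hbase-site.xml; a
minimal sketch with the values mentioned in this thread (the numbers are
illustrative, not a recommendation):

  <property>
    <name>hfile.block.cache.size</name>
    <value>0.35</value>  <!-- fraction of the RS heap for the block cache -->
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>   <!-- fraction of the heap all MemStores may use -->
  </property>

Assuming the full 12.5GB is the region server heap, that split works out
to roughly 4.4GB of block cache, 5GB of MemStores, and about 3.1GB left
for everything else; raising the block cache fraction is only safe if the
memstore limit is lowered (or the heap grown) so all three still fit.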