Re: Help with continuous loading configuration
You can call put.setWriteToWAL(false) to skip write-ahead logging, which
slows down puts significantly.  But you will lose data if a regionserver
crashes with data still in its memstore.
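
For concreteness, a minimal sketch of that call against the 0.90/0.92-era
client API; the table name, column family, and row key below are made up
for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NoWalPutExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");      // hypothetical table name

            Put put = new Put(Bytes.toBytes("row-0001"));    // hypothetical row key
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            put.setWriteToWAL(false);  // skip the WAL: faster puts, but anything still
                                       // in the memstore is lost if the regionserver dies
            table.put(put);
            table.close();
        }
    }

Later HBase releases express the same thing as put.setDurability(Durability.SKIP_WAL).
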
On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <[EMAIL PROTECTED]> wrote:

> Hi Stack,
>
> Thanks for the feedback.  Comments inline ...
>
> On Wed, Nov 16, 2011 at 3:35 PM, Stack <[EMAIL PROTECTED]> wrote:
>
> > On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
> > > Hi Lars,
> > >
> > > The keys are arriving in random order.  The HBase monitoring page shows
> > > evenly distributed load across all of the region servers.
> >
> > What kind of ops rates are you seeing?  Are they running nice and
> > smooth across all servers?  No stuttering?  What do your regionserver
> > logs look like?
> >
> > Are you pre-splitting your table or just letting HBase run and do the
> > splits?
> >
>
> As far as I can tell, the operations look smooth across all servers.  We're
> not doing any pre-splitting, just letting HBase do the splits.
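
(For reference, pre-splitting at table-creation time, as mentioned above, looks
roughly like the following with the HBaseAdmin API of that era; the table name,
column family, and split points are made up for illustration:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitTableExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("mytable");  // hypothetical name
            desc.addFamily(new HColumnDescriptor("cf"));

            // Split points should reflect the key distribution; with keys arriving
            // in random order, evenly spaced boundaries are a reasonable start.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("2"), Bytes.toBytes("4"),
                Bytes.toBytes("6"), Bytes.toBytes("8")
            };
            admin.createTable(desc, splits);  // table starts out with 5 regions
        }
    }
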
>
>
> > >  I didn't see
> > > anything weird in the gc logs, no mention of any failures.  I'm a little
> > > unclear about what the optimal values for the following properties
> > > should be:
> > >
> > > hbase.hstore.compactionThreshold
> >
> > Default is 3.  Look in the regionserver logs.  See how many files you have
> > on average per region column family (you could also look in the filesystem).
> > Are we constantly rewriting them?  If the load is mostly write-only, you
> > might raise this to put off compactions until more files are around (but
> > looking at the regionserver logs, if the write rate is high, we might be
> > having trouble keeping up with this default threshold anyway?).
> >
>
> Well, it looks like half of the regions are in the 25-32 file range and the
> other half just have 1 or 2 files.  This was when we ran it with a
> compactionThreshold of 15.
>
> How can I tell by looking at the region server logs if we're seeing a "high
> write rate"?  We've got 48 clients sending load, 12 region servers total.
> We're pushing the system pretty hard.
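
(The compaction threshold discussed here is a regionserver-side setting in
hbase-site.xml, picked up on restart; the fragment below just shows the 15
used in the test described above, not a recommended value:)

    <!-- hbase-site.xml on each regionserver -->
    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <!-- default is 3; once a store has more than this many files,
           a compaction is considered -->
      <value>15</value>
    </property>
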
>
>
> > > hbase.hstore.blockingStoreFiles
> > >
> >
> > The higher this is, the bigger the price you'll pay if a server
> > crashes, because this will be the upper bound on how many WAL logs we
> > need to split for the server before its regions come back online
> > again.  Leave it at the default for now, I'd say.
> >
>
> Ok, we'll leave it default for now.
>
>
> > > Is there some rule of thumb that I can use to determine good values for
> > > these properties?
> > >
> >
> > Have you checked out this section of the book:
> > http://hbase.apache.org/book.html#performance
> >
> > Are you filling the machines?  Are they burning CPU?  Or IO-bound?
> > If not, perhaps open the front gate wider by upping the number of
> > concurrent handlers.
> >
>
> I have read through that section of the HBase book.  There is plenty of CPU
> available.  How do I up the number of concurrent handlers?  Increase
> hbase.regionserver.handler.count?
>
> - Amit
>
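
(To the last question: hbase.regionserver.handler.count is the property that
controls the number of RPC handler threads on each regionserver, so it is the
knob being referred to.  It also lives in hbase-site.xml on the regionservers;
the value below is purely illustrative:)

    <!-- hbase-site.xml on each regionserver; value shown is illustrative only -->
    <property>
      <name>hbase.regionserver.handler.count</name>
      <!-- number of RPC handler threads serving client requests -->
      <value>50</value>
    </property>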