Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # dev >> Stream-of-consciousness notes from HBase BoF presentations


+
Dave Wang 2012-06-15, 19:07
+
Stack 2012-06-16, 05:10
Copy link to this message
-
Re: Stream-of-consciousness notes from HBase BoF presentations
Thanks for the write up, Dave.

W.r.t. Slide 6 in J-D's talk, hbase-5699 would allow HLog's to be grouped by table, avoiding the issue to slow updating tables.

Cheers

On Jun 15, 2012, at 9:07 AM, Dave Wang <[EMAIL PROTECTED]> wrote:

> Please fill out if I missed anything, thanks.
>
> Schema thoughts - Ian
> * Build typed schema into HBase somehow?
> ** Easier for app developers
> * Idea of hashing schema into each row, perhaps have a system-level table
> for schema descriptions
>
> Performance tidbits - JD (slides at
> http://www.slideshare.net/jdcryans/performance-bof12)
>
> Write path:
> * Too much write contention (flattening out)
> ** increase memstore size
> ** rely only on memstore lower limit, but don't ever hit upper limit if
> possible
> ** HBASE-3484
> * Memstore vs. HLog size - make sure hlogs are not forcing flushes
> ** ideal is hlog.blocksize * maxlogs == just a bit above memstore.lowerLimit
> ** New balancer in trunk has balancer with weighted read vs. write requests
> * Too many regions/families - don't hit the global memstore size
> ** means global memstore size will already be reached before flush size, so
> many small files being flushed (bad)
> * write to many families with different data sizes
> ** flush size is based on region, not family
> ** HBASE-3149 about fixing this.
> * HLog compression (4608)
> ** breaks replication (5778)
> ** benefits with keys to values ratio is high (counters), since it doesn't
> compress the values
>
> Read path:
> * current LRU is sometimes better, sometimes worse.  Other algorithms may
> beat this (e.g. 2Q) in some cases.
> * make sure working data set fits in block cache of course
> * evictions start happening at 85% (DEFAULT_ACCEPTABLE_FACTOR) and evict
> down to 75% (DEFAULT_MIN_FACTOR).  Too aggressive.  Try 95% and 90%.
> * disable blockcache if you have a highly random read pattern.  disable per
> family, not throughout RS, since meta blocks (leaf, bloom) are really hot
> and you want BC on for those.
> * slabcache - double-caches everything, not flexible with regards to block
> sizes, doesn't help that much with GC since by default BC is 25% of heap by
> default.  Most heap is used by memstores.
> * GC issues tend to be caused by memstores and IPC queues (default 1000
> entries, lots of data sitting in there not processed yet sometimes)
> See Charlie Hunt's book re: GC walkthrough.  There's 20% of so of the heap
> just for HBase internals (compactions, etc.) BTW.
> ** GC topic is tough for general prescription, since JVMs change, garbage
> collectors change.
> * Overall heap size: 20-32 GB.
> * Turn on bloom filters (row) in 0.92+.  Should be on by default.
>
> Metrics and UI - Elliott - slides at
> http://www.slideshare.net/eclark00/hbase-birds-of-a-feather-metrics-ui
> * per-region metrics in 0.94
> ** on by default but need to turn on NullContextWithUpdateThread
> ** There's a patch to tcollector for OpenTSDB
> * JMX JSON: /jmx web page.  Info server needs to be running
> * New UI for master in trunk.
>
> Region load - Eugene
> * Uses d3 - Javascript library for making charts - used this to create bar
> graphs for regions/RS load
> * Streaming data possible - moving graphs with this
>
> HBase integration testing - Andy
> * Bigtop - Do for Hadoop what Debian did for Linux
> * gives you build and packaging infrastructure with RPM and DEBs
> * Deployment - puppet
> * Integration test - iTest
> * iTest has helper functions, shell-outs for test scripting
> * HBASE-6201 - mvn verify -> iTest -> MiniCluster and/or actual cluster
> * Automate the RC testing matrix into this
> * Add long-running ingestion tests and chaos monkeys - LoadTestTool, YCSB,
> GoraCI, Gremlins
> * Add full system application test cases, Kerberos setup/usage helper
> functions to iTest
>
> Benchpress - Marshall
> * Programmatic load ingestion tool
> * posting JSON to HTTP
> * unzip tarball, run shell script
> * Open source soon with Apache license
>
> Snapshots - Jesse
> * HBASE-6255
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB