HBase >> mail # dev >> hbase hackaton notes


hbase hackathon notes
Hi devs,

We are having a nice hackathon today at Cloudera's offices down in Palo
Alto. 30+ people showed up, including most of the committers. In
the morning, there were some discussions related to recent issues. Here are
my notes:

JD - hypertable performance comparison
 - their tuning is wrong
 - JD tested both: got the same hypertable numbers; the hbase tests
finished, but hbase was slow
 - first do a lot of splits, then slow the splits down
 - compactions are smarter for hypertable
 - a smaller memstore is faster; as it fills up, it gets slower
 - client does not wait for flush commits, it does that async. JD used an
async client to get comparable numbers

Matt - hotpads
 - talked about prefix compression, trie data encoding (HBASE-4676)
 - went over the chart in the jira ticket
 - random reads, bigger block sizes
 - does not work very well for md5-prefixed keys; you should partition
using a single byte instead
 - write speed is affected (writes are an order of magnitude slower compared
to None encoding), see the attached pdf in the ticket
 - a lot of improvement options for the key-value heap/block cache/encoding
internal APIs
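
To illustrate the single-byte partitioning point above: a full md5 prefix makes neighbouring rows differ from the first byte, which presumably leaves prefix/trie encoding little to share, while one partition byte followed by the natural key keeps long shared prefixes inside each partition. A minimal Java sketch, assuming a hash-modulo bucket choice; `PartitionedKey`, `withPartitionByte` and the bucket count are hypothetical names for illustration, not anything from HBASE-4676.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: builds row keys with a one-byte partition prefix. */
final class PartitionedKey {
    private PartitionedKey() {}

    /** Returns one partition byte followed by the unchanged natural key. */
    static byte[] withPartitionByte(byte[] naturalKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        // Assumption: any stable hash works here; Arrays.hashCode is used only for brevity.
        int bucket = Math.floorMod(Arrays.hashCode(naturalKey), buckets);
        byte[] out = new byte[naturalKey.length + 1];
        out[0] = (byte) bucket;
        System.arraycopy(naturalKey, 0, out, 1, naturalKey.length);
        return out;
    }

    /** Number of leading bytes two keys share; what a prefix encoder could reuse. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }
}
```

Readers then have to fan out over all buckets for a scan, so the bucket count is a trade-off between write distribution and scan cost.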

Todd - performance
 - demo of oprofile, ycsb test
 - uses hw counters, shows actual CPU clocks, L1, L2 cache hits/misses,
etc. Uses a custom jvm agent for profiling java
 - crc32 from hadoop libzip, URI, KeyValue comparator, etc

Jimmy - pb
 - remaining things: coprocessors, rpc engine, meta table, some minor
things
 - we should not expose too many rpc internals to coprocessors, but also
not make it too difficult
 - continue discussion on jira

Jesse - mvn modules
 - cross module dependencies should be eliminated
 - hbase-server, hbase-client, hbase-shared at lower level, we should think
about mini-cluster

Lars - durable sync
 - hflush / hsync
 - hacky flush blocks on close mode
 - disk io is bursty as it is, we should smooth it out
 - maybe make it configurable per column family
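
As a loose local-file analogy for the hflush / hsync distinction (this is plain `java.io` / `java.nio`, not the HDFS API): a flush hands the bytes on so they are no longer sitting in the writer's buffer, while a sync additionally asks for them to be forced to the storage device. The `LocalWal` class and its method names below are hypothetical, a sketch under those assumptions only.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical append-only log on a local file; illustrates flush vs. sync only. */
final class LocalWal implements AutoCloseable {
    private final FileOutputStream file;
    private final BufferedOutputStream buffered;

    LocalWal(Path path) throws IOException {
        this.file = new FileOutputStream(path.toFile(), true); // append mode
        this.buffered = new BufferedOutputStream(file);
    }

    void append(String edit) throws IOException {
        buffered.write((edit + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Flush-like: hand buffered bytes to the OS; they may still sit in the page cache. */
    void flushToOs() throws IOException {
        buffered.flush();
    }

    /** Sync-like: flush, then ask the OS to force data and metadata to the device. */
    void syncToDisk() throws IOException {
        buffered.flush();
        file.getChannel().force(true);
    }

    @Override
    public void close() throws IOException {
        buffered.close(); // flushes and closes the underlying file stream
    }
}
```

Calling `syncToDisk()` on every edit is the durable but bursty option; batching several appends per sync is one way the io could be smoothed out.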

David - testing
 - rc testing
 - aggregate test results in a wiki or something for each rc
 - binary / source release issues
 - need to recompile hbase with hadoop 1 and 2, with a jenkins build for each
 - 0.96, hadoop-1 and hadoop-2
 - compatibility tests: we do not have any, we can add them to the checklist

Andrew - async hbase
 - build sync client on top of async
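
The "sync on top of async" idea can be sketched as a thin blocking facade that waits on each future. This is a minimal Java sketch with a made-up `AsyncKv` interface and an in-memory stand-in so it runs without a cluster; none of these names come from asynchbase or the HBase client.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical async API: every call returns immediately with a future. */
interface AsyncKv {
    CompletableFuture<Void> put(String row, String value);
    CompletableFuture<String> get(String row);
}

/** In-memory stand-in for a real async client. */
final class InMemoryAsyncKv implements AsyncKv {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<Void> put(String row, String value) {
        return CompletableFuture.runAsync(() -> data.put(row, value));
    }

    @Override
    public CompletableFuture<String> get(String row) {
        return CompletableFuture.supplyAsync(() -> data.get(row));
    }
}

/** Blocking facade: each method issues the async call and waits for its result. */
final class SyncKv {
    private final AsyncKv async;
    private final long timeoutMs;

    SyncKv(AsyncKv async, long timeoutMs) {
        this.async = async;
        this.timeoutMs = timeoutMs;
    }

    void put(String row, String value) {
        await(async.put(row, value));
    }

    String get(String row) {
        return await(async.get(row));
    }

    private <T> T await(CompletableFuture<T> future) {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("async call failed or timed out", e);
        }
    }
}
```

With this layering there is only one code path to the servers; the sync client adds nothing but the wait and the timeout.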

Jesse - snapshots

go around the room for integrations

Huddle groups for topics above

Keep hacking,
Enis