hbase hackathon notes
Hi devs,

We are having a nice hackathon today at Cloudera's offices in Palo
Alto. More than 30 people showed up, including most of the committers. In
the morning, there were some discussions related to recent issues. Here are
my notes:

JD - hypertable performance comparison
 - their tuning is wrong
 - JD tested both: he got the same hypertable numbers; the hbase tests
finished, but hbase was slow
 - first do a lot of splits, then slow the splits down
 - compactions are smarter for hypertable
 - a smaller memstore is faster; as the memstore fills up, it gets slower
 - the client does not wait for flush commits, it does that asynchronously.
JD used an async client to get comparable numbers

Matt - hotpads
 - talked about prefix compression, trie data encoding (HBASE-4676)
 - went over the chart in the jira ticket
 - random reads, bigger block sizes
 - does not work very well for md5-prefixed keys, you should partition
using a single byte
 - write speed is affected (an order of magnitude slower for writes compared
to None encoding), see the attached pdf in the ticket
 - a lot of improvement options for the key-value heap/block cache/encoding
internal APIs
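
To illustrate why prefix compression suits sequential keys but not md5-prefixed ones, here is a minimal sketch. It is not the trie encoding from HBASE-4676, just plain delta-against-previous-key prefix encoding, and the class name, key sets and the single-character "bucket" prefix are all made up for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class PrefixEncodingSketch {

    /** Number of leading bytes that a and b have in common. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Total key bytes when every key is stored in full. */
    static int rawSize(List<String> keys) {
        int total = 0;
        for (String key : keys) {
            total += key.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /**
     * Total key bytes when each key keeps only the suffix that differs from
     * the previous key. Keys must already be sorted. The per-key
     * prefix-length field is ignored to keep the arithmetic simple.
     */
    static int encodedSize(List<String> sortedKeys) {
        int total = 0;
        byte[] prev = new byte[0];
        for (String key : sortedKeys) {
            byte[] cur = key.getBytes(StandardCharsets.UTF_8);
            total += cur.length - commonPrefixLength(prev, cur);
            prev = cur;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical key sets, each listed in sorted order.
        List<String> sequential = Arrays.asList(
                "user0001", "user0002", "user0003", "user0004");
        // Hash-like prefixes: neighbouring keys share no leading bytes.
        List<String> hashed = Arrays.asList(
                "1f3a-user0003", "8c07-user0001", "b9e2-user0004", "e45d-user0002");
        // Single-character bucket prefix: keys inside a bucket still share a prefix.
        List<String> bucketed = Arrays.asList(
                "0-user0002", "0-user0004", "1-user0001", "1-user0003");

        System.out.println(rawSize(sequential) + " -> " + encodedSize(sequential)); // 32 -> 11
        System.out.println(rawSize(hashed) + " -> " + encodedSize(hashed));         // 52 -> 52
        System.out.println(rawSize(bucketed) + " -> " + encodedSize(bucketed));     // 40 -> 22
    }
}
```

With hash-like prefixes nothing is saved, while the single-character bucket still spreads the keys into partitions and keeps most of the shared prefix inside each one.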

Todd - performance
 - demo of oprofile, ycsb test
 - uses hw counters, shows actual CPU clocks, L1, L2 cache hits/misses,
etc. Uses a custom jvm agent for profiling java
 - crc32 from hadoop libzip, URI, KeyValue comparator, etc

Jimmy - pb
 - remaining things: coprocessors, rpc engine, meta table, some minor
things
 - we should not expose too many rpc internals to coprocessors, and we
should not make it too difficult
 - continue discussion on jira

Jesse - mvn modules
 - cross module dependencies should be eliminated
 - hbase-server, hbase-client, hbase-shared at lower level, we should think
about mini-cluster

Lars - durable sync
 - hflush / hsync
 - hacky flush blocks on close mode
 - disk io is bursty as it is, we should smooth it out
 - maybe make it configurable per column family

David - testing
 - rc testing
 - aggregate test results in a wiki or something for each rc
 - binary/source release issues
 - need to recompile hbase against hadoop 1 and 2; a jenkins build for each
 - 0.96, hadoop-1 and hadoop-2
 - compatibility tests: we do not have any, we can add them to the checklist

Andrew - async hbase
 - build sync client on top of async

Jesse - snapshots

go around the room for integrations

Huddle groups for topics above

Keep hacking,
Enis