HBase >> mail # user >> Speculative execution and TableOutputFormat
Speculative execution and TableOutputFormat
I believe there is a problem with the interaction between Hadoop's speculative execution (which is on by default) and HBase's TableOutputFormat. If I understand correctly, speculative execution can launch the same task on multiple nodes but only "commit" the one that finishes first; the duplicate tasks that did not complete are killed.

I encountered some strange behavior with speculative execution and TableOutputFormat. It looks like context.write() submits rows to HBase whenever the write buffer fills, but there is no "rollback" if the task that submitted those rows does not finish first and is later killed. The rows remain in HBase.
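A minimal, self-contained sketch of the failure mode described above (plain Java, not the actual HBase client code; the class and method names are illustrative): anything flushed before the task is killed is already durable in the table, and killing the task can only discard the unflushed tail of the buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for TableOutputFormat's record writer: rows are
// buffered and flushed to the table whenever the buffer fills, so a
// task killed mid-run has already made its earlier writes visible.
// There is no rollback of flushed rows.
class BufferedTableWriter {
    private final List<String> table = new ArrayList<>();  // stands in for the HBase table
    private final List<String> buffer = new ArrayList<>(); // the client-side write buffer
    private final int bufferSize;

    BufferedTableWriter(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Analogous to context.write(): buffer the row, flush when full.
    void write(String row) {
        buffer.add(row);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    // Flushed rows become durable; only this step "commits" anything.
    private void flush() {
        table.addAll(buffer);
        buffer.clear();
    }

    // Task killed: only the rows still sitting in the buffer are lost.
    void kill() {
        buffer.clear();
    }

    List<String> committed() {
        return table;
    }
}
```

With a buffer size of 2, writing three rows and then killing the task leaves the first two rows committed; only the third, unflushed row is lost. A speculative duplicate of the task does the same thing, so both attempts can leave rows behind.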

My particular job uses a partitioner, so a single reducer processes all records for a given partition. The reducer selects among those records and persists the selection to HBase. With speculative execution turned on, the reducer for the partition actually runs on two nodes, and both end up inserting into HBase, even though the second reducer is eventually killed. The results were not what I wanted.

Turning off speculative execution resolved my issue. I think it should be disabled by default when using TableOutputFormat, unless there is a better way to handle this.
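For reference, speculative execution can be disabled per job. This sketch assumes the classic org.apache.hadoop.mapred API (JobConf); the property names come from mapred-default.xml, and MyJob is a placeholder class:

```java
// Assumes: import org.apache.hadoop.conf.Configuration;
//          import org.apache.hadoop.mapred.JobConf;
JobConf job = new JobConf(new Configuration(), MyJob.class);

// Disable speculative launches for both phases; the reduce-side
// setting is the one that matters for TableOutputFormat writes here.
job.setBoolean("mapred.map.tasks.speculative.execution", false);
job.setBoolean("mapred.reduce.tasks.speculative.execution", false);

// Equivalently, in one call:
job.setSpeculativeExecution(false);
```

The same properties can also be set cluster-wide in mapred-site.xml, but a per-job setting keeps speculation available for jobs whose outputs are idempotent.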