You are probably right; I am probably just desensitized to the flakiness of
Hive's testing methods. Generally, I think the following issues contribute:
1) The Hive unit testing framework is just plain flaky. There are thousands
of tests, and many of them fail roughly 1 time in 10,000.
2) Running the Hive unit tests on a single machine takes an incredible amount
of time (> 15 hours), so we run them in parallel on many servers.
3) Many Hive unit tests have race conditions exacerbated by (a) slow CPUs in
virtualized environments and (b) high CPU usage due to many tests running on the
4) We use Amazon Spot instances to keep costs down. They often die during a
run, and we have to re-run the test that was running on that server.
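
To put a rough number on point 1, here is a back-of-the-envelope sketch. The
figures are assumptions for illustration only, not measurements from the Hive
build: 5,000 flaky tests, each failing independently 1 run in 10,000.

```python
def spurious_failure_probability(n_flaky_tests: int, p_fail: float) -> float:
    """Chance that at least one of n independent flaky tests fails in one run."""
    return 1.0 - (1.0 - p_fail) ** n_flaky_tests


if __name__ == "__main__":
    # Hypothetical figures: 5,000 flaky tests, each failing 1 run in 10,000.
    p = spurious_failure_probability(5000, 1 / 10_000)
    print(f"Chance of at least one spurious failure per run: {p:.2f}")
```

Under those assumptions roughly 4 in 10 full runs would hit at least one
spurious failure, even before race conditions or lost spot instances come into
play.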
At present Cloudera sponsors all the EC2 instances for Hive testing. If we
could get some more corporate sponsors such as MSFT or HWx to set up some
dedicated EC2 instances we could eliminate the "spot" instances which would
On Tue, Feb 11, 2014 at 1:05 PM, Remus Rusanu <[EMAIL PROTECTED]> wrote:
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org