Hi Edward and Vinod,
I agree with what has been said here and it's an area I have taken a
I think the project modularity could be improved. For example, the ql module
is quite large. Additionally, using unit tests with mocked components, as
opposed to .q file tests or tests which start some kind of server component,
would improve unit test performance while increasing the precision of
tests. HIVE-4290 <https://issues.apache.org/jira/browse/HIVE-4290> should
also improve build and test times.
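
To make the mocked-component idea concrete, here is a minimal sketch in plain Java. Every name in it (Catalog, ColumnChecker, FakeCatalog) is hypothetical and invented for illustration; none of them are Hive classes. It assumes the code under test reaches its server-backed dependency through a small interface, so a test can substitute an in-memory stand-in instead of starting a server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these are Hive classes.
class MockedCatalogSketch {

    // The dependency that would normally need a running server component.
    interface Catalog {
        List<String> columnsOf(String table);
    }

    // The unit under test: reports referenced columns the table does not have.
    static class ColumnChecker {
        private final Catalog catalog;

        ColumnChecker(Catalog catalog) {
            this.catalog = catalog;
        }

        List<String> missingColumns(String table, List<String> referenced) {
            List<String> known = catalog.columnsOf(table);
            List<String> missing = new ArrayList<>();
            for (String column : referenced) {
                if (!known.contains(column)) {
                    missing.add(column);
                }
            }
            return missing;
        }
    }

    // In-memory stand-in used by the test; no server is started.
    static class FakeCatalog implements Catalog {
        private final Map<String, List<String>> tables = new HashMap<>();

        FakeCatalog with(String table, String... columns) {
            tables.put(table, Arrays.asList(columns));
            return this;
        }

        @Override
        public List<String> columnsOf(String table) {
            List<String> columns = tables.get(table);
            if (columns == null) {
                throw new IllegalArgumentException("unknown table: " + table);
            }
            return columns;
        }
    }

    public static void main(String[] args) {
        Catalog fake = new FakeCatalog().with("src", "key", "value");
        ColumnChecker checker = new ColumnChecker(fake);

        // Only "ts" is absent from the fake table definition.
        List<String> missing = checker.missingColumns("src", Arrays.asList("key", "ts"));
        if (!missing.equals(Arrays.asList("ts"))) {
            throw new AssertionError("unexpected result: " + missing);
        }
        System.out.println("missing columns: " + missing);
    }
}
```

A test written this way exercises only ColumnChecker, so a failure points at that class rather than at whatever else an end-to-end run happens to touch. In practice the hand-written fake would likely be replaced by a mocking library, and the check in main by a JUnit test.
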
As some of you are aware, a few weeks ago I posted
HIVE-4675 <https://issues.apache.org/jira/browse/HIVE-4675>, which is a
new parallel unit test framework. I created that framework for
myself as I was frustrated by the build times for the Hive project. I've
implemented it at my employer, Cloudera, and it's been very successful. To
ensure the framework is usable by the community at large, they have agreed
to sponsor some virtual test infrastructure. I should have more details
soon on exactly what that infrastructure will look like and I've created
HIVE-4739 <https://issues.apache.org/jira/browse/HIVE-4739> to track this
effort. If anyone is interested in sponsoring a portion of that
infrastructure please indicate as such on that JIRA and we can work out the
On Sun, Jun 16, 2013 at 2:31 PM, Vinod Kumar Vavilapalli <
[EMAIL PROTECTED]> wrote:
> This is from someone from Hadoop and who's been on and off in Hive.
> Dedicated test resources are good, but there are other (simpler?) things
> worth pursuing to begin with - suggestions from the peanut gallery:
> - Split the project into modules. Without thinking much, a simple split
> could be client, execution engine, metastore. We did the module split in
> Hadoop, it is initially a bit of pain but pays back a lot in future. And
> whenever there are isolated module changes, only those modules need to be
> tested. Also has the added benefit of clear modularity.
> - A separate candidate suite of pre-commit tests. It can be a subset of
> all the tests, maybe even hand-picked. Sure they won't catch some bugs,
> but it is a reasonable compromise that worked in Hadoop.
> - And wire the pre-commit tests with JIRA/Jenkins.
> On Jun 16, 2013, at 11:02 AM, Edward Capriolo wrote:
> > Hive's unit test suite has gotten larger as we have added more features,
> > and thus it takes longer to run. On a single dual-core machine with solid
> > state disks I have to start a test run at night, and then check the next
> > morning to see if the run has finished. (I have been running tests for
> > maybe 2 hours and am up to escape.q)
> > ::opinion::
> > Also for a long time the distribution of which features get reviewed,
> > tested, and committed has been unfair. With more people involved in the
> > project this situation has gotten better; however, it is still not fair.
> > What sometimes ends up happening is that a good feature, which is reviewed
> > and +1ed, sits uncommitted for months or years.
> > Some committers or groups of committers have an agenda and dedicated
> > resources, and others do not. This unbalances the project. It means that
> > small incremental improvements and new features not important to 'large
> > company with testing resources x' sit ready to be committed while other
> > people working in pairs further the project to their agenda. (This last
> > statement is not a condemnation of anyone, just possibly a fact of life)
> > ::suggestion::
> > 1) The project should sponsor an open and independent build/test farm
> > 2) Once a ticket is marked 'patch available' this build farm should
> > automatically notice this and begin testing the patch
> > 3) Patches/issues which pass tests first should be considered first for
> > inclusion
> > We can use a hosted testing service such as:
> > http://www.cloudbees.com/platform/pricing/devcloud.cb
> > Q. Do any committers/interested parties like the idea?
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org