Hive, mail # dev - hi

Re: hi
Jarek Jarcec Cecho 2013-04-18, 15:49
Hi Namit,
I like your proposal very much and I would take it a bit further:

>   1.  ... For any complex function, clear examples (input/output) would really help.

I'm concerned that examples in code comments can quickly become obsolete, because it is easy to change the code without updating the example. What about using ordinary unit tests for this purpose? Developers would still be able to see the expected input/output, and in addition we would have an automatic way to detect (possibly incompatible) changes. Please note that I'm not suggesting we abandon the *.q file tests, just that we also include unit tests for complex methods.
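
As a rough sketch of the idea, assuming a made-up helper: splitQualifiedName and its "default" fallback are hypothetical and are not Hive code, and a real test would most likely use JUnit assertions rather than the hand-rolled check below. The point is only that the expected input/output pairs live in code that fails when the behavior changes.

```java
import java.util.Arrays;

public class QualifiedNameExample {

    /**
     * Hypothetical helper (not part of Hive): splits "db.table" into its parts.
     * A name without a dot is assumed to belong to the "default" database.
     */
    static String[] splitQualifiedName(String name) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return new String[] {"default", name};
        }
        return new String[] {name.substring(0, dot), name.substring(dot + 1)};
    }

    /** Minimal stand-in for a unit-test assertion; throws on mismatch. */
    static void check(String[] expected, String[] actual) {
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError("expected " + Arrays.toString(expected)
                + " but got " + Arrays.toString(actual));
        }
    }

    public static void main(String[] args) {
        // Each check documents one input/output pair and verifies it on every run.
        check(new String[] {"sales", "orders"}, splitQualifiedName("sales.orders"));
        check(new String[] {"default", "orders"}, splitQualifiedName("orders"));
        System.out.println("all checks passed");
    }
}
```

If someone later changes how unqualified names are handled, the second check fails immediately, whereas an example in a comment would silently go stale.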


On Thu, Apr 18, 2013 at 12:31:10PM +0000, Namit Jain wrote:
> Hi,
> Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase.
> Historically, we have not focussed on a few things, and they might soon bite us. I wanted to propose the following for all checkins:
>   1.  Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help.
>   2.  Convention for variable/function names: do we have any?
>   3.  If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change.
>   4.  Especially for query optimizations, it might be a good idea to have a simple working query at the top, along with the expected changes, e.g. the operator tree for that query at each step, or a detailed explanation at the top.
>   5.  Comments in each test (.q file), which should include the jira number, what the test is trying to verify, and any assumptions about each query.
>   6.  Reduce the output for each test – whenever query is outputting more than 10 results, it should have a reason. Otherwise, each query result should be bounded by 10 rows.
> In general, focussing on a lot of comments in the code will go a long way for everyone to follow along.
> Thanks,
> -namit