Hive >> mail # dev >> Tez branch and tez based patches


Tez branch and tez based patches
I have started to see several refactoring patches around Tez.
https://issues.apache.org/jira/browse/HIVE-4843

This is the only mention of Tez I can find on the Hive list:
"Makes sense. I will create the branch soon.

Thanks,
Ashutosh
On Tue, Jun 11, 2013 at 7:44 PM, Gunther Hagleitner <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am starting to work on integrating Tez into Hive (see HIVE-4660, design
> doc has already been uploaded - any feedback will be much appreciated).
> This will be a fair amount of work that will take time to stabilize/test.
> I'd like to propose creating a branch in order to be able to do this
> incrementally and collaboratively. In order to progress rapidly with this,
> I would also like to go "commit-then-review".
>
> Thanks,
> Gunther.
>"

These refactorings are largely destructive to a number of bug fixes and
language improvements in Hive. Those language improvements and bug fixes
have been sitting in Jira for quite some time now, marked patch-available
and waiting for review.

There are a few things I want to point out:
1) Normally we create design docs in our wiki (which this one is not)
2) Normally when the change is significantly complex we get multiple
committers to comment on it (which we did not)
On point 2, no one -1'd the branch, but this is really something that should
have required a +1 from 3 committers.

I for one am not completely sold on Tez.
http://incubator.apache.org/projects/tez.html
The description "directed-acyclic-graph of tasks for processing data"
sounds like many things that have never become popular. One that comes to
mind is Oozie: "Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of
actions.". I am sure I can find a number of libraries/frameworks that make
this same claim. In general I do not feel like we have done our homework
and pre-requisites to justify all this work. If we have done the homework,
I am sure that it has not been communicated and accepted by hive developers
at large.

If we have a branch, why are we also committing on trunk? Scanning through
the Tez doc, I keep finding language like "minimal changes to the planner",
yet there are ALREADY lots of large changes going on!

Really none of the above would bother me except for the fact that these
"minimal changes" are causing many "patch available" ready-for-review bugs
and core Hive features to need to be rebased.

I am sure I have mentioned this before, but I have to spend 12+ hours to
test a single patch on my laptop. A few days ago I was testing a new core
hive feature. After all the tests passed and before I was able to commit,
someone unleashed a tez patch on trunk which caused the thing I was testing
for 12 hours to need to be rebased.
I'm not cool with this. Next time that happens to me I will seriously
consider reverting the patch. Bug fixes and new Hive features are more
important to me than integrating with incubator projects.