Hive >> mail # dev >> Tez branch and tez based patches

Re: Tez branch and tez based patches
Also watched http://www.ustream.tv/recorded/36323173

I definitely see the win in being able to stream inter-stage output.

I see some cases where small intermediate results can be kept in memory.
But I was under the impression that the MapReduce spill settings already
keep data in memory. Isn't that what the spill settings are for?

There are a few bullet points that came up repeatedly that I do not follow:

Something was said to the effect of "Container reuse makes X faster".
Hadoop already has JVM reuse, so I am not following what the difference is
here. Not everyone has a 10K node cluster.

"Joins in map reduce are hard" Really? I mean some of them are I guess, but
the typical join is very easy: just shuffle by the join key. There were not
really enough low-level details here explaining why joins are better in Tez.
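
For reference, the "shuffle by the join key" approach can be sketched roughly
as below. This is plain Python standing in for a MapReduce job, not Hadoop or
Hive code; the function names, the "L"/"R" tags, and the tuple row layout are
all made up for illustration.

```python
from collections import defaultdict
from itertools import product


def map_phase(rows, tag, key_index):
    """Emit (join_key, (tag, row)) pairs, as a mapper would."""
    for row in rows:
        yield row[key_index], (tag, row)


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Inner join: pair every left row with every right row sharing a key."""
    for key in sorted(groups):
        left = [row for tag, row in groups[key] if tag == "L"]
        right = [row for tag, row in groups[key] if tag == "R"]
        for l_row, r_row in product(left, right):
            yield l_row + r_row


def reduce_side_join(left_rows, right_rows, left_key=0, right_key=0):
    """Tag both inputs, shuffle on the join key, and join in the reducer."""
    pairs = list(map_phase(left_rows, "L", left_key))
    pairs += list(map_phase(right_rows, "R", right_key))
    return list(reduce_phase(shuffle(pairs)))
```

Keys that appear on only one side produce no output here, so this corresponds
to an inner join; an outer join would need extra handling in the reduce step.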

"Choosing the number of maps and reduces is hard" Really? I think there are
times when it's not perfect, but I do not find it hard. The talk did not
really offer anything technical here on how Tez makes this better, other
than that it could make it better.

The presentations mentioned streaming data. How do two nodes stream data
between tasks, and how is it reliable? If the sender or receiver dies, does
the entire process have to start again?

Again, one of the talks implied there is a prototype out there that launches
Hive jobs into Tez. I would like to see that; it might answer more
questions than a PowerPoint, and I could profile some common queries.

Random late night thoughts over,
On Tue, Jul 30, 2013 at 12:02 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> At ~25:00
> "There is a working prototype of hive which is using tez as the targeted
> runtime"
> Can I get a look at that code? Is it on github?
> Edward
> On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>> Answers to some of your questions inlined.
>> Alan.
>> On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:
>> > There are some points I want to bring up. First, I am on the PMC. Here is
>> > something I find relevant:
>> >
>> > http://www.apache.org/foundation/how-it-works.html
>> >
>> > ------------------------------
>> >
>> > The role of the PMC from a Foundation perspective is oversight. The main
>> > role of the PMC is not code and not coding - but to ensure that all legal
>> > issues are addressed, that procedure is followed, and that each and every
>> > release is the product of the community as a whole. That is key to our
>> > litigation protection mechanisms.
>> >
>> > Secondly the role of the PMC is to further the long term development and
>> > health of the community as a whole, and to ensure that balanced and wide
>> > scale peer review and collaboration does happen. Within the ASF we worry
>> > about any community which centers around a few individuals who are working
>> > virtually uncontested. We believe that this is detrimental to quality,
>> > stability, and robustness of both code and long term social structures.
>> >
>> > --------------------------------
>> >
>> >
>> > https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
>> >
>> > -------------------------------------
>> >
>> > All other decisions happen on the dev list, discussions on the private list
>> > are kept to a minimum.
>> >
>> > "If it didn't happen on the dev list, it didn't happen" - which leads to:
>> >
>> > a) Elections of committers and PMC members are published on the dev list
>> > once finalized.
>> >
>> > b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
>> > soon as they have impact on the project, code or community.
>> > ---------------------------------
>> >
>> > https://issues.apache.org/jira/browse/HIVE-4660 ironically titled "Let
>> > their be Tez" has not been +1'ed by any committer. It was never discussed on
>> > the dev or the user list (as far as I can tell).
>> As all JIRA creations and updates are sent to dev@hive, creating a JIRA