
Re: [DISCUSS] HCatalog becoming a subproject of Hive
I am not sure where we are on this discussion.  So far those who have chimed in seemed generally positive (Namit, Edward, Clark, Alexander).  Namit and I have different visions for what the committership might look like, so I'd like to hear from other Hive PMC members what their view is on this.  I have to say from an HCatalog perspective the proposition is much less attractive without some commit rights.

On a related note, people should be aware of these threads in the Incubator list:


For those not inclined to read all the mails in the threads I will summarize (though I urge all PMC members of Hive and PPMC members of HCat to read both mail threads because this is highly relevant to what we are discussing).  There are two salient points in these threads:

1) It is not wise to build a subproject that is distinct from the main project in the sense that it has separate community members interested in it.  Bertrand, Arun, Chris Mattmann, and Greg Stein all spoke against this, and all are longtime Apache contributors with a lot of experience.  They were all of the opinion that it was reasonable for one project to release separate products.

2) It is not wise to have committers that have access to parts of a project but not others.  Greg and Bertrand argued (and Arun seemed to imply) that splitting up committer lists by sections of the code did not work out well.

These insights cause me to question what we mean by subproject.  I had originally envisioned something that looked like Pig and Hive did when they were subprojects of Hadoop.  But this violates both 1 and 2 above.  Given this input from many of the "wise old timers" of Apache I think we should consider what we mean when we say subproject and how tightly we are willing to integrate these projects.  Personally I think it makes sense to continue to pursue integration, as I think HCat is really a set of interfaces on top of Hive and it makes sense to coalesce those into one project.  I guess this would mean HCat becomes just another set of jars that Hive releases when it releases, rather than a standalone entity.  But I'm curious to hear what others think.


On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:

> The same criteria should be applied to all Hive committers. Only a
> committer should be able to commit code.
> I don't think we should bend this rule. Metastore is not a separate
> project, but an integral part of hive.
> -namit
> On 11/12/12 10:32 PM, "Alan Gates" <[EMAIL PROTECTED]> wrote:
>> I would suggest looking over the patch history of HCat committers.  I
>> think most of them have already contributed a number of patches to the
>> metastore.  All are certainly aware of how to run Hive unit tests and
>> have an understanding of how Hive works.  So I don't think it's fair to
>> say they would be unsafe with access to the metastore.  And the Hive PMC
>> is there to assure this does not happen.  If there are issues I am sure
>> they can deal with them.
>> Alan.
>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>>> Alan, that would not be a good idea. Metastore code is part of hive code,
>>> and it would be safer if only Hive committers had commit access to that.
>>> On 11/6/12 11:25 PM, "Alan Gates" <[EMAIL PROTECTED]> wrote:
>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>>>>> I like the idea of Hcatalog becoming a Hive sub-project. The
>>>>> enhancements/bugs in the serde/metastore areas can indirectly
>>>>> benefit the hive community, and it will be easier for the fix to be in one
>>>>> place. Having said that, I don't see serde/metastore
>>>>> moving out of hive into a separate component. Things are tied too