Alan Gates
2011-02-02, 21:18
Jeff Hammerbacher
2011-02-02, 22:08
Olga Natkovich
2011-02-02, 23:05
Julien Le Dem
2011-02-02, 23:09
Edward Capriolo
2011-02-02, 23:11
Daniel Dai
2011-02-02, 23:15
Benjamin Reed
2011-02-02, 23:26
Thejas M Nair
2011-02-02, 23:32
Milind Bhandarkar
2011-02-03, 00:57
Richard Ding
2011-02-03, 01:05
Alan Gates
2011-02-03, 04:58
Alan Gates
2011-02-03, 05:16
Santhosh Srinivasan
2011-02-03, 06:11
Edward Capriolo
2011-02-03, 16:10
Alan Gates
2011-02-03, 16:36
Ashutosh Chauhan
2011-02-03, 16:43
Jay Booth
2011-02-03, 16:52
John Sichi
2011-02-03, 19:38
Jeff Hammerbacher
2011-02-03, 21:15
Milind Bhandarkar
2011-02-03, 21:24
yongqiang he
2011-02-03, 21:41
Ashutosh Chauhan
2011-02-03, 21:49
John Sichi
2011-02-03, 22:49
Ashutosh Chauhan
2011-02-03, 22:58
Alan Gates
2011-02-03, 23:11
Alex Boisvert
2011-02-03, 23:29
John Sichi
2011-02-04, 00:07
John Sichi
2011-02-04, 00:30
Alex Boisvert
2011-02-04, 00:56
Alan Gates
2011-02-04, 01:09
John Sichi
2011-02-04, 02:00
John Sichi
2011-02-04, 02:07
Alan Gates
2011-02-08, 18:10
-
[VOTE] Sponsoring Howl as an Apache Incubator project - Alan Gates, 2011-02-02, 21:18
Howl is a table management system built to provide metadata and storage management across data processing tools in Hadoop (Pig, Hive, MapReduce, ...). You can learn more details at http://wiki.apache.org/pig/Howl . For the last six months the code has been hosted at github. The Howl team would like to move the project into the Apache Incubator. You can see the proposal for the project at http://wiki.apache.org/incubator/HowlProposal .

In order to be accepted as an Incubator project Howl needs a Sponsoring project. I propose that we, the Pig project, sponsor Howl. By sponsoring Howl we are saying that we believe it is a good fit for the ASF and that we will assist the Howl project to succeed. You can read full details of sponsoring a project at http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor .

Our bylaws don't explicitly cover such a vote, but I think lazy majority should be reasonable. All votes are welcome, PMC member votes will be binding.

Clearly I'm +1.

Alan.
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Jeff Hammerbacher, 2011-02-02, 22:08
Awesome! Huge +1.
-
RE: [VOTE] Sponsoring Howl as an Apache Incubator project - Olga Natkovich, 2011-02-02, 23:05
+1
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Julien Le Dem, 2011-02-02, 23:09
+1
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Edward Capriolo, 2011-02-02, 23:11
I do think it is a great idea for Hive, Pig, and MapReduce to share a metastore. However, I am not sure I agree with the approach. IMHO Howl should be a Hive subproject.

"The initial release of Howl will allow interoperability of data between Pig, Map Reduce, and Hive"

I believe this should instead be "The initial release of Howl should support Hive"; at that point Hive should remove the /metastore code from inside Hive and depend on Howl.

I say this because Hive is very actively reworking the metastore right now for security, a new type of views, and indexes. I feel that if the metastore branches from Hive as Howl, getting the two back together will be difficult. My fear is that Hive and Howl end up sharing 99% of the same code base without being compatible with each other.
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Daniel Dai, 2011-02-02, 23:15
+1
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Benjamin Reed, 2011-02-02, 23:26
+1
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Thejas M Nair, 2011-02-02, 23:32
+1
-Thejas
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Milind Bhandarkar, 2011-02-03, 00:57
I feel that Howl should start as a contrib to Hadoop, and move to be a subproject of Hadoop once there is sufficient adoption, rather than going the incubator way. My reasons are as follows:

1. Howl is aimed at providing abstractions for facilitating interoperability between various systems built *on top of Hadoop*, and should not limit itself to Pig, Hive, and native MapReduce. So, any system that is Hadoop compatible should be able to use Howl as a metadata store.

2. Having Howl as a contrib of Hadoop will ensure that the input and output formats, compression codecs, underlying storage APIs, etc. remain in sync from release to release, and users do not have to worry about whether version x of Howl is compatible with version y of Hadoop or not.

3. Pig, Hive, Cascading, ... are all already dependent on Hadoop. Including Howl as a Hadoop contrib means they do not add any more dependencies.

4. The roadmap of Howl includes authentication and authorization support. It is standard industry practice that metadata security mechanisms match those for data security. Thus, a significant amount of code can be shared with Hadoop's authorization and authentication.

5. Hadoop-compatible file systems provide an abstraction over underlying storage systems. Howl currently provides a table abstraction over the file system. In the future, when Hadoop provides a blockpool abstraction (as part of federation), Howl will be able to take advantage of that and optimize.

6. The Howl roadmap currently does not contain multi-tenancy features such as quotas. Since there is a strong correlation between the number of tables and partitions in Howl and the number of directories and files in HDFS, this could be streamlined if Howl is part of Hadoop.

Thoughts?

- milind

---
Milind Bhandarkar
[EMAIL PROTECTED]
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Richard Ding, 2011-02-03, 01:05
+1
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Alan Gates, 2011-02-03, 04:58
I see a couple of blockers that prevent this from being a contrib project of Hadoop:

1) The Hadoop project is actively trying to remove the contrib projects it has, see http://tinyurl.com/6yl25jz. I doubt it's interested in any new ones.

2) The Hadoop project is producing a release every 2 or 3 years currently. As a new project Howl will be wanting to release every 2 or 3 months for a while. Being tied to something as slow moving as Hadoop for releases would make it hard for Howl to get releases out the door.

Alan.
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Alan Gates, 2011-02-03, 05:16
Edward,

I understand your concern with having a copy of the metastore code in Howl. However, let's separate code from governance. The reason Howl has a copy of Hive's metastore is not because we're proposing it for the Incubator; it is because in the course of developing it over the last six months we've found that Howl development needs to move much faster than Hive development can. This is appropriate, since Hive is a mature product and has at least one large customer that runs code in production very soon after it is checked in. Thus the Hive community is rightly cautious about checking in changes to the metastore. Howl, on the other hand, is new and innovating quickly, so it likes to get things checked in quickly. Over the last six months every patch Howl has made to the Hive metastore code has made it back into Hive code. But it generally takes a few weeks or more to get in.

Whether Howl is a Hive subproject or an Incubator project it faces the same dilemma. The only other alternative that was suggested was to have Howl extern the metastore code from Hive, keep its patches in its build, and apply them at build time. But this is very fragile, since any changes in the Hive metastore code could invalidate all those patches. We know that this is not sustainable in the long run, which is why the proposal calls out the need to resolve this one way or another as the project matures.

As far as reaching an end state where Hive and Howl are not compatible, we would view that as a failure for Howl. The goal for Howl is to be a metastore for Pig, MapReduce, and Hive, not just 2 out of 3. So we have a strong motivation to maintain that compatibility.

In terms of governance, given that we have significant contributions coming from members of the Pig team, the Hive team, and the core Hadoop team, it seemed that giving Howl its own space in the Incubator made more sense than adding it as a subproject of any one of those teams.

Alan.
-
RE: [VOTE] Sponsoring Howl as an Apache Incubator project - Santhosh Srinivasan, 2011-02-03, 06:11
+1 for Howl as an incubator project.
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator project - Edward Capriolo, 2011-02-03, 16:10
Alan,

I see your points. I agree with you and I am +1.

(Incubator vs. subproject is not important to me.)

You mentioned that Hive is cautious about checking changes into the metastore. I would not say we (Hive) are cautious. Hive is getting pulled by many people in many directions (this is a good thing). But the people who can technically review patches might be burdened at times by the number of them.

Ideally, I would think Hive committers are going to be active (and probably would have commit) on Howl. Or is it going to be the burden of Howl to track Pig and Hive until Hive drops /metastore and begins using Howl? I am just curious about what you think the timeline looks like, i.e. how long Howl will be in the Incubator (rough guess, of course).

Thank you,
Edward
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAlan Gates 2011-02-03, 16:36
> > Alan, > > I see your points. I agree with you and I am +1. > > (incubator/subproject is not important to me) > > You mentioned that hive is cautious about checking changes into the > meta-store. I would not say we (hive) are cautious. Hive is getting > pulled in many people in many directions (this is a good thing). But > the number of people that can technically review patches might be > burdened at times by the number of them. > > Ideally, I would think hive committers are going to be active (and > probably would have commit) on howl or is it going to be the burden of > howl track pig and hive until hive drops /metastore and begins using > howl? I am just curious about what you think the time line looks like > (IE how long howl will be in the incubator for) (rought guess of > course) I hope that Hive committers do become active in Howl, and we will be starting with Paul as a committer and John as a mentor. At least so far the Howl developers have taken up the burden of tracking the changes in Hive, since, as you mention, Hive committers are busy and Howl developers have had the motivation to get it done. As far as how long it will take, prognostication has never been my strength. But I would think it would take at least a year for Howl to mature to the point that Hive would be willing to trust it as its metastore or its development would slow to the point that it could pull the metastore code from Hive. Alan. > > Thank you, > Edward
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAshutosh Chauhan 2011-02-03, 16:43
+1
On Wed, Feb 2, 2011 at 13:18, Alan Gates <[EMAIL PROTECTED]> wrote: > Howl is a table management system built to provide metadata and storage > management across data processing tools in Hadoop (Pig, Hive, MapReduce, > ...). You can learn more details at http://wiki.apache.org/pig/Howl. For > the last six months the code has been hosted at github. The Howl team would > like to move the project into the Apache Incubator. You can see the > proposal for the project at http://wiki.apache.org/incubator/HowlProposal. > > In order to be accepted as an Incubator project Howl needs a Sponsoring > project. I propose that we, the Pig project, sponsor Howl. By sponsoring > Howl we are saying that we believe it is a good fit for the ASF and that we > will assist the Howl project to succeed. You can read full details of > sponsoring a project at > http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor. > > Our bylaws don't explicitly cover such a vote, but I think lazy majority > should be reasonable. All votes are welcome, PMC member votes will be > binding. > > Clearly I'm +1. > > Alan. >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJay Booth 2011-02-03, 16:52
Food for thought, what if the metastore were moved to Howl more
aggressively? It seems like the end state everyone's aiming for is that Hive and Pig share Howl as a metastore layer, which makes all kinds of sense.. would it increase the chances of long term success if you guys just went for it and introduced the Hive->Howl dependency as soon as possible? It would probably create some short term disruption but it could be more healthy for Howl assuming that things were worked out, design choices could be validated faster, you have that end-to-end "it works" thing going, etc. On Thu, Feb 3, 2011 at 11:43 AM, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote: > +1 > > On Wed, Feb 2, 2011 at 13:18, Alan Gates <[EMAIL PROTECTED]> wrote: >> Howl is a table management system built to provide metadata and storage >> management across data processing tools in Hadoop (Pig, Hive, MapReduce, >> ...). You can learn more details at http://wiki.apache.org/pig/Howl. For >> the last six months the code has been hosted at github. The Howl team would >> like to move the project into the Apache Incubator. You can see the >> proposal for the project at http://wiki.apache.org/incubator/HowlProposal. >> >> In order to be accepted as an Incubator project Howl needs a Sponsoring >> project. I propose that we, the Pig project, sponsor Howl. By sponsoring >> Howl we are saying that we believe it is a good fit for the ASF and that we >> will assist the Howl project to succeed. You can read full details of >> sponsoring a project at >> http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor. >> >> Our bylaws don't explicitly cover such a vote, but I think lazy majority >> should be reasonable. All votes are welcome, PMC member votes will be >> binding. >> >> Clearly I'm +1. >> >> Alan. >> >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJohn Sichi 2011-02-03, 19:38
Besides the fact that the refactoring required is significant, I don't think this is possible to do quickly since:
1) Hive (unlike Pig) requires a metastore 2) Hive releases can't depend on an incubator project It's worth pointing out that Howl is already using Hive's CLI+DDL (not just metastore). That's a huge amount of code. In biological terms, Howl has the same DNA as Hive (plus some new Howl-specific genes on a separate plugin chromosome), but only a subset of the Hive genes are expressed when running Howl; the rest are just junk DNA from Howl's perspective. It's not clear yet that refactoring is worth the effort even in the end state. We can achieve the desired compatibility by keeping the current approach but removing the Hive code copy from Howl, instead creating a dependency from Howl to Hive. In this case, graduating to become a Hive subproject might be the correct exit from the incubator. If we do go ahead with pulling the metastore out of Hive, it might make most sense for Howl to become its own TLP rather than a subproject. In the incubator proposal, we have mentioned these issues, but we've attempted to avoid prejudicing any decision. Instead, we'd like to assess the pros and cons (including effort required and impact expected) for both approaches as part of the incubation process. I don't have any voting rights on Pig but obviously I'm +1 on the proposal for incubation. JVS On Feb 3, 2011, at 8:52 AM, Jay Booth wrote: > Food for thought, what if the metastore were moved to Howl more > aggressively? It seems like the end state everyone's aiming for is > that Hive and Pig share Howl as a metastore layer, which makes all > kinds of sense.. would it increase the chances of long term success > if you guys just went for it and introduced the Hive->Howl dependency > as soon as possible? It would probably create some short term > disruption but it could be more healthy for Howl assuming that things > were worked out, design choices could be validated faster, you have > that end-to-end "it works" thing going, etc. 
> > On Thu, Feb 3, 2011 at 11:43 AM, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote: >> +1 >> >> On Wed, Feb 2, 2011 at 13:18, Alan Gates <[EMAIL PROTECTED]> wrote: >>> Howl is a table management system built to provide metadata and storage >>> management across data processing tools in Hadoop (Pig, Hive, MapReduce, >>> ...). You can learn more details at http://wiki.apache.org/pig/Howl. For >>> the last six months the code has been hosted at github. The Howl team would >>> like to move the project into the Apache Incubator. You can see the >>> proposal for the project at http://wiki.apache.org/incubator/HowlProposal. >>> >>> In order to be accepted as an Incubator project Howl needs a Sponsoring >>> project. I propose that we, the Pig project, sponsor Howl. By sponsoring >>> Howl we are saying that we believe it is a good fit for the ASF and that we >>> will assist the Howl project to succeed. You can read full details of >>> sponsoring a project at >>> http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor. >>> >>> Our bylaws don't explicitly cover such a vote, but I think lazy majority >>> should be reasonable. All votes are welcome, PMC member votes will be >>> binding. >>> >>> Clearly I'm +1. >>> >>> Alan. >>> >>
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJeff Hammerbacher 2011-02-03, 21:15
Hey,
> If we do go ahead with pulling the metastore out of Hive, it might make > most sense for Howl to become its own TLP rather than a subproject. > Yes, I did not read the proposal closely enough. I think an end state as a TLP makes more sense for Howl than as a Pig subproject. I'd really love to see Howl replace the metastore in Hive and it would be more natural to do so as a TLP than as a Pig subproject--especially since the current Howl repository is literally a fork of Hive. > In the incubator proposal, we have mentioned these issues, but we've > attempted to avoid prejudicing any decision. Instead, we'd like to assess > the pros and cons (including effort required and impact expected) for both > approaches as part of the incubation process. > Glad the issues are being considered. Later, Jeff
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectMilind Bhandarkar 2011-02-03, 21:24
Alan,
1. Contribs being removed from hadoop is due to a. inactivity and b. test failures. Since Howl will be actively worked on, and will be well-tested as a production deployment, I am sure it will not be objected to. 2. That was when Yahoo! was producing it's own distribution, thus not having dependencies on apache releases. With the recent announcements, that would change, no ? - milind On Feb 2, 2011, at 8:58 PM, Alan Gates wrote: > I see a couple blockers that prevent this from being a contrib project of Hadoop: > > 1) The Hadoop project is actively trying to remove the contrib projects it has, see http://tinyurl.com/6yl25jz. I doubt it's interested in any new ones. > > 2) The Hadoop project is producing a release every 2 or 3 years currently. As a new project Howl will be wanting to release every 2 or 3 months for a while. Being tied to something as slow moving as Hadoop for releases would make it hard for Howl get releases out the door. > > Alan. > > On Feb 2, 2011, at 4:57 PM, Milind Bhandarkar wrote: > >> I feel that Howl should start as a contrib to Hadoop, and move to be a subproject of Hadoop once there is sufficient adoption, rather than going the incubator way. My reasons are as follows: >> >> 1. Howl is aimed at providing abstractions for facilitating interoperability between various systems built *on top of Hadoop*, and should not limit itself to Pig, Hive, and native MapReduce. So, any system that is hadoop compatible should be able to use Howl as a metadata store. >> >> 2. Having Howl as contrib of Hadoop will ensure that the input and output formats, compression codecs, underlying storage APIs etc remain in sync from release to release, and users do not have to worry about whether version x of Howl is compatible with version y of Hadoop or not. >> >> 3. Pig, Hive, Cascading, .. are all already dependent on Hadoop. Including Howl as Hadoop contrib means they do not add any more dependencies. >> >> 4. 
The roadmap of Howl includes authentication and authorization support. It is a standard industry practice that metadata security mechanisms match those for data security. Thus, a significant code can be shared with hadoop's authorization and authentication. >> >> 5. Hadoop-compatible file systems provide an abstraction over underlying storage systems. Howl currently provides a table abstraction over the file system. In future, when Hadoop provides blockpool abstraction (as part of federation), Howl will be able to take advantage of that and optimize. >> >> 6. Howl roadmap currently does not contain multi-tenancy features such as quotas. Since there is a strong correlation between number of tables, number of partitions in Howl and number of directories and files in HDFS, it could be streamlined if Howl is part of Hadoop. >> >> Thoughts ? >> >> - milind >> >> >> On Feb 2, 2011, at 1:18 PM, Alan Gates wrote: >> >>> Howl is a table management system built to provide metadata and storage management across data processing tools in Hadoop (Pig, Hive, MapReduce, ...). You can learn more details at http://wiki.apache.org/pig/Howl. For the last six months the code has been hosted at github. The Howl team would like to move the project into the Apache Incubator. You can see the proposal for the project at http://wiki.apache.org/incubator/HowlProposal. >>> >>> In order to be accepted as an Incubator project Howl needs a Sponsoring project. I propose that we, the Pig project, sponsor Howl. By sponsoring Howl we are saying that we believe it is a good fit for the ASF and that we will assist the Howl project to succeed. You can read full details of sponsoring a project at http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor. >>> >>> Our bylaws don't explicitly cover such a vote, but I think lazy majority should be reasonable. All votes are welcome, PMC member votes will be binding. >>> >>> Clearly I'm +1. >>> >>> Alan. 
>> >> --- >> Milind Bhandarkar >> [EMAIL PROTECTED] Milind Bhandarkar [EMAIL PROTECTED]
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectyongqiang he 2011-02-03, 21:41
I am interested in some numbers around the lines of code changes (or
files of changes) which are in Howl but not in Hive? Can anyone give some information here? Thanks Yongqiang On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote: > Hey, > >> >> If we do go ahead with pulling the metastore out of Hive, it might make >> most sense for Howl to become its own TLP rather than a subproject. > > Yes, I did not read the proposal closely enough. I think an end state as a > TLP makes more sense for Howl than as a Pig subproject. I'd really love to > see Howl replace the metastore in Hive and it would be more natural to do so > as a TLP than as a Pig subproject--especially since the current Howl > repository is literally a fork of Hive. > >> >> In the incubator proposal, we have mentioned these issues, but we've >> attempted to avoid prejudicing any decision. Instead, we'd like to assess >> the pros and cons (including effort required and impact expected) for both >> approaches as part of the incubation process. > > Glad the issues are being considered. > Later, > Jeff
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAshutosh Chauhan 2011-02-03, 21:49
There are none as of today. In the past, whenever we had to have
changes, we do it in a separate branch in Howl and once those get committed to hive repo, we pull it over in our trunk and drop the branch. Ashutosh On Thu, Feb 3, 2011 at 13:41, yongqiang he <[EMAIL PROTECTED]> wrote: > I am interested in some numbers around the lines of code changes (or > files of changes) which are in Howl but not in Hive? > Can anyone give some information here? > > Thanks > Yongqiang > On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote: >> Hey, >> >>> >>> If we do go ahead with pulling the metastore out of Hive, it might make >>> most sense for Howl to become its own TLP rather than a subproject. >> >> Yes, I did not read the proposal closely enough. I think an end state as a >> TLP makes more sense for Howl than as a Pig subproject. I'd really love to >> see Howl replace the metastore in Hive and it would be more natural to do so >> as a TLP than as a Pig subproject--especially since the current Howl >> repository is literally a fork of Hive. >> >>> >>> In the incubator proposal, we have mentioned these issues, but we've >>> attempted to avoid prejudicing any decision. Instead, we'd like to assess >>> the pros and cons (including effort required and impact expected) for both >>> approaches as part of the incubation process. >> >> Glad the issues are being considered. >> Later, >> Jeff >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJohn Sichi 2011-02-03, 22:49
But Howl does layer on some additional code, right?
https://github.com/yahoo/howl/tree/howl/howl JVS On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote: > There are none as of today. In the past, whenever we had to have > changes, we do it in a separate branch in Howl and once those get > committed to hive repo, we pull it over in our trunk and drop the > branch. > > Ashutosh > On Thu, Feb 3, 2011 at 13:41, yongqiang he <[EMAIL PROTECTED]> wrote: >> I am interested in some numbers around the lines of code changes (or >> files of changes) which are in Howl but not in Hive? >> Can anyone give some information here? >> >> Thanks >> Yongqiang >> On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote: >>> Hey, >>> >>>> >>>> If we do go ahead with pulling the metastore out of Hive, it might make >>>> most sense for Howl to become its own TLP rather than a subproject. >>> >>> Yes, I did not read the proposal closely enough. I think an end state as a >>> TLP makes more sense for Howl than as a Pig subproject. I'd really love to >>> see Howl replace the metastore in Hive and it would be more natural to do so >>> as a TLP than as a Pig subproject--especially since the current Howl >>> repository is literally a fork of Hive. >>> >>>> >>>> In the incubator proposal, we have mentioned these issues, but we've >>>> attempted to avoid prejudicing any decision. Instead, we'd like to assess >>>> the pros and cons (including effort required and impact expected) for both >>>> approaches as part of the incubation process. >>> >>> Glad the issues are being considered. >>> Later, >>> Jeff >>
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAshutosh Chauhan 2011-02-03, 22:58
What I am referring to is metastore/ dir of hive, part of hive code
which howl cares about most. Other howl code is for additional functionalities that Howl provides (none of which lives in metastore/ dir) they are in howl/ dir. There are few build file changes, but they are trivial. Ashutosh On Thu, Feb 3, 2011 at 14:49, John Sichi <[EMAIL PROTECTED]> wrote: > But Howl does layer on some additional code, right? > > https://github.com/yahoo/howl/tree/howl/howl > > JVS > > On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote: > >> There are none as of today. In the past, whenever we had to have >> changes, we do it in a separate branch in Howl and once those get >> committed to hive repo, we pull it over in our trunk and drop the >> branch. >> >> Ashutosh >> On Thu, Feb 3, 2011 at 13:41, yongqiang he <[EMAIL PROTECTED]> wrote: >>> I am interested in some numbers around the lines of code changes (or >>> files of changes) which are in Howl but not in Hive? >>> Can anyone give some information here? >>> >>> Thanks >>> Yongqiang >>> On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote: >>>> Hey, >>>> >>>>> >>>>> If we do go ahead with pulling the metastore out of Hive, it might make >>>>> most sense for Howl to become its own TLP rather than a subproject. >>>> >>>> Yes, I did not read the proposal closely enough. I think an end state as a >>>> TLP makes more sense for Howl than as a Pig subproject. I'd really love to >>>> see Howl replace the metastore in Hive and it would be more natural to do so >>>> as a TLP than as a Pig subproject--especially since the current Howl >>>> repository is literally a fork of Hive. >>>> >>>>> >>>>> In the incubator proposal, we have mentioned these issues, but we've >>>>> attempted to avoid prejudicing any decision. Instead, we'd like to assess >>>>> the pros and cons (including effort required and impact expected) for both >>>>> approaches as part of the incubation process. >>>> >>>> Glad the issues are being considered. >>>> Later, >>>> Jeff >>> > >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAlan Gates 2011-02-03, 23:11
Yes, it adds Input and Output formats for MapReduce and load and store
functions for Pig. In the future we expect it will continue to add additional layers. Alan. On Feb 3, 2011, at 2:49 PM, John Sichi wrote: > But Howl does layer on some additional code, right? > > https://github.com/yahoo/howl/tree/howl/howl > > JVS > > On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote: > >> There are none as of today. In the past, whenever we had to have >> changes, we do it in a separate branch in Howl and once those get >> committed to hive repo, we pull it over in our trunk and drop the >> branch. >> >> Ashutosh >> On Thu, Feb 3, 2011 at 13:41, yongqiang he >> <[EMAIL PROTECTED]> wrote: >>> I am interested in some numbers around the lines of code changes (or >>> files of changes) which are in Howl but not in Hive? >>> Can anyone give some information here? >>> >>> Thanks >>> Yongqiang >>> On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED] >>> > wrote: >>>> Hey, >>>> >>>>> >>>>> If we do go ahead with pulling the metastore out of Hive, it >>>>> might make >>>>> most sense for Howl to become its own TLP rather than a >>>>> subproject. >>>> >>>> Yes, I did not read the proposal closely enough. I think an end >>>> state as a >>>> TLP makes more sense for Howl than as a Pig subproject. I'd >>>> really love to >>>> see Howl replace the metastore in Hive and it would be more >>>> natural to do so >>>> as a TLP than as a Pig subproject--especially since the current >>>> Howl >>>> repository is literally a fork of Hive. >>>> >>>>> >>>>> In the incubator proposal, we have mentioned these issues, but >>>>> we've >>>>> attempted to avoid prejudicing any decision. Instead, we'd like >>>>> to assess >>>>> the pros and cons (including effort required and impact >>>>> expected) for both >>>>> approaches as part of the incubation process. >>>> >>>> Glad the issues are being considered. >>>> Later, >>>> Jeff >>> >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAlex Boisvert 2011-02-03, 23:29
On Thu, Feb 3, 2011 at 11:38 AM, John Sichi <[EMAIL PROTECTED]> wrote:
> Besides the fact that the refactoring required is significant, I don't > think this is possible to do quickly since: > > 1) Hive (unlike Pig) requires a metastore > > 2) Hive releases can't depend on an incubator project > I'm not sure what you mean by "can't depend on an incubator project" here. AFAIK, there is no policy at Apache that projects should not depend on incubator projects. Can you clarify what you mean and why you think such a restriction exists? alex
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJohn Sichi 2011-02-04, 00:07
I was going off of what I read in HADOOP-3676 (which lacks a reference as well). But I guess if a release can be made from the incubator, then it's not a blocker.
JVS On Feb 3, 2011, at 3:29 PM, Alex Boisvert wrote: > On Thu, Feb 3, 2011 at 11:38 AM, John Sichi <[EMAIL PROTECTED]> wrote: > Besides the fact that the refactoring required is significant, I don't think this is possible to do quickly since: > > 1) Hive (unlike Pig) requires a metastore > > 2) Hive releases can't depend on an incubator project > > I'm not sure what you mean by "can't depend on an incubator project" here. AFAIK, there is no policy at Apache that projects should not depend on incubator projects. Can you clarify what you mean and why you think such a restriction exists? > > alex >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJohn Sichi 2011-02-04, 00:30
I forgot about the serde dependencies...can you add those to the Initial Source note in [[HowlProposal]] just for completeness?
JVS On Feb 3, 2011, at 3:11 PM, Alan Gates wrote: > Yes, it adds Input and Output formats for MapReduce and load and store functions for Pig. In the future it we expect it will continue to add more additional layers. > > Alan. > > On Feb 3, 2011, at 2:49 PM, John Sichi wrote: > >> But Howl does layer on some additional code, right? >> >> https://github.com/yahoo/howl/tree/howl/howl >> >> JVS >> >> On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote: >> >>> There are none as of today. In the past, whenever we had to have >>> changes, we do it in a separate branch in Howl and once those get >>> committed to hive repo, we pull it over in our trunk and drop the >>> branch. >>> >>> Ashutosh >>> On Thu, Feb 3, 2011 at 13:41, yongqiang he <[EMAIL PROTECTED]> wrote: >>>> I am interested in some numbers around the lines of code changes (or >>>> files of changes) which are in Howl but not in Hive? >>>> Can anyone give some information here? >>>> >>>> Thanks >>>> Yongqiang >>>> On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote: >>>>> Hey, >>>>> >>>>>> >>>>>> If we do go ahead with pulling the metastore out of Hive, it might make >>>>>> most sense for Howl to become its own TLP rather than a subproject. >>>>> >>>>> Yes, I did not read the proposal closely enough. I think an end state as a >>>>> TLP makes more sense for Howl than as a Pig subproject. I'd really love to >>>>> see Howl replace the metastore in Hive and it would be more natural to do so >>>>> as a TLP than as a Pig subproject--especially since the current Howl >>>>> repository is literally a fork of Hive. >>>>> >>>>>> >>>>>> In the incubator proposal, we have mentioned these issues, but we've >>>>>> attempted to avoid prejudicing any decision. Instead, we'd like to assess >>>>>> the pros and cons (including effort required and impact expected) for both >>>>>> approaches as part of the incubation process. >>>>> >>>>> Glad the issues are being considered. >>>>> Later, >>>>> Jeff >>>> >> >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAlex Boisvert 2011-02-04, 00:56
Hi John,
Just to clarify where I was going with my line of questioning: there's no Apache policy that prevents dependencies on an incubator project, whether on releases, snapshots or even home-made hacked-together packaging of an incubator project. It's been done before, and as long as the incubator code's IP has been cleared and the packaging isn't represented as an official release when it isn't one, there's nothing wrong with doing that. Now, whether the project chooses to use and release with an incubator dependency is a matter of judgment (and ultimately a vote by committers if there is no consensus). I just wanted to make sure no incorrect assumptions were made. alex On Thu, Feb 3, 2011 at 4:07 PM, John Sichi <[EMAIL PROTECTED]> wrote: > I was going off of what I read in HADOOP-3676 (which lacks a reference as > well). But I guess if a release can be made from the incubator, then it's > not a blocker. > > JVS > > On Feb 3, 2011, at 3:29 PM, Alex Boisvert wrote: > > > On Thu, Feb 3, 2011 at 11:38 AM, John Sichi <[EMAIL PROTECTED]> wrote: > > Besides the fact that the refactoring required is significant, I don't > think this is possible to do quickly since: > > > > 1) Hive (unlike Pig) requires a metastore > > > > 2) Hive releases can't depend on an incubator project > > > > I'm not sure what you mean by "can't depend on an incubator project" > here. AFAIK, there is no policy at Apache that projects should not depend > on incubator projects. Can you clarify what you mean and why you think such > a restriction exists? > > > > alex > > > >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAlan Gates 2011-02-04, 01:09
Are you referring to the serde jar or any particular serde's we are
making use of? Alan. On Feb 3, 2011, at 4:30 PM, John Sichi wrote: > I forgot about the serde dependencies...can you add those to the > Initial Source note in [[HowlProposal]] just for completeness? > > JVS > > On Feb 3, 2011, at 3:11 PM, Alan Gates wrote: > >> Yes, it adds Input and Output formats for MapReduce and load and >> store functions for Pig. In the future it we expect it will >> continue to add more additional layers. >> >> Alan. >> >> On Feb 3, 2011, at 2:49 PM, John Sichi wrote: >> >>> But Howl does layer on some additional code, right? >>> >>> https://github.com/yahoo/howl/tree/howl/howl >>> >>> JVS >>> >>> On Feb 3, 2011, at 1:49 PM, Ashutosh Chauhan wrote: >>> >>>> There are none as of today. In the past, whenever we had to have >>>> changes, we do it in a separate branch in Howl and once those get >>>> committed to hive repo, we pull it over in our trunk and drop the >>>> branch. >>>> >>>> Ashutosh >>>> On Thu, Feb 3, 2011 at 13:41, yongqiang he <[EMAIL PROTECTED] >>>> > wrote: >>>>> I am interested in some numbers around the lines of code changes >>>>> (or >>>>> files of changes) which are in Howl but not in Hive? >>>>> Can anyone give some information here? >>>>> >>>>> Thanks >>>>> Yongqiang >>>>> On Thu, Feb 3, 2011 at 1:15 PM, Jeff Hammerbacher <[EMAIL PROTECTED] >>>>> > wrote: >>>>>> Hey, >>>>>> >>>>>>> >>>>>>> If we do go ahead with pulling the metastore out of Hive, it >>>>>>> might make >>>>>>> most sense for Howl to become its own TLP rather than a >>>>>>> subproject. >>>>>> >>>>>> Yes, I did not read the proposal closely enough. I think an end >>>>>> state as a >>>>>> TLP makes more sense for Howl than as a Pig subproject. I'd >>>>>> really love to >>>>>> see Howl replace the metastore in Hive and it would be more >>>>>> natural to do so >>>>>> as a TLP than as a Pig subproject--especially since the current >>>>>> Howl >>>>>> repository is literally a fork of Hive. 
>>>>>> >>>>>>> >>>>>>> In the incubator proposal, we have mentioned these issues, but >>>>>>> we've >>>>>>> attempted to avoid prejudicing any decision. Instead, we'd >>>>>>> like to assess >>>>>>> the pros and cons (including effort required and impact >>>>>>> expected) for both >>>>>>> approaches as part of the incubation process. >>>>>> >>>>>> Glad the issues are being considered. >>>>>> Later, >>>>>> Jeff >>>>> >>> >> >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJohn Sichi 2011-02-04, 02:00
On Feb 3, 2011, at 5:09 PM, Alan Gates wrote:
> Are you referring to the serde jar or any particular serde's we are making use of?

Both (see below).

JVS

----

[jsichi@dev1066 ~/open/howl/howl/howl/src/java/org/apache/hadoop/hive/howl] ls
cli/ common/ data/ mapreduce/ pig/ rcfile/
[jsichi@dev1066 ~/open/howl/howl/howl/src/java/org/apache/hadoop/hive/howl] grep serde */*
common/HowlUtil.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
common/HowlUtil.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde.Constants;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.SerDe;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.SerDeException;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarStruct;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructField;
rcfile/RCFileInputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
rcfile/RCFileInputDriver.java: private SerDe serde;
rcfile/RCFileInputDriver.java: struct = (ColumnarStruct)serde.deserialize(bytesRefArray);
rcfile/RCFileInputDriver.java: serde = new ColumnarSerDe();
rcfile/RCFileInputDriver.java: serde.initialize(context.getConfiguration(), howlProperties);
rcfile/RCFileInputDriver.java: oi = (StructObjectInspector) serde.getObjectInspector();
rcfile/RCFileMapReduceInputFormat.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileMapReduceOutputFormat.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileMapReduceRecordReader.java:import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde.Constants;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.SerDe;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.SerDeException;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
rcfile/RCFileOutputDriver.java:import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
rcfile/RCFileOutputDriver.java: /** The serde for serializing the HowlRecord to bytes writable */
rcfile/RCFileOutputDriver.java: private SerDe serde;
rcfile/RCFileOutputDriver.java: return serde.serialize(value.getAll(), objectInspector);
rcfile/RCFileOutputDriver.java: serde = new ColumnarSerDe();
rcfile/RCFileOutputDriver.java: serde.initialize(context.getConfiguration(), howlProperties);

Howl, howl, howl, howl! O! you are men of stones:
Had I your tongues and eyes, I'd use them so
That heaven's vaults should crack
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectJohn Sichi 2011-02-04, 02:07
Got it, thanks for the correction.
JVS

On Feb 3, 2011, at 4:56 PM, Alex Boisvert wrote:

> Hi John,
>
> Just to clarify where I was going with my line of questioning. There's no Apache policy that prevents dependencies on an incubator project, whether it's releases, snapshots or even home-made hacked-together packaging of an incubator project. It's been done before and as long as the incubator code's IP has been cleared and the packaging isn't represented as an official release if it isn't so, there's no wrong in doing that.
>
> Now, whether the project chooses to use and release with an incubator dependency is a matter of judgment (and ultimately a vote by committers if there is no consensus). I just wanted to make sure there were no incorrect assumptions made.
>
> alex
>
>
> On Thu, Feb 3, 2011 at 4:07 PM, John Sichi <[EMAIL PROTECTED]> wrote:
> I was going off of what I read in HADOOP-3676 (which lacks a reference as well). But I guess if a release can be made from the incubator, then it's not a blocker.
>
> JVS
>
> On Feb 3, 2011, at 3:29 PM, Alex Boisvert wrote:
>
> > On Thu, Feb 3, 2011 at 11:38 AM, John Sichi <[EMAIL PROTECTED]> wrote:
> > Besides the fact that the refactoring required is significant, I don't think this is possible to do quickly since:
> >
> > 1) Hive (unlike Pig) requires a metastore
> >
> > 2) Hive releases can't depend on an incubator project
> >
> > I'm not sure what you mean by "can't depend on an incubator project" here. AFAIK, there is no policy at Apache that projects should not depend on incubator projects. Can you clarify what you mean and why you think such a restriction exists?
> >
> > alex
> >
> >
-
Re: [VOTE] Sponsoring Howl as an Apache Incubator projectAlan Gates 2011-02-08, 18:10
With 8 +1 votes and no -1s, the vote passes.
Alan.

On Feb 2, 2011, at 1:18 PM, Alan Gates wrote:

> Howl is a table management system built to provide metadata and
> storage management across data processing tools in Hadoop (Pig, Hive,
> MapReduce, ...). You can learn more details at http://wiki.apache.org/pig/Howl
> . For the last six months the code has been hosted at github. The
> Howl team would like to move the project into the Apache Incubator.
> You can see the proposal for the project at http://wiki.apache.org/incubator/HowlProposal
> .
>
> In order to be accepted as an Incubator project Howl needs a
> Sponsoring project. I propose that we, the Pig project, sponsor
> Howl. By sponsoring Howl we are saying that we believe it is a good
> fit for the ASF and that we will assist the Howl project to succeed.
> You can read full details of sponsoring a project at http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Sponsor
> .
>
> Our bylaws don't explicitly cover such a vote, but I think lazy
> majority should be reasonable. All votes are welcome, PMC member
> votes will be binding.
>
> Clearly I'm +1.
>
> Alan.
|