All sounds reasonable, thanks for explaining the thought process.
On Aug 29, 2010, at 3:11 PM, Dmitriy Ryaboy wrote:
> Hi folks,
> I'll try to address both Corbin's and Milind's questions. This is just my
> opinion, I'm open to criticism/suggestions/corrections.
> There are several barriers that are being removed.
> First, piggybank will no longer be bound to the pig release schedule. At the
> moment, I am not sure there will be "releases" of piggybank, as such -- we
> might just tag snapshots with their own git branches and move on. This
> allows the code to develop at a much faster pace, while possibly sacrificing
> some of the stability and permanence of Apache-style releases. I feel that
> this is ok, as piggybank was always subject to less stringent testing, and
> the attitude towards it has long been "it might work, and you might have to
> tweak it if it doesn't".
> Second, moving to github makes it easy for people to cook their own versions
> of piggybank if they want to -- they just have to fork the main master, and
> apply changes as needed. The committers can pull in all, or some, of the
> changes, if they are desirable. This puts such mutations in the public view,
> as opposed to what's happening now, where they either don't happen, or
> happen on people's unseen svn exports.
> Third, this allows contributions to piggybank for older versions of pig. At
> the moment, for example, there isn't really a way to contribute a Pig 0.6
> loader -- the current svn trunk is on the new API, so such contributions
> won't compile. Something could be contributed for a 0.6 branch, but that
> won't see the light of day unless the Pig team decides to do a 0.6.1 release,
> which is highly unlikely and kind of a maintenance nightmare. This is why,
> for example, my HBase loader changes wound up in Elephant-Bird instead of
> Pig proper -- I didn't have a good way of getting them out there otherwise.
> On github, we will be able to just keep a 0.6 branch that folks using that
> version can keep moving.
> Bottom line is that we are sacrificing the benefits of a stately, strict
> Apache workflow in order to gain agility and decrease barriers to
> contribution. I personally feel that this is ok because piggybank is not so
> much a software project as a collection of individual, distinct libraries.
> It's kind of the CPAN of Pig, and no one versions all modules of CPAN in one
> go -- the whole thing would get bogged down if that were to happen. Granted,
> cpan lets you pull down specific versions of individual modules, and this
> doesn't... but let's take it one step at a time.
> I think the bit about Hive interoperation might be a bit overstated. The
> observation was just that Hive has the same problem with user-defined
> functions, and some common code might be reused since the two projects are
> often used to achieve similar goals. So if the Hive people wanted to
> collaborate on the common bits, and put their udfs into /hive while we put
> ours into /pig, we agreed that would be a good thing. There is no intent, at
> the moment, to build some generic udf interface that would allow one to
> write udfs for both hive and pig at once. Though that would be cool.
> On Sat, Aug 28, 2010 at 11:39 AM, Milind A Bhandarkar <[EMAIL PROTECTED]> wrote:
>> +1 on the direction.
>> A few questions:
>> 1. With Pig marching towards becoming a TLP at Apache, can Piggybank become
>> a full-fledged subproject (with its own releases and all)?
>> 2. Or since the ultimate goal is to have a common UDF repository for both
>> Pig and Hive, it would make sense to make it into an incubator project, with
>> a name that does not indicate pig dependency?
>> 3. I see parallels between Howl and the proposed Piggybank, since they aspire
>> to become common components in both Hive and Pig distributions. What are
>> the long-term plans for Howl as far as hosting is concerned?
>> - Milind