Re: Notes of interest from Apache Pig Hackday, Austin edition
Jeremy Hanna 2012-05-12, 21:25
On May 12, 2012, at 3:42 PM, Jonathan Coveney wrote:
> Wow, that writeup was awesome! And your hackday was really well attended!
> Was it all Dachis Group people, or did it include people from other Austin
> tech companies?
Thanks for all the help, Jonathan. About 10 of the 30 people were from the Dachis Group. Other companies represented were vast.com and truecar.com, each with several employees there, plus Bioware, Dell, HP, Freescale Semiconductor, Spredfast.com, PayPal, and the University of Texas.
> I think in the future it'd be nice to figure out how to sync with the
> remote hackers... I think given that the Austin hack day was more about
> usage, the format was ok for this one, but as you guys get ramped up,
> it'd be great to collaborate more directly! And my offer to be flown out to
> Austin for a hack day still stands ;)
Yep - would be great to do this kind of thing again.
> Jeremy is right in that on our end it was more about just crunching through
> some tickets, but we did help some users get ramped up with Pig (getting
> Pig into Eclipse, using git to fork the Pig project on GitHub), and we also
> had some good chats about higher level issues with Pig or the ecosystem.
> I personally came away with some projects that could be of varying interest
> for Pig...
> 1. Pull out the logical planner from Pig in such a way that it can target a
> generic physical plan (or something like Cascading), and so that other
> projects (Hive, Scalding) can target it. This is something that people have
> wanted for a long time, but is pretty nontrivial to design. As the
> ecosystem of tools gets more sophisticated, though, the need is really
> really growing...we're getting to the point where there are some pretty
> sophisticated optimizations that could be put into Hive, Pig, etc and the
> duplication of labor is getting very expensive.
> 2. Daniel and I chatted about a possible way to chain operators to save on
> namespace, i.e. instead of doing
> A = load 'thing' as (x:int, y:int);
> B = group A by x;
> C = foreach B generate group, COUNT(A), SUM(A.y);
> D = FILTER C by $1 > 5 and $2 > 7;
> you could do
> A = load 'thing' as (x:int, y:int);
> => group _ by x;
> => foreach _ generate group, COUNT(_), SUM(_.y);
> => FILTER _ by $1 > 5 and $2 > 7;
> and now A would be the result of the entire chain.
> 3. Everyone agrees that EvalFuncs need a major overhaul, and someone should
> move to submit a proposal because right now it's just sort of languishing.
> 4. ONERROR would be a real coup for Pig... there's a spec, someone just
> needs to do the work!
> And then there are various and sundry things that I would like to
> do...finish up SchemaTuple, move on to SchemaBag, and so on.
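The chaining idea in point 2 leaves open what `_` means at each step: in `foreach _` it stands for the grouped relation, but in `COUNT(_)` and `SUM(_.y)` it has to stand for the bag, which Pig names after the relation that was grouped (hence `COUNT(A)` in the long form). Below is a minimal Python sketch of one possible desugaring into ordinary Pig Latin, assuming the first `_` on a line is the previous step and any later `_` is the step before that. The function `desugar_chain` and the intermediate names `A_0`, `A_1`, ... are hypothetical; this is not an existing Pig feature.

```python
import re

# Matches a standalone '_' token, but not the '_' inside an identifier like A_1.
PLACEHOLDER = re.compile(r"(?<!\w)_(?!\w)")


def desugar_chain(lines):
    """Expand a hypothetical '=>' chain into ordinary Pig Latin statements.

    Assumptions (illustrative only, not a Pig feature):
      * lines[0] is a normal 'ALIAS = ...;' statement;
      * every later line starts with '=>' and continues the chain;
      * the first '_' on a line names the previous step's relation;
      * any later '_' names the step before that, which is the bag name
        Pig uses after a GROUP (e.g. COUNT(A) after 'group A by x').
    The last step is bound to ALIAS; earlier steps get the hypothetical
    names ALIAS_0, ALIAS_1, ...
    """
    head = re.match(r"\s*(\w+)\s*=\s*(.+)$", lines[0])
    if head is None:
        raise ValueError("first line must look like 'ALIAS = ...;'")
    name, first_body = head.groups()
    # Strip the leading '=>' from every continuation line.
    bodies = [first_body] + [re.sub(r"^\s*=>\s*", "", ln) for ln in lines[1:]]

    last = len(bodies) - 1
    aliases = [name if i == last else f"{name}_{i}" for i in range(len(bodies))]

    out = []
    for i, body in enumerate(bodies):
        if i > 0:
            prev = aliases[i - 1]
            before_prev = aliases[i - 2] if i >= 2 else prev
            seen = []

            def fill(_match):
                # First placeholder -> previous relation; later ones -> the one before it.
                seen.append(True)
                return prev if len(seen) == 1 else before_prev

            body = PLACEHOLDER.sub(fill, body)
        out.append(f"{aliases[i]} = {body.strip()}")
    return out
```

Run on the four-line chain from point 2, this yields `A_0 = load ...`, `A_1 = group A_0 by x;`, `A_2 = foreach A_1 generate group, COUNT(A_0), SUM(A_0.y);` and `A = FILTER A_2 by $1 > 5 and $2 > 7;`, the same pipeline as the long form with different alias names.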
> 2012/5/12 Jagat <[EMAIL PROTECTED]>
>> Wow Jeremy,
>> Thanks for the detailed coverage. Seems you guys did lots of good work and
>> had fun too.
>> On 12-May-2012 11:53 PM, "Jeremy Hanna" <[EMAIL PROTECTED]>
>>> Thanks again to Twitter for doing their event and inspiring ours. I just
>>> wanted to report on some things we did in Austin for any interested. We
>>> had a good turnout of about 30 people.
>>> Kevin Safford presented an introduction to Pig, or Pig 101. The slides
>>> are available here:
>>> Timothy Potter, down from Colorado, gave a presentation on intermediate
>>> Pig, or Pig 202. His slides are available here:
>>> Clint Miller gave an introduction to unit testing with Pig with these
>>> slides: http://www.slideshare.net/clintmiller1/unit-testing-pig
>>> After that we had some lunch and linked up remotely for a bit to the
>>> Twitter hackday in the Bay Area. Their group is mostly Pig committers and
>>> contributors, so they worked on Pig tickets. One thing that Twitter