Re: Pig Meetup Notes
Russell Jurney 2012-06-16, 02:10
Is that a PMC position? I also do AV and can bounce #credentials :D
On Jun 15, 2012, at 3:35 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> 2012/6/15 Alan Gates <[EMAIL PROTECTED]>
>> Thanks Russell. I move we make you the official Apache Pig secretary. :)
>> On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
>>> Tuesday, Pig Meetup
>>> Alan Gates - upcoming improvements in operators/backend physical plan.
>>> Reworking the UDF interface while keeping backward compatibility.
>>> Hadoop 2 is coming, but adoption will be slow.
>>> Bill Graham, Julien & Twitter - optimization-oriented. Cluster is at
>>> capacity. Detect skew, cost-based optimizers, dynamic tuning. Gathering
>>> performance metrics, which will be in HCatalog. Look at previous executions
>>> of the same job to optimize on the fly.
>>> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks,
>>> Zocalo Systems?, Trend Micro
>>> Bill presented Ambrose. Motivation: Pig scripts with 40 MR jobs; adds a DAG
>>> view. Shows the progress of your script as a percentage and as a stepwise
>>> view. Helps with debugging and optimization. Major progress.
>>> Pig users talk - using Pig in local mode on a sample, then pushing to the
>>> cluster. Using illustrate to cut developer iterations. No counters in
>>> mode. Embedded Pig in loops for ML. Java embedding.
>>> Java API PigServer to run scripts from apps. Macros are helping remove
>>> blocks of code, but the UDF problem is better solved by JRuby. Mortar Data fixed
>>> Reducing friction around using Pig with tools is important. Slowness of
>>> batch is hard for new users. Sample is hard to prepare that will do
>>> Illustrate was invented for this purpose.
>>> Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
>>> Azkaban is inadequate for the enterprise. People hack things together. It
>>> HCatalog is maturing. REST API. Hive and Pig together. REST interface is
>>> for metadata so far. People want to extend it to grab UDFs, etc.
>>> Russell Jurney http://datasyndrome.com