Re: Pig Meetup Notes
Alan Gates 2012-06-15, 16:23
Thanks Russell. I move we make you the official Apache Pig secretary. :)
On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
> Tuesday, Pig Meetup
> Alan Gates - upcoming improvements to operators and the backend physical plan.
> Reworking the UDF interface while keeping backward compatibility.
> Hadoop 2 is coming, but adoption will be slow.
> Bill Graham, Julien & Twitter - optimization-oriented. Their cluster is at
> capacity. Detecting skew, cost-based optimizers, dynamic tuning. Gathering
> performance metrics, which will live in HCatalog. Look at previous executions
> of the same job to optimize on the fly.
> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
> Zocalo Systems?, Trend Micro
> Bill presented Ambrose. Motivation: Pig scripts that compile to 40 MapReduce
> jobs. Ambrose adds a DAG view and shows the progress of your script both as a
> percentage and as a stepwise view. Helps with debugging and optimization.
> Major progress.
> Pig users talk - running Pig in local mode on a sample, then pushing to the
> cluster. Using illustrate to cut developer iterations. No counters in local
> mode. Embedding Pig in loops for ML. Java embedding.
> Java API: PigServer runs scripts from applications. Macros are helping remove
> ugly blocks of code, but the UDF pain is better addressed by JRuby. Mortar Data fixed Python
> Reducing friction around using Pig with other tools is important. Batch
> slowness is hard on new users. It is hard to prepare a sample that still
> produces results for joins. Illustrate was invented for this purpose.
> Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard to use.
> Azkaban is inadequate for the enterprise. People hack things together. It
> HCatalog is maturing: it has a REST API and brings Hive and Pig together. The
> REST interface only serves metadata so far; people want to extend it to fetch
> UDFs, etc.
> Russell Jurney http://datasyndrome.com