

Re: Pig Meetup Notes
Thanks Russell.  I move we make you the official Apache Pig secretary. :)

Alan.

On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:

> Tuesday, Pig Meetup
>
> Alan Gates - upcoming improvements in operators/backend physical plan.
> Despaghettification.
> Reworking UDF interface, keep backward compatibility.
> Hadoop 2 coming, will be slow adoption.
>
> Bill Graham, Julien & Twitter - Optimization oriented. Cluster is at
> capacity. Detect skew, cost based optimizers, dynamic tuning. Gathering
> performance metrics, will be in HCatalog. Look at previous executions of
> same job to optimize on the fly.
>
> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
> Zocalo Systems?, Trend Micro
>
> Bill presented Ambrose. Motivation: Pig scripts with 40 MR jobs. Added DAG view.
> Shows you progress of your script as percentage and stepwise view. Helps
> with debug, optimization. Major progress.
>
> Pig users talk - using Pig in local mode on a sample, then pushing to the
> cluster. Using illustrate to cut developer iterations. No counters in local
> mode. Embedded Pig in loops for ML. Java embedding.
> Java API PigServer to run scripts from apps. Macros are helping remove ugly
> blocks of code, but UDFs are better addressed by JRuby. Mortar Data fixed
> Python UDFs.
>
> Reducing friction around using Pig with tools is important. Slowness of
> batch is hard for new users. A sample that will still do joins is hard to
> prepare. Illustrate was invented for this purpose.
>
> Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
> Azkaban is inadequate for the enterprise. People hack things together. It
> sucks.
>
> HCatalog is maturing. REST API. Hive and Pig together. REST interface is
> for metadata so far. People are wanting to extend it to grab UDFs, etc.
>
> Russell Jurney http://datasyndrome.com