

Re: Pig Meetup Notes
Is that a PMC position? I also do AV and can bounce #credentials :D

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Jun 15, 2012, at 3:35 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> +1
>
> 2012/6/15 Alan Gates <[EMAIL PROTECTED]>
>
>> Thanks Russell.  I move we make you the official Apache Pig secretary. :)
>>
>> Alan.
>>
>> On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
>>
>>> Tuesday, Pig Meetup
>>>
>>> Alan Gates - upcoming improvements in operators/backend physical plan.
>>> Despaghettification.
>>> Reworking UDF interface, keep backward compatibility.
>>> Hadoop 2 coming, will be slow adoption.
>>>
>>> Bill Graham, Julien & Twitter - Optimization oriented. Cluster is at
>>> capacity. Detect skew, cost based optimizers, dynamic tuning. Gathering
>>> performance metrics, will be in HCatalog. Look at previous executions of
>>> same job to optimize on the fly.
>>>
>>> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
>>> zocalo systems?, Trend Micro
>>>
>>> Bill presented Ambrose. Motivation: 40-MR-job Pig scripts, added DAG view.
>>> Shows you progress of your script as percentage and stepwise view. Helps
>>> with debug, optimization. Major progress.
>>>
>>> Pig users talk - using Pig in local mode on sample, then pushing to
>>> cluster. Using illustrate to cut developer iterations. No counters in local
>>> mode. Embedded Pig in loops for ML. Java embedding.
>>> Java API PigServer to run scripts from apps. Macros are helping remove ugly
>>> blocks of code, but UDFs are more solved by JRuby. Mortar Data fixed Python
>>> UDFs.
>>>
>>> Reducing friction around using Pig with tools is important. Slowness of
>>> batch is hard for new users. A sample that will do joins is hard to prepare.
>>> Illustrate was invented for this purpose.
>>>
>>> Scheduling pig jobs is still a problem. Oozie is unpopular and too hard.
>>> Azkaban is inadequate for the enterprise. People hack things together. It
>>> sucks.
>>>
>>> HCatalog is maturing. REST API. Hive and Pig together. REST interface is
>>> for metadata so far. People want to extend it to grab UDFs, etc.
>>>
>>> Russell Jurney http://datasyndrome.com
>>
>>