Pig user mailing list

Re: Pig Meetup Notes
Is that a PMC position? I also do AV and can bounce #credentials :D

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Jun 15, 2012, at 3:35 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> +1
>
> 2012/6/15 Alan Gates <[EMAIL PROTECTED]>
>
>> Thanks Russell.  I move we make you the official Apache Pig secretary. :)
>>
>> Alan.
>>
>> On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
>>
>>> Tuesday, Pig Meetup
>>>
>>> Alan Gates - upcoming improvements in operators/backend physical plan.
>>> Despaghettification.
>>> Reworking the UDF interface while keeping backward compatibility.
>>> Hadoop 2 is coming; adoption will be slow.
>>>
>>> Bill Graham, Julien & Twitter - optimization-oriented. The cluster is at
>>> capacity. Detect skew, cost-based optimizers, dynamic tuning. Gathering
>>> performance metrics, which will live in HCatalog. Look at previous
>>> executions of the same job to optimize on the fly.
>>>
>>> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
>>> Zocalo Systems?, Trend Micro
>>>
>>> Bill presented Ambrose. Motivation: Pig scripts that compile to 40
>>> MapReduce jobs, so Ambrose adds a DAG view. It shows the progress of your
>>> script as a percentage and as a stepwise view. Helps with debugging and
>>> optimization. Major progress.
>>>
>>> Pig users talk: running Pig in local mode on a sample, then pushing to the
>>> cluster. Using ILLUSTRATE to cut developer iterations. No counters in local
>>> mode. Pig embedded in loops for ML. Java embedding: the PigServer Java API
>>> runs scripts from apps. Macros are helping remove ugly blocks of code, but
>>> UDFs are better served by JRuby. Mortar Data fixed Python UDFs.
>>>
>>> Reducing friction around using Pig with tools is important. The slowness of
>>> batch processing is hard on new users. It is hard to prepare a sample that
>>> still produces results when joined. ILLUSTRATE was invented for this
>>> purpose.
>>>
>>> Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
>>> Azkaban is inadequate for the enterprise. People hack things together. It
>>> sucks.
>>>
>>> HCatalog is maturing: REST API, Hive and Pig together. The REST interface
>>> covers only metadata so far. People want to extend it to fetch UDFs, etc.
>>>
>>> Russell Jurney http://datasyndrome.com
>>
>>
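On the meetup point that a sample "that will do joins" is hard to prepare: if each input is sampled independently at rate p, a matching pair of rows survives only with probability p², so a 10% sample of both sides keeps roughly 1% of the join output. Pig's SAMPLE operator picks rows at random, so applying it to both sides of a join runs into exactly this. One common workaround, not something described at the meetup, is to sample by join key: hash the key and keep the row only if the hash falls below the sampling rate, so every input keeps or drops the same keys. The Python sketch below is a minimal illustration under stated assumptions: the (join_key, payload) record shape, the helper names keep_key, sample_by_key and join, and the choice of MD5 are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (join_key, payload).
Row = Tuple[str, str]


def keep_key(key: str, rate: float) -> bool:
    """Return True if `key` belongs to the sample.

    The decision depends only on the key, so every input that contains
    the key makes the same choice. MD5 is used only as a well-mixed,
    deterministic hash, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    return bucket < rate * 2**64


def sample_by_key(rows: Iterable[Row], rate: float) -> List[Row]:
    """Keep every row whose join key is in the sample."""
    return [row for row in rows if keep_key(row[0], rate)]


def join(left: Iterable[Row], right: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Plain in-memory inner join on the key, used to check the samples."""
    index: Dict[str, List[str]] = {}
    for key, payload in right:
        index.setdefault(key, []).append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]


if __name__ == "__main__":
    # Hypothetical data: 1000 users, each with exactly 5 clicks.
    users = [(f"u{i}", f"name{i}") for i in range(1000)]
    clicks = [(f"u{i % 1000}", f"click{i}") for i in range(5000)]

    users_s = sample_by_key(users, 0.1)
    clicks_s = sample_by_key(clicks, 0.1)

    # Every sampled click finds its user, so the joined sample is as
    # large as the click sample itself.
    print(len(users_s), len(clicks_s), len(join(users_s, clicks_s)))
```

Because the keep/drop decision depends only on the key, each side still shrinks to about p of its size, yet the join over the samples returns every matching pair for the kept keys. The trade-off is that this samples keys rather than rows, so a heavily skewed key lands entirely in or entirely out of the sample.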