Pig >> mail # dev >> Persisting Pig Scripts


Prashant Kommireddi 2012-06-06, 22:52
Daniel Dai 2012-06-06, 23:19
Prashant Kommireddi 2012-06-07, 00:16
Dmitriy Ryaboy 2012-06-07, 00:39
Re: Persisting Pig Scripts
I completely agree that's an option. But IMHO being able to do that upfront
would be a nice feature; adding a cron job is just an additional process we
could avoid if possible.

On Wed, Jun 6, 2012 at 5:39 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> You can write a nightly cron that runs the JobHistoryLoader job and
> stores parsed scripts to hdfs...
>
> D
>
> On Wed, Jun 6, 2012 at 5:16 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > I think that would be more of a post-process, vs. having Pig write the
> > script to an HDFS location directly. The latter would avoid having to
> > parse it from job.xml.
> >
> > On Wed, Jun 6, 2012 at 4:19 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> >
> >> One existing solution is the "pig.script" entry inside job.xml; it is the
> >> serialized Pig script. JobHistoryLoader can load job.xml files and grab
> >> those entries. Does that solve your problem?
> >>
> >> Daniel
> >>
> >> On Wed, Jun 6, 2012 at 3:52 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> >>
> >> > Hi All,
> >> >
> >> > What do you guys think about adding a feature to persist the script
> >> > (file, or cache in case of grunt) on HDFS or locally, based on an
> >> > admin setting (pig.properties)? This will help infrastructure/ops teams
> >> > analyze the nature of Pig scripts and make certain decisions based
> >> > on it (optimizing data storage based on access patterns, etc.). This is
> >> > actually something we want to do, but the challenge is that there is no
> >> > central place where we can track user scripts.
> >> >
> >> > It could be a config param "pig.persist.script=/pig/". The script could
> >> > be stored with a configurable name -> "${mapred.job.name}+${user.name}+timestamp"
> >> > either on HDFS or local based on the configuration setting.
> >> >
> >> > Thanks,
> >> > Prashant
> >> >
> >>
>
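
For reference, the two ideas quoted above (pulling the "pig.script" entry out of a job.xml, and naming the stored copy job name + user + timestamp) could be sketched roughly as follows. This is only an illustration in Python, not Pig's or JobHistoryLoader's code. It assumes job.xml uses the usual Hadoop <configuration>/<property>/<name>/<value> layout. Whether the "pig.script" value is Base64-encoded is an assumption left to the caller, and the function names and the timestamp format are made up for the example.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import datetime


def extract_pig_script(job_xml_text, base64_encoded=False):
    """Return the "pig.script" property value from job.xml text, or None if absent."""
    root = ET.fromstring(job_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "pig.script":
            value = prop.findtext("value") or ""
            if base64_encoded:
                # Assumption: the serialized script is Base64-encoded UTF-8 text.
                return base64.b64decode(value).decode("utf-8")
            return value
    return None


def persisted_name(job_name, user_name, when):
    """Build a file name in the proposed job name + user + timestamp form."""
    # The timestamp format is hypothetical; the thread does not specify one.
    stamp = when.strftime("%Y%m%dT%H%M%S")
    return f"{job_name}+{user_name}+{stamp}"
```

A nightly job along the lines Dmitriy describes could call extract_pig_script on each job.xml and write the result under the configured directory using persisted_name. Having Pig write the script upfront would make the parsing step unnecessary.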
Bill Graham 2012-06-07, 00:56
Prashant Kommireddi 2012-06-11, 21:33
Bill Graham 2012-06-11, 22:01
Jonathan Coveney 2012-06-11, 22:17