Pig >> mail # dev >> Persisting Pig Scripts


Re: Persisting Pig Scripts
One thing to be aware of when accessing the pig.script option is that AFAIK
there's a limit to how large the script can be, after which the rest would
be truncated.
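
For anyone reading pig.script out of job.xml directly, here is a rough Python sketch of pulling the property and flagging values that may have hit the size limit. Two things in it are assumptions, not confirmed in this thread: that the value is Base64-encoded (the sketch falls back to the raw text if decoding fails), and that the cap is 10240 characters (please check ScriptState in your Pig version). The function names are made up for illustration.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

# ASSUMPTION: cap on the stored script length; verify against your Pig version.
ASSUMED_MAX_SCRIPT_SIZE = 10240


def read_property(job_xml_text, name):
    """Return the value of a Hadoop configuration property, or None if absent."""
    root = ET.fromstring(job_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def decode_script(value):
    """Try Base64 first (an assumption); fall back to the raw value.

    This is a heuristic: a raw script that happens to be valid Base64
    would be decoded by mistake.
    """
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return value


def load_pig_script(job_xml_text, limit=ASSUMED_MAX_SCRIPT_SIZE):
    """Return (script, maybe_truncated); (None, False) if pig.script is missing."""
    value = read_property(job_xml_text, "pig.script")
    if value is None:
        return None, False
    script = decode_script(value)
    # A script at or above the limit may have lost its tail.
    return script, len(script) >= limit
```

A script whose length reaches the limit is only "possibly truncated"; the job.xml alone cannot tell you whether anything was actually cut off.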
On Wed, Jun 6, 2012 at 5:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> I completely agree that's an option. But IMHO being able to do that upfront
> would be a nice feature; adding cron is just an additional process we could
> avoid if possible.
>
> On Wed, Jun 6, 2012 at 5:39 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
> > You can write a nightly cron that runs the JobHistoryLoader job and
> > stores parsed scripts to hdfs...
> >
> > D
> >
> > On Wed, Jun 6, 2012 at 5:16 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > > I think that would be more of a post-process vs having Pig write the same
> > > to an HDFS location. That would avoid having to parse it from job.xml.
> > >
> > > On Wed, Jun 6, 2012 at 4:19 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> > >
> > >> One existing solution is the "pig.script" entry inside job.xml; it is the
> > >> serialized Pig script. JobHistoryLoader can load job.xml files and grab
> > >> those entries. Does that solve your problem?
> > >>
> > >> Daniel
> > >>
> > >> On Wed, Jun 6, 2012 at 3:52 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > What do you guys think about adding a feature to persist the script
> > >> > (file, or cache in the case of grunt) on HDFS or locally, based on an
> > >> > admin setting (pig.properties)? This will help infrastructure/ops teams
> > >> > analyze the nature of Pig scripts and make certain decisions based on it
> > >> > (optimizing data storage based on access patterns, etc.). This is
> > >> > actually something we want to do, but the challenge is that there is no
> > >> > central place where we can track user scripts.
> > >> >
> > >> > It could be a config param "pig.persist.script=/pig/". The script could
> > >> > be stored with a configurable name, e.g.
> > >> > "${mapred.job.name}+${user.name}+timestamp", either on HDFS or locally,
> > >> > based on the configuration setting.
> > >> >
> > >> > Thanks,
> > >> > Prashant
> > >> >
> > >>
> >
>
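
To make the naming scheme from the original proposal concrete, here is a small Python sketch that builds a target path from a base directory (the proposed "pig.persist.script" value), the job name, the user name, and a timestamp. Nothing here exists in Pig; the function names are hypothetical, and the choice to replace unusual characters with "_" is my own assumption, added because characters such as ":" in job names can be problematic in file paths.

```python
import re
import time


def persisted_script_name(job_name, user_name, timestamp_ms=None):
    """Build "<job name>+<user name>+<timestamp>" as in the proposal.

    ASSUMPTION: characters outside [A-Za-z0-9._-] are replaced with "_"
    so the result is safe to use as a single path component.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    safe_job = re.sub(r"[^A-Za-z0-9._-]", "_", job_name)
    safe_user = re.sub(r"[^A-Za-z0-9._-]", "_", user_name)
    return "%s+%s+%d" % (safe_job, safe_user, timestamp_ms)


def persisted_script_path(base_dir, job_name, user_name, timestamp_ms=None):
    """Join the configured base directory (e.g. "/pig/") with the file name."""
    name = persisted_script_name(job_name, user_name, timestamp_ms)
    return base_dir.rstrip("/") + "/" + name
```

Including a millisecond timestamp keeps repeated runs of the same job by the same user from overwriting each other, which seems to be the intent of the proposed name.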

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*