Re: Persisting Pig Scripts
That's expected. It's a cap on how much of the script can be stored. I'm
not sure what the exact size limit is, but if it's causing issues I'm sure
we could make it a configurable value.
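
For anyone who wants to check what actually made it into the job conf, here
is a rough sketch of pulling the (possibly truncated) script back out of a
job.xml. It assumes the file is available locally and that the value is
Base64-encoded, which is how recent Pig versions appear to write it; adjust
for your setup:

import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PigScriptFromJobXml {
    public static void main(String[] args) throws Exception {
        // Load a single job.xml (local path) into a Configuration.
        Configuration jobConf = new Configuration(false);
        jobConf.addResource(new Path(args[0]));

        // "pig.script" holds the (possibly truncated) submitted script;
        // assuming it is Base64-encoded, decode it before printing.
        String encoded = jobConf.get("pig.script", "");
        String script = new String(Base64.decodeBase64(encoded.getBytes("UTF-8")), "UTF-8");
        System.out.println(script);
    }
}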
On Mon, Jun 11, 2012 at 2:33 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Bill,
>
> Would you know if that is expected or a bug?
>
> On Wed, Jun 6, 2012 at 5:56 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> One thing to be aware of when accessing the pig.script option is that,
>> AFAIK, there's a limit to how large the script can be, after which the
>> rest would be truncated.
>>
>> On Wed, Jun 6, 2012 at 5:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>
>> > I completely agree that's an option. But IMHO being able to do that
>> > upfront would be a nice feature; adding a cron is just an additional
>> > process we could avoid if possible.
>> >
>> > On Wed, Jun 6, 2012 at 5:39 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> >
>> > > You can write a nightly cron that runs the JobHistoryLoader job and
>> > > stores parsed scripts to HDFS...
>> > >
>> > > D
>> > >
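Along those lines, a rough sketch of what such a nightly job could do. This
version reads the job conf files straight out of the history directory
rather than going through JobHistoryLoader, and the "_conf.xml" naming and
the Base64 handling are assumptions to adapt:

import java.io.File;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchivePigScripts {
    public static void main(String[] args) throws Exception {
        File historyDir = new File(args[0]);  // local job history "done" dir (assumption)
        Path outputDir = new Path(args[1]);   // e.g. an HDFS dir to collect scripts in
        FileSystem fs = FileSystem.get(new Configuration());

        File[] confFiles = historyDir.listFiles();
        if (confFiles == null) {
            return;
        }
        for (File xml : confFiles) {
            // Job confs in the history dir are typically named <jobid>_conf.xml.
            if (!xml.getName().endsWith("_conf.xml")) {
                continue;
            }
            Configuration jobConf = new Configuration(false);
            jobConf.addResource(new Path(xml.getAbsolutePath()));

            String encoded = jobConf.get("pig.script");
            if (encoded == null || encoded.length() == 0) {
                continue;  // not a Pig job
            }
            String script = new String(Base64.decodeBase64(encoded.getBytes("UTF-8")), "UTF-8");

            // One output file per job conf, named after the source xml.
            FSDataOutputStream out = fs.create(new Path(outputDir, xml.getName() + ".pig"));
            out.write(script.getBytes("UTF-8"));
            out.close();
        }
    }
}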
>> > > On Wed, Jun 6, 2012 at 5:16 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> > > > I think that would be more of a post-process vs. having Pig write the
>> > > > script to an HDFS location itself. The latter would avoid having to
>> > > > parse it from job.xml.
>> > > >
>> > > > On Wed, Jun 6, 2012 at 4:19 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > >> One existing solution is the "pig.script" entry inside job.xml; it is
>> > > >> the serialized Pig script. JobHistoryLoader can load job.xml files and
>> > > >> grab those entries. Does that solve your problem?
>> > > >>
>> > > >> Daniel
>> > > >>
>> > > >> On Wed, Jun 6, 2012 at 3:52 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> > > >>
>> > > >> > Hi All,
>> > > >> >
>> > > >> > What do you guys think about adding a feature to persist the script
>> > > >> > (the file, or the cache in the case of grunt) on HDFS or locally,
>> > > >> > based on an admin setting (pig.properties)? This would help
>> > > >> > infrastructure/ops teams analyze the nature of Pig scripts and make
>> > > >> > certain decisions based on them (optimizing data storage based on
>> > > >> > access patterns, etc.). This is actually something we want to do, but
>> > > >> > the challenge is that there is no central place where we can track
>> > > >> > user scripts.
>> > > >> >
>> > > >> > It could be a config param, "pig.persist.script=/pig/". The script
>> > > >> > could be stored with a configurable name ->
>> > > >> > "${mapred.job.name}+${user.name}+timestamp", either on HDFS or
>> > > >> > locally, based on the configuration setting.
>> > > >> >
>> > > >> > Thanks,
>> > > >> > Prashant
>> > > >> >
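To make the proposal concrete, here is a rough sketch of what Pig could do at
submit time if pig.persist.script were set. The property name and the
name-plus-user-plus-timestamp pattern come from the proposal above; the helper
name, the separator, and the timestamp format are made up for illustration:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScriptPersister {
    // Hypothetical helper: write the submitted script under ${pig.persist.script}.
    public static void persistScript(Configuration conf, String scriptText) throws Exception {
        String baseDir = conf.get("pig.persist.script");  // e.g. "/pig/" from pig.properties
        if (baseDir == null) {
            return;  // feature not enabled
        }

        // ${mapred.job.name}+${user.name}+timestamp, as suggested above.
        String name = conf.get("mapred.job.name", "unnamed") + "+"
                + System.getProperty("user.name") + "+"
                + new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());

        // Whether this lands on HDFS or the local FS follows from the path's scheme.
        Path target = new Path(baseDir, name + ".pig");
        FileSystem fs = target.getFileSystem(conf);
        FSDataOutputStream out = fs.create(target);
        out.write(scriptText.getBytes("UTF-8"));
        out.close();
    }
}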
--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*