Re: multiple folder loading or passing comma on parameter with Amazon Pig
I will.

There is also a "bug" on Pig documentation here:

http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html

where it says

   In this example the command is executed and its stdout is used as the
parameter value.

  %declare CMD 'generate_date';

it should really be `generate_date` with backticks, not single quotes.
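
For example (assuming generate_date is some script on the path that prints
a date string to stdout), the backtick form actually runs the command and
substitutes its output:

    -- stdout of generate_date becomes the value of $CMD
    %declare CMD `generate_date`;
    -- hypothetical path, just to show the substitution
    raw = LOAD '/data/$CMD';

With single quotes, CMD would just be the literal string 'generate_date'.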

On Wed, Aug 17, 2011 at 6:18 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Nice job figuring out a fix!
> You should seriously file a bug with EMR for that. That's kind of
> ridiculous.
>
> D
>
> On Wed, Aug 17, 2011 at 6:03 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
>
> > I solved my own problem and just want to share with whoever might
> encounter
> > the same issue.
> >
> > I pass a colon-separated list, then convert it to a comma-separated list
> > inside the Pig script using the declare command.
> >
> > Submit the Pig job like this:
> >
> >     -p SOURCE_DIRS="2011-08:2011-07:2011-06"
> >
> > and in Pig script
> >
> >     %declare SOURCE_DIRS_CONVERTED `echo $SOURCE_DIRS | tr ':' ','`;
> >     LOAD '/root_dir/{$SOURCE_DIRS_CONVERTED}' ...
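> >
> > So the whole submission ends up looking something like this (the
> > directory names are just the ones from my example):
> >
> >     elastic-mapreduce --pig-script --args myscript.pig --args \
> >     -p,SOURCE_DIRS="2011-08:2011-07:2011-06"
> >
> > and after the tr conversion the LOAD line effectively becomes
> >
> >     raw = LOAD '/root_dir/{2011-08,2011-07,2011-06}' ...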
> >
> >
> > On Wed, Aug 17, 2011 at 4:21 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I'm running Pig jobs using Amazon's Pig support, where you submit jobs
> > > with comma-concatenated parameters like this:
> > >
> > >      elastic-mapreduce --pig-script --args myscript.pig --args
> > > -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3
> > >
> > > In my script, I need to pass multiple directories for the pig script to
> > > load like this:
> > >
> > >      raw = LOAD '/root_dir/{$SOURCE_DIRS}'
> > >
> > > and SOURCE_DIRS is computed. For example, it can be
> > > "2011-08,2011-07,2011-06", meaning my Pig script needs to load data for
> > > the past 3 months. This works fine when I run my job in local or direct
> > > Hadoop mode. But with Amazon Pig, I have to do something like this:
> > >
> > >      elastic-mapreduce --pig-script --args myscript.pig
> > > -p,SOURCE_DIRS="2011-08,2011-07,2011-06"
> > >
> > > but EMR will just replace the commas with spaces, which breaks the
> > > parameter-passing syntax. I've tried adding backslashes before the
> > > commas, but I simply end up with a backslash and a space in between.
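> > >
> > > (In other words, by the time the parameter reaches the script it looks
> > > something like
> > >
> > >      SOURCE_DIRS="2011-08 2011-07 2011-06"
> > >
> > > rather than one comma-separated value.)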
> > >
> > > So the question becomes:
> > >
> > > 1. Can I do something differently than what I'm doing to pass multiple
> > > folders to the Pig script (without commas), or
> > > 2. Does anyone know how to properly pass commas to elastic-mapreduce?
> > >
> > > Thanks!
> > >
> > > Dexin
> > >
> >
>