Re: multiple folder loading or passing comma on parameter with Amazon Pig
I solved my own problem and just want to share it with anyone who might
encounter the same issue.

I pass a colon-separated list, then convert it to a comma-separated list
inside the Pig script using the declare command.

Submit the Pig job like this:

     -p SOURCE_DIRS="2011-08:2011-07:2011-06"

and in Pig script

     %declare SOURCE_DIRS_CONVERTED `echo $SOURCE_DIRS | tr ':' ','`;
     LOAD '/root_dir/{$SOURCE_DIRS_CONVERTED}' ...
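
For completeness, here is a minimal sketch of the two pieces together (a
sketch only: myscript.pig, /root_dir, and the month directories are the
placeholder names from this thread, and the elastic-mapreduce invocation
assumes the --args syntax quoted in the original message below):

     # Shell: submit with colons, so EMR's comma-splitting of --args is harmless
     elastic-mapreduce --pig-script --args myscript.pig --args \
       -p,SOURCE_DIRS="2011-08:2011-07:2011-06"

     -- myscript.pig: turn the colons back into commas, then glob-load
     %declare SOURCE_DIRS_CONVERTED `echo $SOURCE_DIRS | tr ':' ','`;
     raw = LOAD '/root_dir/{$SOURCE_DIRS_CONVERTED}';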
On Wed, Aug 17, 2011 at 4:21 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm running Pig jobs using Amazon's Elastic MapReduce Pig support, where you
> submit jobs with comma-concatenated parameters like this:
>
>      elastic-mapreduce --pig-script --args myscript.pig --args
> -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3
>
> In my script, I need to pass multiple directories for the pig script to
> load like this:
>
>      raw = LOAD '/root_dir/{$SOURCE_DIRS}'
>
> and SOURCE_DIRS is computed. For example, it can be
> "2011-08,2011-07,2011-06", meaning my Pig script needs to load data for
> the past 3 months. This works fine when I run my job in local or direct
> Hadoop mode. But with Amazon Pig, I have to do something like this:
>
>      elastic-mapreduce --pig-script --args myscript.pig
> -p,SOURCE_DIRS="2011-08,2011-07,2011-06"
>
> but EMR just replaces the commas with spaces, which breaks the
> parameter-passing syntax. I've tried adding backslashes before the commas,
> but I simply end up with backslashes with spaces in between.
>
> So question becomes:
>
> 1. Can I do something differently than what I'm doing to pass multiple
> folders to the Pig script (without commas), or
> 2. Does anyone know how to properly pass commas to elastic-mapreduce?
>
> Thanks!
>
> Dexin
>