Pig, mail # user - multiple folder loading or passing comma on parameter with Amazon Pig


Dexin Wang 2011-08-17, 23:21
Hi,

I'm running Pig jobs using Amazon's Pig support, where you submit jobs with
comma-concatenated parameters like this:

     elastic-mapreduce --pig-script --args myscript.pig --args \
         -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3

In my script, I need to pass multiple directories for the pig script to load
like this:

     raw = LOAD '/root_dir/{$SOURCE_DIRS}'

and SOURCE_DIRS is computed. For example, it can be
"2011-08,2011-07,2011-06", meaning my Pig script needs to load data for the
past 3 months. This works fine when I run my job in local mode or directly
against Hadoop. But with Amazon Pig, I have to do something like this:

     elastic-mapreduce --pig-script --args myscript.pig --args \
         -p,SOURCE_DIRS="2011-08,2011-07,2011-06"

but EMR just replaces the commas with spaces, which breaks the
parameter-passing syntax. I've tried adding backslashes before the commas,
but then I simply end up with a backslash followed by a space.
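For reference, SOURCE_DIRS is built by a small script before the job is
submitted; a minimal sketch of that computation (the fixed start date and
variable names here are just illustrative, my real script uses today's date):

```shell
#!/bin/sh
# Build the comma-separated list of YYYY-MM directories for the
# past 3 months, counting back from a given year/month.
year=2011; month=8
dirs=""
i=0
while [ "$i" -lt 3 ]; do
  dirs="${dirs}$(printf '%04d-%02d' "$year" "$month"),"
  month=$((month - 1))
  if [ "$month" -eq 0 ]; then
    year=$((year - 1)); month=12
  fi
  i=$((i + 1))
done
dirs=${dirs%,}   # drop the trailing comma
echo "$dirs"     # prints 2011-08,2011-07,2011-06
```

That comma-separated value is exactly what the LOAD glob above expects,
which is why losing the commas on the EMR side breaks everything.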

So the question becomes:

1. Can I do something differently to pass multiple folders to the Pig
script (without commas), or
2. Does anyone know how to properly pass commas to elastic-mapreduce?
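One workaround I'm considering for (1), in case it helps frame the question:
pass the list with a delimiter that EMR leaves alone (say ':') and translate
it back to commas in a small wrapper before invoking pig. A sketch (the
wrapper name and delimiter choice are mine, and I haven't tested this on EMR):

```shell
#!/bin/sh
# run-pig.sh -- hypothetical wrapper around pig.
# EMR would be given SOURCE_DIRS with ':' as the separator
# (e.g. -p,SOURCE_DIRS=2011-08:2011-07:2011-06), and this
# wrapper converts it back to the comma form the LOAD glob needs.
raw="$1"                              # e.g. 2011-08:2011-07:2011-06
SOURCE_DIRS=$(printf '%s' "$raw" | tr ':' ',')
echo "$SOURCE_DIRS"                   # prints 2011-08,2011-07,2011-06
# pig -p SOURCE_DIRS="$SOURCE_DIRS" myscript.pig
```

But that means shipping and maintaining an extra script, so a cleaner
answer to either question would still be very welcome.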

Thanks!

Dexin