Re: multiple folder loading or passing comma on parameter with Amazon Pig
Dexin Wang 2011-08-18, 01:03
I solved my own problem and want to share the fix with anyone who might
encounter the same issue. I pass a colon-separated list, then convert it to a
comma-separated list inside the Pig script using the declare command.

Submit the Pig job like this:

    -p SOURCE_DIRS="2011-08:2011-07:2011-06"

and in the Pig script:

    %declare SOURCE_DIRS_CONVERTED `echo $SOURCE_DIRS | tr ':' ','`;
    LOAD '/root_dir/{$SOURCE_DIRS_CONVERTED}' ...

On Wed, Aug 17, 2011 at 4:21 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm running pig jobs using Amazon pig support, where you submit jobs with
> comma concatenated parameters like this:
>
> elastic-mapreduce --pig-script --args myscript.pig --args
> -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3
>
> In my script, I need to pass multiple directories for the pig script to
> load like this:
>
> raw = LOAD '/root_dir/{$SOURCE_DIRS}'
>
> and SOURCE_DIRS is computed. For example, it can be
> "2011-08,2011-07,2011-06", meaning my pig script needs to load data for the
> past 3 months. This works fine when I run my job using local or direct
> hadoop mode. But with Amazon pig, I have to do something like this:
>
> elastic-mapreduce --pig-script --args myscript.pig
> -p,SOURCE_DIRS="2011-08,2011-07,2011-06"
>
> but emr will just replace commas with spaces so it breaks the parameter
> passing syntax. I've tried adding backslashes before commas, but I simply
> end up with a backslash with a space in between.
>
> So the question becomes:
>
> 1. can I do something differently than what I'm doing to pass multiple
> folders to pig script (without commas), or
> 2. does anyone know how to properly pass commas to elastic-mapreduce?
>
> Thanks!
>
> Dexin
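For anyone who wants to check the colon-to-comma conversion from the reply above before submitting a job, here is a minimal shell sketch of the same `tr` step. It only mirrors the string handling and does not run Pig or elastic-mapreduce. The variable name LOAD_PATH is made up for this illustration and is not part of the original script.

```shell
#!/bin/sh
# Colon-separated value, as it would be passed with -p SOURCE_DIRS=...
SOURCE_DIRS="2011-08:2011-07:2011-06"

# Same conversion the Pig declare line performs: every ':' becomes ','
SOURCE_DIRS_CONVERTED=$(echo "$SOURCE_DIRS" | tr ':' ',')

# LOAD_PATH is a hypothetical name used only here; it shows the glob
# string that the LOAD statement would receive after substitution.
LOAD_PATH="/root_dir/{$SOURCE_DIRS_CONVERTED}"

printf '%s\n' "$LOAD_PATH"
# prints: /root_dir/{2011-08,2011-07,2011-06}
```

The point of the colon is only that it passes through the elastic-mapreduce argument handling untouched; any other character that does not occur in the directory names should work the same way.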