Pig >> mail # user >> multiple folder loading or passing comma on parameter with Amazon Pig


multiple folder loading or passing comma on parameter with Amazon Pig
Hi,

I'm running Pig jobs using Amazon's Pig support, where you submit jobs with
comma-concatenated parameters like this:

     elastic-mapreduce --pig-script --args myscript.pig \
       --args -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3

In my script, I need to pass multiple directories for the pig script to load
like this:

     raw = LOAD '/root_dir/{$SOURCE_DIRS}'

and SOURCE_DIRS is computed. For example, it can be
"2011-08,2011-07,2011-06", meaning my Pig script needs to load data for the
past 3 months. This works fine when I run my job in local or direct
Hadoop mode. But with Amazon Pig, I have to do something like this:
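As background on why the local run works: Hadoop's path globbing treats
{a,b,c} much like bash brace expansion, so after Pig substitutes
$SOURCE_DIRS the LOAD picks up one directory per comma-separated entry. A
quick bash illustration (placeholder paths from the example above):

```shell
# The computed parameter value, as in the example above:
SOURCE_DIRS="2011-08,2011-07,2011-06"

# eval forces bash to brace-expand the substituted value, mimicking
# how the glob in LOAD '/root_dir/{$SOURCE_DIRS}' resolves:
eval echo "/root_dir/{$SOURCE_DIRS}"
# prints: /root_dir/2011-08 /root_dir/2011-07 /root_dir/2011-06
```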

     elastic-mapreduce --pig-script --args myscript.pig \
       -p,SOURCE_DIRS="2011-08,2011-07,2011-06"

but EMR just replaces the commas with spaces, which breaks the
parameter-passing syntax. I've tried putting backslashes before the commas,
but then I simply end up with a backslash followed by a space.
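To illustrate the breakage, here is a rough sketch of the comma-to-space
substitution I'm seeing (this only simulates the observed behavior; I don't
know what the EMR client actually does internally):

```shell
# The --args value as I pass it:
ARGS='-p,SOURCE_DIRS="2011-08,2011-07,2011-06"'

# Simulate every comma being turned into a space:
echo "$ARGS" | tr ',' ' '
# prints: -p SOURCE_DIRS="2011-08 2011-07 2011-06"
# The commas inside the quoted value are lost too, so -p receives a
# mangled SOURCE_DIRS and the remaining pieces become stray arguments.
```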

So the question becomes:

1. Can I pass multiple folders to my Pig script some other way (without
commas), or
2. Does anyone know how to pass literal commas through elastic-mapreduce?

Thanks!

Dexin