Re: multiple folder loading or passing comma on parameter with Amazon Pig
I solved my own problem and just want to share the fix with whoever might
encounter the same issue.

I pass a colon-separated list, then convert it to a comma-separated list
inside the Pig script using the %declare command.

Submit the Pig job like this:

     -p SOURCE_DIRS="2011-08:2011-07:2011-06"

and in the Pig script:

     %declare SOURCE_DIRS_CONVERTED `echo $SOURCE_DIRS | tr ':' ','`;
     LOAD '/root_dir/{$SOURCE_DIRS_CONVERTED}' ...
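
To check the conversion outside of Pig, the same echo/tr pipeline can be run
in a plain shell. This is only a rough sketch: the SOURCE_DIRS value is the
example from above, and LOAD_PATH is a made-up variable name used here to
show what the LOAD path should look like after substitution.

```shell
#!/bin/sh
# Example value, as it would arrive via -p SOURCE_DIRS=... (assumed for illustration)
SOURCE_DIRS="2011-08:2011-07:2011-06"

# Same pipeline that runs inside the backticks of %declare
SOURCE_DIRS_CONVERTED=$(echo "$SOURCE_DIRS" | tr ':' ',')

# LOAD_PATH is a hypothetical name; it mirrors the path in the LOAD statement
LOAD_PATH="/root_dir/{$SOURCE_DIRS_CONVERTED}"

printf '%s\n' "$LOAD_PATH"
# prints /root_dir/{2011-08,2011-07,2011-06}
```

The colon works as a stand-in separator here because none of the directory
names contain one; any other character that survives the argument passing
and does not appear in the names should do as well.
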
On Wed, Aug 17, 2011 at 4:21 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm running pig jobs using Amazon pig support, where you submit jobs with
> comma concatenated parameters like this:
>
>      elastic-mapreduce --pig-script --args myscript.pig --args
> -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3
>
> In my script, I need to pass multiple directories for the pig script to
> load like this:
>
>      raw = LOAD '/root_dir/{$SOURCE_DIRS}'
>
> and SOURCE_DIRS is computed. For example, it can be
> "2011-08,2011-07,2011-06", meaning my pig script needs to load data for the
> past 3 months. This works fine when I run my job using local or direct
> hadoop mode. But with Amazon pig, I have to do something like this:
>
>      elastic-mapreduce --pig-script --args myscript.pig
> -p,SOURCE_DIRS="2011-08,2011-07,2011-06"
>
> but emr will just replace commas with spaces, so it breaks the parameter
> passing syntax. I've tried adding backslashes before the commas, but I
> simply end up with a backslash followed by a space.
>
> So the question becomes:
>
> 1. can I do something different from what I'm doing to pass multiple
> folders to the pig script (without commas), or
> 2. does anyone know how to properly pass commas to elastic-mapreduce?
>
> Thanks!
>
> Dexin
>