Subject: PigRunner setting replication level
I'm having some trouble controlling the output replication level with PigRunner.  In particular, in a single-box dev environment I want to set replication to 1.  Otherwise, with replication at 3x, the NameNode marks all blocks as under-replicated and eventually starts freaking out.

Here are some details:
- I set replication to 1 in hdfs-site.xml.  I also set all relevant environment variables, such as HADOOP_CONF_DIR and PIG_HOME.
- When I run Pig on the command line, I get my desired output replication of 1.
- When I run Pig through PigRunner, I get output replication of 3.
  - I checked all environment variables within the process that uses PigRunner.  They match what I see in the shell.  (Not sure if PigRunner would pick these up anyway.)
  - I pass a properties file to PigRunner.  The only relevant property there is 'mapred.submit.replication=1'.
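
For concreteness, here is a minimal Java sketch of the setup described above. It rests on several assumptions: that the properties file is handed to Pig with the -P option, that the script is called myscript.pig, and that the class and method names are purely illustrative. The dfs.replication line is not part of the setup above; it is added as an untested guess, on the basis that mapred.submit.replication governs the replication of job submission files while dfs.replication is the HDFS client default for newly written files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class PigRunnerArgsSketch {

    // Writes the properties file that is later handed to Pig.
    static Path writeProperties(Path file) throws IOException {
        Properties props = new Properties();
        // The property from the setup described above.
        props.setProperty("mapred.submit.replication", "1");
        // Assumption, not in the original setup: the HDFS client default
        // replication for newly created files.
        props.setProperty("dfs.replication", "1");
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "single-box dev settings");
        }
        return file;
    }

    // Builds the argument array; "-P <file>" is assumed to be how the
    // properties file reaches Pig.
    static String[] buildArgs(Path propertiesFile, String script) {
        return new String[] {"-P", propertiesFile.toString(), script};
    }

    public static void main(String[] args) throws IOException {
        Path props = writeProperties(Files.createTempFile("pig-dev", ".properties"));
        String[] pigArgs = buildArgs(props, "myscript.pig"); // hypothetical script name
        System.out.println(String.join(" ", pigArgs));
        // With the Pig jars on the classpath, the actual call would be:
        // PigStats stats = PigRunner.run(pigArgs, null);
    }
}
```

The PigRunner call is left as a comment so the sketch compiles without Pig on the classpath.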

My best guess is I'm not passing in the correct properties, but I am not sure.  Thanks in advance for any suggestions here.