I'm having trouble controlling the output replication level with PigRunner. In particular, on a single-box dev environment I want to set replication to 1. Otherwise, with replication at 3x, the NameNode marks all blocks as under-replicated and eventually starts freaking out.
Here are some details:
-I set replication to 1 in hdfs-site.xml, and I set all relevant environment variables, such as HADOOP_CONF_DIR and PIG_HOME.
-When I run Pig from the command line, I get my desired output replication of 1.
-When I run Pig through PigRunner, I get an output replication of 3.
-I checked all environment variables inside the process that calls PigRunner. They match what I see in the shell (though I'm not sure PigRunner would pick them up anyway).
-I pass a properties file to PigRunner. The only relevant property in it is 'mapred.submit.replication=1'.
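
To make the last point concrete, here is a minimal sketch of the kind of invocation described above, not the exact code in use. The class name PigRunnerArgs, its helper methods, and the script name myscript.pig are hypothetical. The PigRunner call is left as a comment so the sketch compiles without the Pig jar on the classpath, and passing null as the listener is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class PigRunnerArgs {

    // Writes a properties file containing a single key/value pair.
    static void writeProperties(Path file, String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "properties passed to Pig via -P");
        }
    }

    // Builds the Pig argument array: -P <properties file> -f <script>.
    static String[] buildArgs(String propertiesFile, String script) {
        return new String[] {"-P", propertiesFile, "-f", script};
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the real properties file.
        Path propsFile = Files.createTempFile("pig-dev", ".properties");
        writeProperties(propsFile, "mapred.submit.replication", "1");

        // "myscript.pig" is a hypothetical script name.
        String[] pigArgs = buildArgs(propsFile.toString(), "myscript.pig");
        System.out.println(String.join(" ", pigArgs));

        // With the Pig jar on the classpath, the call would look like this
        // (null listener is an assumption):
        // org.apache.pig.tools.pigstats.PigStats stats =
        //         org.apache.pig.PigRunner.run(pigArgs, null);
    }
}
```

Here -P is Pig's command-line option for a properties file and -f names the script to run, so the array is equivalent to the arguments given to pig in the shell.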
My best guess is I'm not passing in the correct properties, but I am not sure. Thanks in advance for any suggestions here.