Pig user mailing list


PigRunner setting replication level
Hi,
I'm having some trouble controlling the output replication level with PigRunner. In particular, in a single-box dev environment I want to set replication to 1. Otherwise, with replication at 3x, the NameNode marks all blocks as under-replicated and eventually starts freaking out.

Here are some details:
- I set replication to 1 in hdfs-site.xml. I set all relevant environment variables, such as HADOOP_CONF_DIR and PIG_HOME.
- When I run Pig on the command line, I get my desired output replication of 1.
- When I run Pig through PigRunner, I get an output replication of 3.
  - I checked all environment variables within the process that uses PigRunner. They match what I see in the shell. (I'm not sure PigRunner would pick these up anyway.)
  - I pass a properties file to PigRunner. The only relevant property there is 'mapred.submit.replication=1'.
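The two things checked above (environment variables and the properties file) can be probed from inside the embedding JVM with a small sketch. It assumes two things not stated in the post. First, Hadoop's Configuration class finds hdfs-site.xml as a classpath resource, so HADOOP_CONF_DIR only has an effect if a launcher script (such as the `pig` shell script) has put that directory on the classpath. Second, the HDFS-side counterpart of the hdfs-site.xml setting is `dfs.replication`. As far as I know, `mapred.submit.replication` is documented as the replication level for submitted job files rather than for job output. The class `ReplicationProps` and its methods are hypothetical helpers, not part of Pig or Hadoop; the code uses only the JDK.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.net.URL;
import java.util.Properties;

/**
 * Hypothetical helper (not part of Pig or Hadoop). It builds replication
 * properties that could be written to the file passed to PigRunner, and it
 * checks whether hdfs-site.xml is visible on this JVM's classpath.
 */
public class ReplicationProps {

    // Sets replication for job submission files (mapred.submit.replication)
    // and, as an assumption, for files written to HDFS (dfs.replication).
    static Properties replicationProperties(int replication) {
        Properties props = new Properties();
        String value = Integer.toString(replication);
        props.setProperty("mapred.submit.replication", value);
        props.setProperty("dfs.replication", value);
        return props;
    }

    // Serializes the properties in the standard key=value file format.
    static String toPropertiesFile(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, "replication settings for a single-box dev environment");
        return out.toString();
    }

    // Returns true if hdfs-site.xml can be found as a classpath resource.
    // An environment variable alone does not make it visible; the directory
    // holding the file must be on the classpath.
    static boolean hdfsSiteOnClasspath() {
        URL url = Thread.currentThread().getContextClassLoader().getResource("hdfs-site.xml");
        return url != null;
    }

    public static void main(String[] args) throws IOException {
        Properties props = replicationProperties(1);
        System.out.print(toPropertiesFile(props));
        System.out.println("hdfs-site.xml on classpath: " + hdfsSiteOnClasspath());
    }
}
```

If `hdfsSiteOnClasspath()` prints `false` inside the PigRunner process but the shell-launched `pig` works, that would be consistent with the HDFS client falling back to its default replication of 3.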

My best guess is that I'm not passing the correct properties, but I'm not sure. Thanks in advance for any suggestions.
Thanks,
Adam
 