-
accessing remote cluster with Pig
Anze 2010-10-16, 20:52
Hi again! :)
I am trying to run Pig on a local machine, but I want it to connect to a remote cluster. I can't make it use my settings - whatever I do, I get this:
-----
$ pig -x mapreduce
10/10/16 22:17:43 INFO pig.Main: Logging error messages to: /home/pigtest/conf/pig_1287260263699.log
2010-10-16 22:17:43,896 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt>
-----
I have copied the Hadoop settings files (/etc/hadoop/conf/*) from the remote cluster's namenode to /home/pigtest/conf/ and exported PIG_CLASSPATH, PIGDIR, HADOOP_CLASSPATH,... I have also tried changing /etc/pig/conf/pig.configuration (I even wrote some free text there so it would at least give me an error message) - nothing. It still connects to file:/// and still doesn't display a message about a jobtracker:
-----
$ export HADOOPDIR=/etc/hadoop/conf
$ export PIG_PATH=/etc/pig/conf
$ export PIG_CLASSPATH=$HADOOPDIR
$ export PIG_HADOOP_VERSION=0.20.2
$ export PIG_HOME="/usr/lib/pig"
$ export PIG_CONF_DIR="/etc/pig/"
$ export PIG_LOG_DIR="/var/log/pig"
$ pig -x mapreduce
10/10/16 22:32:34 INFO pig.Main: Logging error messages to: /home/pigtest/conf/pig_1287261154272.log
2010-10-16 22:32:34,471 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt>
-----
I am guessing I am doing something fundamentally wrong. How do I change Pig's settings?
More info: using Cloudera package hadoop-pig from CDH3b3 (0.7.0+16-1~lenny-cdh3b3). I would appreciate some pointers.
Kind regards,
Anze
-
RE: accessing remote cluster with Pig
Gerrit Jansen van Vuuren 2010-10-17, 01:49
Hi,
Pig configuration is in the file: $PIG_HOME/conf/pig.properties
The two parameters that tell pig where to find the namenode and job tracker are:
E.g. (assuming you're using the default ports):
----[ $PIG_HOME/conf/pig.properties ]---------------
fs.default.name=hdfs://<namenode url>:8020/
mapred.job.tracker=<jobtracker url>:8021
--------------
With these properties set, you don't need to specify pig -x mapreduce; just pig is enough.
Cheers, Gerrit
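A minimal sketch of writing the two properties from the shell, assuming a writable properties file. The helper name set_cluster_props and the host names are hypothetical; the ports are the defaults mentioned above.

```shell
# set_cluster_props FILE NAMENODE_HOST JOBTRACKER_HOST
# Hypothetical helper: appends the two cluster properties to FILE.
# Ports 8020 and 8021 are the defaults mentioned above; adjust if your
# cluster uses different ones.
set_cluster_props() {
  props_file=$1
  namenode_host=$2
  jobtracker_host=$3
  {
    printf 'fs.default.name=hdfs://%s:8020/\n' "$namenode_host"
    printf 'mapred.job.tracker=%s:8021\n' "$jobtracker_host"
  } >> "$props_file"
}

# Example with placeholder host names, written to a scratch file
# rather than to a live configuration.
scratch_props=$(mktemp)
set_cluster_props "$scratch_props" namenode.example.com jobtracker.example.com
cat "$scratch_props"
```

Because the helper appends, it is worth checking first that the target file does not already define these keys.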
-
Re: accessing remote cluster with Pig
Anze 2010-10-17, 06:49
Gerrit, thank you for your answer! It has pointed me in the right direction.
It looks like Pig (at least mine) ignores PIG_HOME. But with your help I was able to debug a bit further:
-----
$ find / -name 'pig.properties'
/etc/pig/conf.dist/pig.properties
/etc/pig/conf/pig.properties
/usr/lib/pig/example-confs/conf.default/pig.properties
/usr/lib/pig/conf/pig.properties
-----
I have changed /usr/lib/pig/conf/pig.properties and bingo - this is what my Pig uses.
So while the Cloudera packaging installs /etc/pig/conf/pig.properties (the "Debian way"), that file is not used at all. And Pig probably ignores the environment variables too.
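A small sketch for comparing the candidate files found above. It lists every pig.properties under the given directories together with the fs.default.name value each one sets. The function name list_pig_properties is hypothetical, and the listing only shows what each file contains, not which one Pig actually reads.

```shell
# list_pig_properties DIR...
# Prints each pig.properties found under the given directories, followed by
# a tab and the fs.default.name value it sets (or "<not set>").
list_pig_properties() {
  find "$@" -type f -name 'pig.properties' 2>/dev/null | sort |
  while IFS= read -r props_file; do
    # Take the last uncommented assignment, if any.
    fs_value=$(sed -n 's/^[[:space:]]*fs\.default\.name[[:space:]]*=[[:space:]]*//p' "$props_file" | tail -n 1)
    printf '%s\t%s\n' "$props_file" "${fs_value:-<not set>}"
  done
}

# Example: scan the two trees where the files turned up in this thread.
list_pig_properties /etc/pig /usr/lib/pig
```

Changing one file at a time and re-running pig, as described above, is still what identifies the file in use.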
Thanks again! :)
Anze
-
RE: accessing remote cluster with Pig
Gerrit Jansen van Vuuren 2010-10-17, 13:16
Glad it worked for you :)
I use the standard Apache Pig distributions. There are several places where environment variables can be set or changed. I have no idea which one Cloudera uses, but here is a list:
/etc/profile.d/<any file> (we have hadoop.sh, pig.sh and java.sh here, which set the home variables and are managed by Puppet)
/etc/bash.bashrc (not a good idea to set it here)
$HOME/.bashrc (quick for users who don't have root permissions, but not for production)
$PIG_HOME/conf/pig-env.sh (standard in all Hadoop-related projects, gets sourced by $PIG_HOME/bin/pig)
To see which variables your Pig is picking up, you can manually insert the line
echo "home:$PIG_HOME conf:$PIG_CONF_DIR"
into the $PIG_HOME/bin/pig file just before it calls java.
Cheers, Gerrit
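A minimal sketch of what a file such as /etc/profile.d/pig.sh or $PIG_HOME/conf/pig-env.sh could contain. The path /usr/lib/pig is the install location reported earlier in this thread and is an assumption for any other system.

```shell
# Sketch of an environment file for Pig.
# /usr/lib/pig is the install path reported in this thread; treat it as an
# assumption elsewhere. Values already set by the caller are kept.
export PIG_HOME="${PIG_HOME:-/usr/lib/pig}"
export PIG_CONF_DIR="${PIG_CONF_DIR:-$PIG_HOME/conf}"

# Same diagnostic line as suggested above, printed from the calling shell.
echo "home:$PIG_HOME conf:$PIG_CONF_DIR"
```

The values printed here are those of the calling shell. A launcher that recomputes PIG_HOME itself may end up with different ones, which is why the echo inside bin/pig is the more reliable check.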
-
Re: accessing remote cluster with Pig
Anze 2010-10-18, 07:26
Good idea. :)
Here is the output for the Cloudera CDH3b3 distribution in case someone else needs it:
home:/usr/lib/pig/bin/.. conf:/usr/lib/pig/bin/../conf
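The "bin/.." in that output is consistent with a launcher that computes its home directory from its own location instead of reading PIG_HOME from the environment. The sketch below illustrates that pattern only; the function name derive_pig_home is hypothetical and the actual CDH3b3 script has not been checked here.

```shell
# Hypothetical launcher fragment: compute the install root from the script's
# own path, ignoring any PIG_HOME exported by the caller.
derive_pig_home() {
  launcher_path=$1
  printf '%s/..\n' "$(dirname "$launcher_path")"
}

derived_home=$(derive_pig_home /usr/lib/pig/bin/pig)
echo "home:$derived_home conf:$derived_home/conf"
# prints: home:/usr/lib/pig/bin/.. conf:/usr/lib/pig/bin/../conf
```

If the launcher works this way, exporting PIG_HOME or PIG_CONF_DIR beforehand has no effect, which would match the behaviour described earlier in the thread.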
Thanks for helping me out!
Anze
-
RE: accessing remote cluster with Pig
Kaluskar, Sanjay 2010-10-21, 10:17
I am trying to do the same (submitting a Pig script to a remote cluster from a Windows machine) and the job gets submitted after setting the following in pig.properties:
fs.default.name=hdfs://<node>:54310
mapred.job.tracker=hdfs://<node>:54510
However, my script fails because it looks for inputs under /user/DrWho. Is it possible to specify the hadoop cluster user in pig.properties? How does one control it? Where is DrWho coming from?
Thanks, -sanjay
-
Re: accessing remote cluster with Pig
김영우 2010-10-21, 14:41
Hi Sanjay,
You can specify a 'hadoop.job.ugi' property for your mapreduce job.
e.g., hadoop.job.ugi=username,groupname
Hope this helps.
Regards,
- Youngwoo
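A minimal sketch for generating that line on a Unix-like client. The helper name ugi_line and the example names pigtest and hadoop are hypothetical. The defaults come from the local account, which may not match the users and groups that exist on the cluster.

```shell
# ugi_line [USER] [GROUP]
# Hypothetical helper: prints a hadoop.job.ugi line for pig.properties.
# With no arguments it falls back to the local user and primary group,
# which may not match the accounts on the cluster.
ugi_line() {
  ugi_user=${1:-$(id -un)}
  ugi_group=${2:-$(id -gn)}
  printf 'hadoop.job.ugi=%s,%s\n' "$ugi_user" "$ugi_group"
}

# Placeholder names; substitute the cluster-side user and group.
ugi_line pigtest hadoop
# prints: hadoop.job.ugi=pigtest,hadoop
```

On a Windows client the id command is normally not available, so the user and group would have to be passed explicitly or typed into pig.properties by hand.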
-
RE: accessing remote cluster with Pig
Santhosh Srinivasan 2010-10-21, 16:55
http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
Check the load on your client. Sometimes, if the client cannot determine your user name (whoami), your namenode will receive DrWho, which is the default user name when no user name is specified (read: null). I have seen this behaviour on boxes with low memory, especially VMs.
Santhosh
-
RE: accessing remote cluster with Pig
Kaluskar, Sanjay 2010-10-25, 05:56
Thanks, the job gets submitted as the right user with this property, so it works. But my job setup fails and I can't see any logs to figure out what went wrong. (The same Pig script runs successfully when submitted from a Linux machine that is part of the Hadoop cluster.) Do I need to set some options to see a log of the job setup?
Thanks, -sanjay