|
rahul
2010-08-27, 00:32
Jeff Zhang
2010-08-27, 01:10
rahul
2010-08-27, 01:22
Santhosh Srinivasan
2010-08-27, 01:25
Jeff Zhang
2010-08-27, 01:30
Jeff Zhang
2010-08-27, 01:33
rahul
2010-08-27, 01:49
rahul
2010-08-27, 01:51
Jeff Zhang
2010-08-27, 01:59
rahul
2010-08-27, 02:00
Jeff Zhang
2010-08-27, 02:07
rahul
2010-08-27, 02:12
Jeff Zhang
2010-08-27, 02:23
rahul
2010-08-27, 02:49
Jeff Zhang
2010-08-27, 03:17
rahul
2010-08-27, 16:17
|
-
Pig and Hadoop Integration Error
rahul 2010-08-27, 00:32
Hi,

I am trying to integrate Pig with Hadoop for processing jobs. I can run Pig in local mode, and Hadoop with the streaming API, without problems. But when I try to run Pig with Hadoop I get the following error:

Pig Stack Trace
---------------
ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out

org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
    at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
    at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
    at org.apache.pig.PigServer.validate(PigServer.java:930)
    at org.apache.pig.PigServer.compileLp(PigServer.java:910)
    at org.apache.pig.PigServer.compileLp(PigServer.java:871)
    at org.apache.pig.PigServer.compileLp(PigServer.java:852)
    at org.apache.pig.PigServer.execute(PigServer.java:816)
    at org.apache.pig.PigServer.access$100(PigServer.java:105)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116: Unexpected error. Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
    at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
    at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
    ... 16 more
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
    at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
    ... 24 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
===============================================================================

Has anyone else seen this error? I think it is related to the connection between Pig and Hadoop. Can someone tell me how to connect Pig to Hadoop?

Thanks.
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 01:10
Have you put the Hadoop conf directory on the classpath? It seems you are still using the local file system but connecting to Hadoop's JobTracker. Make sure you set the correct configuration in core-site.xml, hdfs-site.xml, and mapred-site.xml, and put them on the classpath.

Best Regards
Jeff Zhang
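One way to check the classpath suggestion mechanically is sketched below. It assumes the classpath is available as a single string joined with the platform path separator; the helper name `conf_dirs_on_classpath` is hypothetical and is not part of Pig or Hadoop.

```python
import os

# The three site files named in the reply above.
SITE_FILES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml")


def conf_dirs_on_classpath(classpath, sep=os.pathsep):
    """Return {directory: [site files found]} for classpath entries that are directories.

    Jar entries and missing paths are skipped; only directories that contain
    at least one of the site files are reported.
    """
    found = {}
    for entry in classpath.split(sep):
        if entry and os.path.isdir(entry):
            present = [name for name in SITE_FILES
                       if os.path.isfile(os.path.join(entry, name))]
            if present:
                found[entry] = present
    return found


if __name__ == "__main__":
    # CLASSPATH is only one possible source; a value given with `java -cp`
    # would have to be passed to the function explicitly.
    print(conf_dirs_on_classpath(os.environ.get("CLASSPATH", "")))
```

An empty result suggests that no directory on the given classpath carries the site files, which would be consistent with the client falling back to its built-in defaults.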
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 01:22
Hi Jeff,

I have put the Hadoop conf directory on the classpath by setting the $HADOOP_CONF_DIR variable.

Both Pig and Hadoop are running on the same machine, so localhost should not make a difference. I have therefore used the default settings for core-site.xml, hdfs-site.xml, and mapred-site.xml, as per the Hadoop tutorial.

Please let me know if my understanding is correct.

The conf files are included below.

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value>
    <description>The maximum number of tasks that will be run simultaneously by
    a task tracker
    </description>
  </property>
</configuration>

Please let me know if there is an issue in my configuration. Any input is valuable to me.

Thanks,
Rahul
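To see at a glance which file defines which property, the posted files can be read with a short script. This is a minimal sketch using only the Python standard library; the function names `read_site_file` and `summarize` are hypothetical. In the files posted above, fs.default.name is listed under hdfs-site.xml, whereas it is conventionally set in core-site.xml; the thread does not confirm whether that placement plays any part in the error.

```python
import os
import xml.etree.ElementTree as ET


def read_site_file(path):
    """Parse one Hadoop *-site.xml file into a {name: value} dict."""
    props = {}
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def summarize(conf_dir,
              files=("core-site.xml", "hdfs-site.xml", "mapred-site.xml")):
    """Map each property name to (value, name of the file that defines it)."""
    summary = {}
    for fname in files:
        path = os.path.join(conf_dir, fname)
        if os.path.isfile(path):
            for name, value in read_site_file(path).items():
                summary[name] = (value, fname)
    return summary


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR", ".")
    for name, (value, fname) in sorted(summarize(conf_dir).items()):
        print(f"{name} = {value}  [{fname}]")
```

If fs.default.name is missing from the summary, the client would have no HDFS URI from these files to work with.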
-
RE: Pig and Hadoop Integration Error
Santhosh Srinivasan 2010-08-27, 01:25
Can you try replacing localhost with the fully qualified name of your host?

Santhosh
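The fully qualified name can be looked up and substituted into the two addresses from the posted configuration. The sketch below assumes the addresses have the form `host:port` or `scheme://host:port`; the helper `with_host` is hypothetical.

```python
import socket


def with_host(address, host):
    """Replace the host part of 'host:port' or 'scheme://host:port'."""
    # Split off an optional 'scheme://' prefix, then separate host and port.
    prefix, sep, rest = address.rpartition("//")
    _, colon, port = rest.partition(":")
    return f"{prefix}{sep}{host}{colon}{port}"


if __name__ == "__main__":
    fqdn = socket.getfqdn()  # the name this machine reports for itself
    print(with_host("hdfs://localhost:9000", fqdn))  # candidate fs.default.name
    print(with_host("localhost:9001", fqdn))         # candidate mapred.job.tracker
```

The printed values are only candidates for the `<value>` elements; the daemons would need a restart to pick up a changed address.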
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 01:30
But according to the error log:

"Could not validate the output specification for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out"

it is still trying to access the local file system rather than HDFS.

Best Regards
Jeff Zhang
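The reasoning here can be illustrated with a simplified model of how an absolute path without a scheme picks up the default file system. This is a sketch for illustration only, not Hadoop's actual path-resolution code, and the function `qualify` is hypothetical. It assumes that `file:///` is the default when fs.default.name is not picked up.

```python
from urllib.parse import urlparse


def qualify(path, default_fs="file:///"):
    """Simplified model: resolve an absolute path against the default file system."""
    if urlparse(path).scheme:
        return path  # already carries its own scheme, leave it alone
    fs = urlparse(default_fs)
    return f"{fs.scheme}://{fs.netloc}/{path.lstrip('/')}"


if __name__ == "__main__":
    out = "/Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out"
    print(qualify(out))                           # with no HDFS setting picked up
    print(qualify(out, "hdfs://localhost:9000"))  # with the posted fs.default.name
```

Under this model, a `file:///` prefix in the error message is what one would expect when the HDFS setting has not reached the client.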
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 01:33
Try putting the Hadoop XML configuration files in the pig/conf folder.

Best Regards
Jeff Zhang
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 01:49
Hi Jeff,

I copied the Hadoop conf files to the pig/conf location, but I still get the same error.

Is the issue with the configuration files or with the HDFS file system?

Is there some way to test the connection to HDFS (localhost/127.0.0.1:9001)?

The steps I followed:

1. I initially formatted my local file system using the ./hadoop namenode -format command. I believe this mounts the local file system to HDFS.
2. Then I configured the Hadoop conf files and ran the ./start-all script.
3. I started Pig with a custom Pig script, which should read from HDFS because I passed HADOOP_CONF_DIR as a parameter.
   The command was: java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig

Please let me know if these steps are missing something.

Thanks,
Rahul
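A basic reachability test for the two ports in the posted configuration could look like the sketch below. It only checks that a TCP connection can be opened; the helper `port_is_open` is hypothetical. Note that the EOFException in the trace is raised while reading a response, so an open port shows only that something is listening, not that the client and server agree on the RPC protocol.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the posted fs.default.name and mapred.job.tracker values.
    for name, port in (("fs.default.name", 9000), ("mapred.job.tracker", 9001)):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name} port {port}: {state}")
```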
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 01:51
Hi Santhosh,

I tried with the absolute path as well, but the error remains the same. I think the absolute path should not be an issue, as both Pig and Hadoop are at the same location.

Please let me know if there is a gap in my understanding.

Thanks,
Rahul
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 01:59
Run the jps command in a shell to check whether the NameNode and JobTracker are running correctly.

Best Regards
Jeff Zhang
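The jps check suggested above can be scripted. The following is a minimal sketch, not part of Hadoop or Pig: `missing_daemons` and `check_local_daemons` are hypothetical helper names, and the only assumption about `jps` is that each output line has the form `<pid> <main class name>`. The default list contains just the two daemons named in this message; others (for example DataNode or TaskTracker) can be passed in explicitly.

```python
import subprocess

# Daemons named in the thread; extend the tuple if more should be checked.
EXPECTED_DAEMONS = ("NameNode", "JobTracker")


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # A jps line looks like "<pid> <main class name>"; skip anything else.
        if len(parts) >= 2:
            running.add(parts[1])
    return [name for name in expected if name not in running]


def check_local_daemons():
    """Run jps (assumed to be on PATH via the JDK) and report missing daemons."""
    result = subprocess.run(["jps"], capture_output=True, text=True)
    return missing_daemons(result.stdout)
```

Calling `check_local_daemons()` on the machine that runs Hadoop returns an empty list when both daemons are listed by jps. Note that a process being listed only shows that it is alive, not that it accepts connections.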
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 02:00
Yes, they are running.
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 02:07
Can you look at the JobTracker log or access the JobTracker web UI? According to your log, it seems you cannot connect to the JobTracker:

"Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on local exception: java.io.EOFException"

Best Regards
Jeff Zhang
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 02:12
Hi Jeff,

I can connect to the JobTracker web UI using the following URL: http://localhost:50030/jobtracker.jsp

I can also see the jobs which I ran directly using the streaming API on Hadoop.

I also see that it tries to connect to localhost/127.0.0.1:9001, which I have specified in the Hadoop conf file. I have also tried changing this location to localhost:50030, but the error remains the same.

Can you suggest something further?

Thanks,
Rahul
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 02:23
Connecting to 9001 is right: this is the JobTracker's IPC port, while 50030 is its HTTP server port.
Have you tried running the grunt shell?

Best Regards
Jeff Zhang
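For the earlier question about testing the connection, a plain TCP probe of both ports is one option. This is a minimal sketch using only Python's standard `socket` module; `port_is_open` and `probe` are hypothetical helper names, and the host and the port numbers 9001 and 50030 are simply the values quoted in this thread. A successful connect only shows that something is listening on the port. It does not validate the Hadoop IPC protocol, so it cannot by itself explain the EOFException.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe(host="localhost", ports=(9001, 50030)):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Running `probe()` on the Hadoop machine returns a dictionary keyed by port. If 50030 is open but 9001 is not, the JobTracker's IPC address is probably not what the client configuration says; if both are open, the problem lies above the TCP level.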
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 02:49
Hi,

I tried the grunt shell as well, but that also does not connect to Hadoop. It throws a warning and runs the job in standalone mode. So I tried it using the pig.jar.

Do you have any further suggestions on that?

Rahul
-
Re: Pig and Hadoop Integration Error
Jeff Zhang 2010-08-27, 03:17
It's weird. I suspect there may be another configuration file on your classpath which overrides your real conf files.
Could you download a new Pig release and follow the instructions at http://hadoop.apache.org/pig/docs/r0.7.0/setup.html in a new environment?

Best Regards
Jeff Zhang
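One way to look for a stray configuration file is to walk the classpath and list every entry that carries a Hadoop site file. The sketch below is an illustration under stated assumptions, not a Pig or Hadoop tool: `find_config_files` is a hypothetical helper, the file names are the three site files discussed in this thread plus the older single-file name hadoop-site.xml, and only plain directories and jar files are inspected (Java wildcard entries such as `lib/*` are not expanded).

```python
import os
import zipfile

# Site files discussed in the thread, plus the older single-file name.
CONFIG_NAMES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "hadoop-site.xml")


def find_config_files(classpath, names=CONFIG_NAMES):
    """Return (classpath entry, config file name) pairs in classpath order."""
    hits = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        if os.path.isdir(entry):
            # Directory entry: look for the files directly inside it.
            for name in names:
                if os.path.isfile(os.path.join(entry, name)):
                    hits.append((entry, name))
        elif zipfile.is_zipfile(entry):
            # Jar entry: look for the files at the root of the archive.
            with zipfile.ZipFile(entry) as jar:
                members = set(jar.namelist())
            for name in names:
                if name in members:
                    hits.append((entry, name))
    return hits
```

Passing the expanded value of the `-cp` argument used to start Pig shows, in order, every place a site file could be loaded from. Seeing the same file name under more than one entry would be consistent with the suspicion above that one copy is shadowing another.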
-
Re: Pig and Hadoop Integration Error
rahul 2010-08-27, 16:17
Sure, Zhang.

Thanks for the help.

-Rahul