Re: Trying to submit Pig job to Amazon EMR
Can you send the entire stack trace from the Pig log file?
-Thejas
On 12/5/11 11:08 AM, Ayon Sinha wrote:
> Looks like I'm running into a problem I hadn't seen before.
> Pig is 0.9.1. Hadoop is the same version as on EMR. The conf is being
> picked up so that it connects to the EMR NN and JT. Now I get this:
>
> /home/mashlogic/ayon/hadoop-0.20.0
> 2011-12-05 10:56:58,200 [main] INFO org.apache.pig.Main - Logging error
> messages to: /home/mashlogic/ayon/pig_1323111418198.log
> 2011-12-05 10:56:58,398 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: 10.203.6.84:9000
> 2011-12-05 10:56:58,402 [main] WARN org.apache.hadoop.fs.FileSystem -
> "10.203.6.84:9000" is a deprecated filesystem name. Use
> "hdfs://10.203.6.84:9000/" instead.
> 2011-12-05 10:56:58,531 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to map-reduce job tracker at: 10.203.6.84:9001
> 2011-12-05 10:56:58,532 [main] WARN org.apache.hadoop.fs.FileSystem -
> "10.203.6.84:9000" is a deprecated filesystem name. Use
> "hdfs://10.203.6.84:9000/" instead.
> grunt> a = load 's3n://ml-weblogs/smartlinks/daytsvs/day=20111130'
> using PigStorage();
> 2011-12-05 10:57:18,078 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1200: Pig script failed to parse:
> <line 1, column 4> pig script failed to validate:
> java.net.URISyntaxException: Illegal character in scheme name at index
> 0: 10.203.6.84:9000
>
> What is going on here?
> -Ayon
> See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
> Also check out my Blog for answers to commonly asked questions.
> <http://dailyadvisor.blogspot.com>
>
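The URISyntaxException in the session above can be reproduced with the JDK alone. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes (as the "deprecated filesystem name" warning suggests) that the configured filesystem name, likely fs.default.name, is the bare host:port string rather than a full hdfs:// URI. java.net.URI treats everything before the first colon as the scheme, and a scheme may not start with a digit.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of Pig or Hadoop.
public class FsNameCheck {

    /** Returns null if the string parses as a URI, otherwise the parser's message. */
    public static String parseError(String fsName) {
        try {
            new URI(fsName);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    /** Prepends hdfs:// when no scheme separator is present (illustrative only). */
    public static String withHdfsScheme(String fsName) {
        return fsName.contains("://") ? fsName : "hdfs://" + fsName + "/";
    }

    public static void main(String[] args) {
        String bare = "10.203.6.84:9000";
        // Bare host:port: the text before ':' is read as a scheme and rejected.
        System.out.println(parseError(bare));
        // Fully qualified form: parses cleanly, so this prints "null".
        System.out.println(parseError(withHdfsScheme(bare)));
    }
}
```

If that assumption holds, setting the filesystem name in the client configuration to the hdfs://10.203.6.84:9000/ form that the warning recommends may be worth trying before digging further.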
> ------------------------------------------------------------------------
> From: Ayon Sinha <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, December 2, 2011 8:01 PM
> Subject: Re: Trying to submit Pig job to Amazon EMR
>
> So with the help of Daniel and Thejas, we figured out the problem. The
> root cause was a mismatch of Hadoop versions between EMR and the Pig
> client. Copying all the Hadoop jars from the EMR box to the EC2 box
> running the Pig 0.8.1 client did not resolve the issue by itself,
> because Pig 0.8.1 uses the Hadoop classes bundled inside its own
> packaged jar. Version 0.9 has a pigwithouthadoop jar, so we used that.
>
> Also, the bin/pig script has a bug that resets HADOOP_HOME. The script
> was also patched to fix this.
>
> Pig will also look for a /user/<username> directory in the HDFS of
> the EMR cluster. One way around this is to create the directory in
> HDFS first and then let Pig do its job. I'm not sure why Pig can't
> create that directory if it doesn't exist. Will investigate that.
>
> Thanks to Daniel & Thejas once again.
>
> -Ayon
>
>
>
> ________________________________
> From: Ayon Sinha <[EMAIL PROTECTED]>
> To: Daniel Dai <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, December 2, 2011 8:15 AM
> Subject: Re: Trying to submit Pig job to Amazon EMR
>
> Yes, I do have the awsSecretAccessKey defined, correctly, I believe.
> To test:
>
> mashlogic@cruncher ~ [ 8:07AM] hadoop dfs -ls
> s3n://ml-weblogs/smartlinks/daytsvs/day=20111130/
> Found 29 items
> -rwxrwxrwx 1 139148530 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xaa.tsv.gz
> -rwxrwxrwx 1 138086136 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xab.tsv.gz
> -rwxrwxrwx 1 146165298 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xac.tsv.gz
> -rwxrwxrwx 1 152491197 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xad.tsv.gz
> -rwxrwxrwx 1 154673351 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xae.tsv.gz
> -rwxrwxrwx 1 155920643 2011-12-01 07