Pig >> mail # user >> Trying to submit Pig job to Amazon EMR


Re: Trying to submit Pig job to Amazon EMR
Can you send the entire stack trace from the Pig logs?
-Thejas
On 12/5/11 11:08 AM, Ayon Sinha wrote:
> Looks like I'm running into a problem I hadn't seen before.
> Pig is 0.9.1. Hadoop is the same version as on EMR. The conf is being
> picked up so that it connects to the EMR NN and JT. Now I get this:
>
> /home/mashlogic/ayon/hadoop-0.20.0
> 2011-12-05 10:56:58,200 [main] INFO org.apache.pig.Main - Logging error
> messages to: /home/mashlogic/ayon/pig_1323111418198.log
> 2011-12-05 10:56:58,398 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: 10.203.6.84:9000
> 2011-12-05 10:56:58,402 [main] WARN org.apache.hadoop.fs.FileSystem -
> "10.203.6.84:9000" is a deprecated filesystem name. Use
> "hdfs://10.203.6.84:9000/" instead.
> 2011-12-05 10:56:58,531 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to map-reduce job tracker at: 10.203.6.84:9001
> 2011-12-05 10:56:58,532 [main] WARN org.apache.hadoop.fs.FileSystem -
> "10.203.6.84:9000" is a deprecated filesystem name. Use
> "hdfs://10.203.6.84:9000/" instead.
> grunt> *a = load 's3n://ml-weblogs/smartlinks/daytsvs/day=20111130'
> using PigStorage();*
> 2011-12-05 10:57:18,078 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1200: Pig script failed to parse:
> <line 1, column 4> pig script failed to validate:
> java.net.URISyntaxException: Illegal character in scheme name at index
> 0: 10.203.6.84:9000
>
> What is going on here?
> -Ayon
> See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
> Also check out my Blog for answers to commonly asked questions.
> <http://dailyadvisor.blogspot.com>
>
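The parse error quoted above can be reproduced outside Pig. The sketch below is not Pig's or Hadoop's own code; it only assumes that the filesystem name from the log, "10.203.6.84:9000", ends up being handed to java.net.URI, which is what the URISyntaxException in the log suggests. The class name FsNameCheck and the helper withScheme are hypothetical names made up for this illustration. With a bare host:port, the URI parser treats everything before the first colon as a scheme, and a scheme must start with a letter, so parsing should fail at index 0. Adding the hdfs:// prefix, as the FileSystem warning recommends, gives a name that parses.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FsNameCheck {
    // Returns null when the name parses as a URI, otherwise the parser's message.
    public static String parseError(String name) {
        try {
            new URI(name);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    // Hypothetical helper (not a Hadoop or Pig API): prefix a bare host:port
    // with hdfs://, following the suggestion in the FileSystem warning.
    public static String withScheme(String name) {
        return name.contains("://") ? name : "hdfs://" + name + "/";
    }

    public static void main(String[] args) {
        String bare = "10.203.6.84:9000"; // value taken from the log above
        System.out.println("bare:  " + parseError(bare));
        System.out.println("fixed: " + parseError(withScheme(bare)));
    }
}
```

If this is indeed the cause, the place to look would be the filesystem name in the client-side Hadoop conf that Pig picks up (presumably the fs.default.name value), which would need to be written as a full hdfs:// URI rather than as host:port. The thread does not confirm that this resolved the problem.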
> ------------------------------------------------------------------------
> *From:* Ayon Sinha <[EMAIL PROTECTED]>
> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> *Sent:* Friday, December 2, 2011 8:01 PM
> *Subject:* Re: Trying to submit Pig job to Amazon EMR
>
> So with the help of Daniel and Thejas, we figured out the problem. The
> root cause was the mismatch of Hadoop versions between EMR and the Pig
> client. Copying all the Hadoop jars from the EMR box to the EC2 box
> running the Pig 0.8.1 client did not resolve the issue, because
> Pig 0.8.1 uses the Hadoop classes bundled inside its own jar. Version
> 0.9 ships a pigwithouthadoop jar, so we used that.
>
> Also, the bin/pig script has a bug that resets HADOOP_HOME. The script
> was also patched to fix this.
>
> Pig will also look for a /user/<username> directory in the HDFS of
> the EMR cluster. One workaround is to create that directory in HDFS
> first and then let Pig do its job. I'm not sure why Pig can't create the
> directory if it doesn't exist. Will investigate that.
>
> Thanks to Daniel & Thejas once again.
>
> -Ayon
> See My Photos on Flickr
> Also check out my Blog for answers to commonly asked questions.
>
>
>
> ________________________________
> From: Ayon Sinha <[EMAIL PROTECTED]>
> To: Daniel Dai <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, December 2, 2011 8:15 AM
> Subject: Re: Trying to submit Pig job to Amazon EMR
>
> Yes, I do have the awsSecretAccessKey defined, correctly, I believe.
> To test:
>
> mashlogic@cruncher ~ [ 8:07AM] hadoop dfs -ls
> s3n://ml-weblogs/smartlinks/daytsvs/day=20111130/
> Found 29 items
> -rwxrwxrwx 1 139148530 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xaa.tsv.gz
> -rwxrwxrwx 1 138086136 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xab.tsv.gz
> -rwxrwxrwx 1 146165298 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xac.tsv.gz
> -rwxrwxrwx 1 152491197 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xad.tsv.gz
> -rwxrwxrwx 1 154673351 2011-12-01 07:03
> /smartlinks/daytsvs/day=20111130/xae.tsv.gz
> -rwxrwxrwx 1 155920643 2011-12-01 07