Pig >> mail # user >> How do you load data from S3 on Amazon EMR with Pig 0.10.0?


Re: How do you load data from S3 on Amazon EMR with Pig 0.10.0?
cd s3://elasticmapreduce/samples/pig-apache/input/

2012-06-22 01:58:56,685 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2999: Unexpected internal error. This file system object
(hdfs://10.4.115.51:9000) does not support access to the request path
's3://elasticmapreduce/samples/pig-apache/input' You possibly called
FileSystem.get(conf) when you should have called FileSystem.get(uri, conf)
to obtain a file system supporting your path.

Wait a minute... we fixed this.  I fixed this.  Why isn't it in Pig 0.10?
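
A note on the "Invalid hostname in URI" error quoted below: the thread does not diagnose it, but one plausible cause is the underscore in the bucket name. java.net.URI does not accept an underscore in a hostname, so getHost() returns null for such a URI even though getAuthority() still returns the bucket name. The quoted trace fails in S3Credentials.initialize, and the assumption here is that this method rejects a URI whose host is null. The class and helper names in this sketch are made up for illustration, and the hyphenated bucket name is hypothetical.

```java
import java.net.URI;

public class S3UriHostCheck {
    // Host component as parsed by java.net.URI, or null if it cannot parse one.
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // Raw authority component, which survives even when the host does not parse.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        String[] uris = {
            "s3://rjurney_public_web/hadoop/enron.avro",       // bucket from the quoted script
            "s3://elasticmapreduce/samples/pig-apache/input/", // bucket from the cd command
            "s3://rjurney-public-web/hadoop/enron.avro"        // hypothetical hyphenated bucket
        };
        for (String u : uris) {
            System.out.println(u + " -> host=" + hostOf(u)
                    + ", authority=" + authorityOf(u));
        }
    }
}
```

If this is the cause, a bucket name without underscores should avoid this particular exception. It would not address the separate "does not support access to the request path" error above.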

On Thu, Jun 21, 2012 at 6:57 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> My script is simple:
>
> /* Avro */
> register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/avro-1.5.3.jar
> register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/json-simple-1.1.jar
> register /home/hadoop/pig-0.10.0/contrib/piggybank/java/piggybank.jar
> register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
> register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
>
> define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> emails = LOAD 's3://rjurney_public_web/hadoop/enron.avro' using AvroStorage();
>
>
> The error confuses me. Why can't I load data from s3?
>
> 2012-06-22 01:52:50,893 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Invalid hostname in URI s3://rjurney_public_web/hadoop/enron.avro
> 2012-06-22 01:52:50,893 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.IllegalArgumentException: Invalid hostname in URI s3://rjurney_public_web/hadoop/enron.avro
>     at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:41)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:436)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1327)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1345)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
>     at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
>     at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188)
>     at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:466)
>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
>     at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
>     at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:490)
>     at org.apache.pig.Main.main(Main.java:111)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at

Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com