NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

How do you load data from S3 on Amazon EMR with Pig 0.10.0?
My script is simple:

/* Avro */
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/avro-1.5.3.jar
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/json-simple-1.1.jar
register /home/hadoop/pig-0.10.0/contrib/piggybank/java/piggybank.jar
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

emails = LOAD 's3://rjurney_public_web/hadoop/enron.avro' using AvroStorage();

The error confuses me. Why can't I load data from S3?

2012-06-22 01:52:50,893 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Invalid hostname in URI s3://rjurney_public_web/hadoop/enron.avro
2012-06-22 01:52:50,893 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.IllegalArgumentException: Invalid hostname in URI s3://rjurney_public_web/hadoop/enron.avro
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:41)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:436)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1327)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1345)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188)
at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:466)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
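A possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: the exception is thrown from S3Credentials.initialize, which in Hadoop 1.x appears to reject any URI whose parsed host is null. java.net.URI only accepts letters, digits, hyphens and dots in a server-based hostname, so an authority containing underscores, such as the bucket rjurney_public_web, is parsed as a registry-based authority and getHost() returns null. The sketch below reproduces just that parsing step with plain JDK calls; it does not touch Hadoop or S3. The class name S3UriHostCheck and the hyphenated bucket rjurney-public-web are hypothetical, used only for comparison.

```java
import java.net.URI;

public class S3UriHostCheck {

    // Returns the host that java.net.URI extracts from the location,
    // or null when the authority is not a valid server-based hostname.
    static String hostOf(String location) {
        return URI.create(location).getHost();
    }

    public static void main(String[] args) {
        String[] locations = {
            // Bucket name from the failing LOAD statement (contains underscores).
            "s3://rjurney_public_web/hadoop/enron.avro",
            // Hypothetical bucket name with hyphens, for comparison only.
            "s3://rjurney-public-web/hadoop/enron.avro"
        };
        for (String location : locations) {
            URI uri = URI.create(location);
            // getAuthority() still returns the raw bucket text even when getHost() is null.
            System.out.println(location
                + " -> host=" + uri.getHost()
                + ", authority=" + uri.getAuthority());
        }
    }
}
```

If this reading is right, the s3:// location would presumably need a bucket name without underscores; that is an inference from the URI parsing behavior, not something confirmed in this message.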

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com