Pig, mail # user - How can I read Hive text files on S3 from Pig?


Martin Goodson 2012-10-12, 15:48
Dmitriy Ryaboy 2012-10-12, 17:56
Re: How can I read Hive text files on S3 from Pig?
Martin Goodson 2012-10-13, 09:53
Hi Dmitriy,
here is the stack trace:

java.lang.IllegalArgumentException: Wrong FS: s3n://xxx/yyy/zz/, expected: hdfs://namenode.adsf.companyname.com
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:433)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:523)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:820)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
        at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:316)
        at org.apache.pig.piggybank.storage.JsonMetadata.findMetaFile(JsonMetadata.java:94)
        at org.apache.pig.piggybank.storage.JsonMetadata.getSchema(JsonMetadata.java:154)
        at org.apache.pig.piggybank.storage.AllLoader.getSchema(AllLoader.java:400)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
        at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
        at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
        at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
        at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
        at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:495)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

Thanks for taking a look. I will start looking into HCatalog too.

Martin
On 12 October 2012 18:56, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Martin,
> Do you have the complete stack trace?
> Generally, for Hive interop I recommend HCatalog; AllLoader is neat
> but it's a 3rd party contrib and we don't really know it too well. I
> can check out the error dump and see if there's anything obvious
> though.
>
> D
>
> On Fri, Oct 12, 2012 at 8:48 AM, Martin Goodson
> <[EMAIL PROTECTED]> wrote:
> > I am trying to load some text files in hive partitions on S3 using the
> > AllLoader function with no success. I get an error which indicates that
> > AllLoader is expecting the files to be on hdfs:
> >
> > a = LOAD 's3n://xxxxx/yyyyy/zzz' using
> > org.apache.pig.piggybank.storage.AllLoader();
> > grunt> 2012-10-12 14:51:26,229 [main] ERROR
> > org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error.
> > Wrong FS: s3n://xxxxx/yyyyy/zzz, expected: hdfs://namenode.hadoop.companyname.com
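
Editor's sketch, not part of the original messages: the trace in this thread shows the s3n:// path reaching DistributedFileSystem.getFileStatus, where FileSystem.checkPath rejects it. That suggests the path is being resolved against the cluster's default (HDFS) filesystem rather than one chosen from the path's own scheme. The Java below is a simplified stand-in for that kind of scheme/authority comparison, using only java.net.URI. The class name WrongFsSketch, its methods, and the example hostnames are hypothetical; this is not Hadoop's actual implementation, which handles more cases.

```java
import java.net.URI;

// Illustrative only: mimics a scheme/authority comparison that yields a
// "Wrong FS" style error. No Hadoop classes are used; all names are hypothetical.
class WrongFsSketch {

    // True when `path` can be served by the filesystem identified by `fsUri`.
    static boolean belongsTo(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // scheme-less paths are resolved against the filesystem itself
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. s3n path handed to an hdfs filesystem
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    // Throws in the same style as the error quoted in the thread.
    static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode.example.com"); // assumed default FS
        URI s3Path = URI.create("s3n://bucket/dir/");              // assumed input path

        try {
            checkPath(defaultFs, s3Path); // path checked against the wrong filesystem
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Choosing the filesystem from the path's own scheme passes the check.
        checkPath(URI.create("s3n://bucket"), s3Path);
    }
}
```

In real Hadoop code, the usual way to pick the filesystem from the path itself is Path.getFileSystem(Configuration) rather than FileSystem.get(Configuration), which returns the default filesystem. Whether the piggybank loader can be changed that way is not established in this thread.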
Dmitriy Ryaboy 2012-10-18, 04:15
Martin Goodson 2012-10-18, 12:22
Dmitriy Ryaboy 2012-10-18, 20:59