Pig >> mail # user >> How can I read Hive text files on S3 from Pig?


Re: How can I read Hive text files on S3 from Pig?
Yeah, that's a bug in FileLocalizer: apparently it assumes the path is
on the local filesystem or on HDFS only. Could you file a JIRA?
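
To illustrate the kind of check that produces the "Wrong FS" message in the trace below, here is a minimal, self-contained sketch. It is not Hadoop's or Pig's actual code: the class WrongFsSketch, the nested SketchFs type, the fsFor helper, and the host namenode.example.com are all hypothetical names made up for this example. It assumes the failure comes from handing an s3n:// path to a filesystem object that is bound to the default hdfs:// URI.

```java
import java.net.URI;

public class WrongFsSketch {

    /** Simplified stand-in for a filesystem bound to one scheme and authority (hypothetical). */
    public static final class SketchFs {
        private final URI root;

        public SketchFs(URI root) {
            this.root = root;
        }

        /** Rejects paths whose scheme or authority differs from this filesystem's root. */
        public void checkPath(URI path) {
            String scheme = path.getScheme();
            if (scheme == null) {
                return; // no scheme: the path is interpreted relative to this filesystem
            }
            String authority = path.getAuthority();
            boolean sameScheme = scheme.equalsIgnoreCase(root.getScheme());
            boolean sameAuthority = authority == null
                    || authority.equalsIgnoreCase(root.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + root);
            }
        }
    }

    /** The cluster default filesystem; the host name is a placeholder (assumption). */
    public static SketchFs defaultFs() {
        return new SketchFs(URI.create("hdfs://namenode.example.com"));
    }

    /** Picks a filesystem from the path's own scheme instead of assuming the default. */
    public static SketchFs fsFor(URI path, SketchFs fallback) {
        if (path.getScheme() == null) {
            return fallback;
        }
        String authority = path.getAuthority();
        String root = path.getScheme() + "://" + (authority == null ? "/" : authority);
        return new SketchFs(URI.create(root));
    }

    public static void main(String[] args) {
        URI s3Path = URI.create("s3n://xxx/yyy/zz/");

        // Assuming the default filesystem reproduces the failure mode.
        try {
            defaultFs().checkPath(s3Path);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Resolving the filesystem from the path itself passes the check.
        fsFor(s3Path, defaultFs()).checkPath(s3Path);
        System.out.println("resolved from path: ok");
    }
}
```

In real Hadoop code the analogous choice is usually between FileSystem.get(conf), which returns the default filesystem, and path.getFileSystem(conf), which resolves the filesystem from the path's scheme. Whether a fix in FileLocalizer would take exactly that form is a guess.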

D

On Sat, Oct 13, 2012 at 2:53 AM, Martin Goodson
<[EMAIL PROTECTED]> wrote:
> Hi Dmitriy,
> here is the stack trace:
>
> java.lang.IllegalArgumentException: Wrong FS: s3n://xxx/yyy/zz/, expected: hdfs://namenode.adsf.companyname.com
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:433)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:523)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:820)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
>         at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:316)
>         at org.apache.pig.piggybank.storage.JsonMetadata.findMetaFile(JsonMetadata.java:94)
>         at org.apache.pig.piggybank.storage.JsonMetadata.getSchema(JsonMetadata.java:154)
>         at org.apache.pig.piggybank.storage.AllLoader.getSchema(AllLoader.java:400)
>         at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
>         at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
>         at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
>         at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
>         at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>         at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
>         at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
>         at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>         at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>         at org.apache.pig.Main.run(Main.java:495)
>         at org.apache.pig.Main.main(Main.java:111)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>
>
> Thanks for taking a look. I will start looking into HCatalog too.
>
> Martin
>
>
> On 12 October 2012 18:56, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> Martin,
>> Do you have the complete stack trace?
>> Generally, for Hive interop I recommend HCatalog; AllLoader is neat,
>> but it's a third-party contrib and we don't really know it too well.
>> I can check out the error dump and see if there's anything obvious,
>> though.
>>
>> D
>>
>> On Fri, Oct 12, 2012 at 8:48 AM, Martin Goodson
>> <[EMAIL PROTECTED]> wrote:
>> > I am trying to load some text files in hive partitions on S3 using the
>> > AllLoader function with no success. I get an error which indicates that
>> > AllLoader is expecting the files to be on hdfs: