Pig, mail # user - how to load custom Writable class from sequence file?


Yang 2013-09-17, 00:37
Pradeep Gollakota 2013-09-17, 01:22
Yang 2013-09-17, 01:43
Pradeep Gollakota 2013-09-17, 01:56
Pradeep Gollakota 2013-09-17, 02:47

Re: how to load custom Writable class from sequence file?
Yang 2013-09-17, 16:20
Thanks Pradeep.

It seems that in this case just using Scala/Cascalog is easier for my purposes.
I tried Scala yesterday, and it works fine for me in local mode.
On Mon, Sep 16, 2013 at 7:47 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:

> It doesn't look like the SequenceFileLoader from the piggybank has much
> support for custom types. The elephant-bird version looks like it does what
> you need it to do.
>
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java
>
> You'll have to write converters from your types to Pig data types and
> pass them into the constructor of the SequenceFileLoader.
>
> Hope this helps!
>
>
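A converter of the kind described here boils down to two steps: deserialize the custom Writable from its raw bytes, then hand Pig values whose types it already understands. The following is a minimal, dependency-free Java sketch of that idea, not elephant-bird's API. Assumptions: `VisitKey`'s fields (`visitorId`, `url`) are hypothetical, the class only mimics the `write`/`readFields` shape of Hadoop's `Writable` without implementing it, and a `List<Object>` stands in for a Pig tuple of `long` and `chararray`. A real converter would plug into elephant-bird's converter classes and be passed to its `SequenceFileLoader`, as Pradeep describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

final class VisitKeyConverterSketch {

    // Hypothetical stand-in for com.mycompany.model.VisitKey. It mirrors the
    // write/readFields shape of Hadoop's Writable but does not depend on Hadoop.
    static final class VisitKey {
        long visitorId;
        String url;

        void write(DataOutput out) throws IOException {
            out.writeLong(visitorId);
            out.writeUTF(url);
        }

        void readFields(DataInput in) throws IOException {
            visitorId = in.readLong();
            url = in.readUTF();
        }
    }

    // Serialize a key the way a SequenceFile writer would, by calling write().
    static byte[] serialize(VisitKey key) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            key.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // The converter: raw bytes -> custom type -> values with Pig-compatible
    // Java types (Long corresponds to Pig long, String to Pig chararray).
    static List<Object> toPigTuple(byte[] raw) {
        VisitKey key = new VisitKey();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            key.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return List.of(key.visitorId, key.url);
    }

    public static void main(String[] args) {
        VisitKey key = new VisitKey();
        key.visitorId = 42L;
        key.url = "/home";
        System.out.println(toPigTuple(serialize(key))); // prints [42, /home]
    }
}
```

The design choice is to reuse the type's own `readFields` rather than parse its bytes by hand, so the converter stays correct if the Writable's serialization changes.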
> On Mon, Sep 16, 2013 at 6:56 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
>
> > That's correct...
> >
> > The "load ... AS (k:chararray, v:chararray);" doesn't actually do what you
> > think it does. The AS clause tells Pig what the schema types are, so it
> > will call the appropriate LoadCaster method to convert the data into the
> > right type. A LoadCaster object defines how to map a byte[] into the
> > appropriate Pig datatype. If the LoadFunc is not schema-aware and you don't
> > define the schema when you load, everything will be loaded as a bytearray.
> >
> > The problem you have is that the custom Writable isn't a Pig datatype. I
> > don't think you'll be able to do this without writing some custom code.
> > I'll take a look at the source code for the SequenceFileLoader and see if
> > there's a way to specify your own LoadCaster. If there is, you'll just
> > have to write a custom LoadCaster and specify it in the configuration. If
> > not, you'll have to extend it and roll your own SequenceFileLoader.
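Pradeep's point about LoadCaster can be made concrete: a Writable's `write()` emits a binary encoding, not its `toString()` text, so declaring `chararray` in the `AS` clause cannot recover a readable value from those bytes. The stdlib-only Java sketch below assumes a hypothetical `VisitKey` serialized with `DataOutput.writeLong` and `writeUTF`, and models a naive chararray cast as plain UTF-8 decoding. It illustrates the byte layout only; it is not how Pig's casting code is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class CastMismatchDemo {

    // Hypothetical custom key with a human-readable toString().
    static final class VisitKey {
        final long visitorId;
        final String url;

        VisitKey(long visitorId, String url) {
            this.visitorId = visitorId;
            this.url = url;
        }

        // Binary serialization in the style of Writable.write(DataOutput).
        byte[] toWritableBytes() throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeLong(visitorId); // 8 raw bytes, big-endian
            out.writeUTF(url);        // 2-byte length prefix, then modified UTF-8
            out.flush();
            return buf.toByteArray();
        }

        @Override
        public String toString() {
            return "VisitKey(" + visitorId + ", " + url + ")";
        }
    }

    // A naive "cast to chararray": decode the raw bytes as UTF-8 text.
    static String naiveCharArrayCast(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        VisitKey key = new VisitKey(42L, "/home");
        byte[] raw = key.toWritableBytes();
        System.out.println(raw.length);                                // 15 (8 + 2 + 5)
        System.out.println(naiveCharArrayCast(raw).equals(key.toString())); // false
    }
}
```

The decoded string starts with control characters from the `long` and the length prefix, which is why some type-specific conversion code is needed before Pig sees a chararray.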
> >
> >
> > On Mon, Sep 16, 2013 at 6:43 PM, Yang <[EMAIL PROTECTED]> wrote:
> >
> >> I think my custom type has toString(); at least, being a Writable, it can
> >> be written to bytes. So supposedly if I force it to bytes or a string, Pig
> >> should be able to cast it, like
> >>
> >> load ... AS (k:chararray, v:chararray);
> >>
> >> but this actually fails.
> >>
> >>
> >> On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
> >>
> >> > The problem is that Pig only speaks its own data types, so you need to
> >> > tell it how to translate from your custom Writable to a Pig datatype.
> >> >
> >> > Apparently elephant-bird has some support for doing this type of thing;
> >> > take a look at this Stack Overflow post:
> >> >
> >> > http://stackoverflow.com/questions/16540651/apache-pig-can-we-convert-a-custom-writable-object-to-pig-format
> >> >
> >> >
> >> > On Mon, Sep 16, 2013 at 5:37 PM, Yang <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > I tried to do a quick and dirty inspection of some of our data feeds,
> >> > > which are encoded as gzipped SequenceFiles.
> >> > >
> >> > > Basically I did
> >> > >
> >> > > a = load 'myfile' using ......SequenceFileLoader() AS ( mykey, myvalue);
> >> > >
> >> > > but it gave me an error:
> >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > > 2013-09-16 17:34:28,961 [Thread-5] WARN  org.apache.pig.piggybank.storage.SequenceFileLoader - Unable to translate key class com.mycompany.model.VisitKey to a Pig datatype
> >> > > 2013-09-16 17:34:28,962 [Thread-5] WARN  org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
> >> > > 2013-09-16 17:34:28,963 [Thread-5] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> >> > > org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class com.mycompany.model.VisitKey to a Pig datatype
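The `Unable to translate ... to a Pig datatype` error reflects that the piggybank loader only knows a fixed set of Hadoop Writable classes. A simplified Java model of that lookup, keyed by class name so it needs no Hadoop jars, shows why `com.mycompany.model.VisitKey` falls through. The table is an approximation of the piggybank behaviour, not a copy of its source; the exact supported set should be checked in `SequenceFileLoader` for the Pig version in use.

```java
import java.util.Map;
import java.util.Optional;

final class PigTypeMapping {

    // Approximate table of Writable classes a simple loader can translate.
    // Assumption: modeled on the piggybank loader, not copied from it.
    private static final Map<String, String> KNOWN = Map.of(
        "org.apache.hadoop.io.Text", "chararray",
        "org.apache.hadoop.io.IntWritable", "int",
        "org.apache.hadoop.io.LongWritable", "long",
        "org.apache.hadoop.io.FloatWritable", "float",
        "org.apache.hadoop.io.DoubleWritable", "double",
        "org.apache.hadoop.io.BooleanWritable", "boolean");

    // Look up the Pig type for a Writable class name, if one is known.
    static Optional<String> pigTypeFor(String writableClassName) {
        return Optional.ofNullable(KNOWN.get(writableClassName));
    }

    // Mirrors the failure mode seen in the log: unknown classes are rejected.
    static String requirePigType(String writableClassName) {
        return pigTypeFor(writableClassName).orElseThrow(() ->
            new IllegalArgumentException(
                "Unable to translate class " + writableClassName + " to a Pig datatype"));
    }

    public static void main(String[] args) {
        System.out.println(requirePigType("org.apache.hadoop.io.Text")); // chararray
        try {
            requirePigType("com.mycompany.model.VisitKey");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A fixed lookup like this is simple and fast, but it means any custom Writable needs either a converter or a loader that knows about it.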

Dmitriy Ryaboy 2013-09-24, 09:22
John Meagher 2013-09-24, 13:21
Dmitriy Ryaboy 2013-09-24, 14:58
Yang 2013-09-24, 17:51