|
Gayatri Rao
2011-10-24, 07:21
Daniel Dai
2011-10-24, 07:55
Dmitriy Ryaboy
2011-10-24, 18:55
Gayatri Rao
2011-10-25, 00:57
Gayatri Rao
2011-10-25, 01:33
Dmitriy Ryaboy
2011-10-25, 01:49
Gayatri Rao
2011-10-25, 01:52
Dmitriy Ryaboy
2011-10-25, 03:33
Gayatri Rao
2011-10-25, 14:41
Dmitriy Ryaboy
2011-10-26, 07:57
Gayatri Rao
2011-10-28, 12:49
Ashutosh Chauhan
2011-10-28, 17:51
Gayatri Rao
2011-10-28, 23:09
Gayatri Rao
2011-10-29, 00:49
|
-
Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-24, 07:21
Hi All,

I am trying to use the SequenceFileLoader in piggybank with my custom Writable object. I am working with Pig 0.8, and it looks like the loader does not work for user-defined custom Writables. Any pointers on how I can write a loader for my own custom Writable?

Thanks,
Gayatri
-
Re: Regarding Sequence File Loader in piggybank
Daniel Dai 2011-10-24, 07:55
I think it is SequenceFileLoader.translateWritableToPigDataType that does not support custom Writables. Try enhancing translateWritableToPigDataType.

Daniel
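The idea of enhancing a translation method can be sketched as adding one more branch for the custom type. The block below is only an illustration under stated assumptions: it does not use Hadoop or Pig classes, `MyRecord` and `TranslateSketch` are hypothetical names, and a plain `List<Object>` stands in for a Pig tuple. The real piggybank method has its own signature, which should be checked in the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a user-defined Writable; not a Hadoop class.
class MyRecord {
    final int num;
    final List<String> tags;

    MyRecord(int num, List<String> tags) {
        this.num = num;
        this.tags = tags;
    }
}

class TranslateSketch {
    // Maps a value to a Pig-like representation. Simple types pass through;
    // the custom type is flattened into a list of fields (a stand-in tuple).
    static Object translate(Object value) {
        if (value instanceof Integer || value instanceof Long || value instanceof String) {
            return value;
        }
        if (value instanceof MyRecord) {
            MyRecord r = (MyRecord) value;
            List<Object> tuple = new ArrayList<>();
            tuple.add(r.num);
            tuple.add(String.join(",", r.tags));
            return tuple;
        }
        // Anything else is rejected explicitly rather than silently dropped.
        throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
    }
}
```

Failing loudly on unknown types is a design choice in this sketch; a loader could instead return null for them.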
-
Re: Regarding Sequence File Loader in piggybank
Dmitriy Ryaboy 2011-10-24, 18:55
We have a massively improved (well, rewritten from scratch) SequenceFileLoader in elephantbird. Take a look here:

https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java

There is no separate readme on usage, but all the related classes are well documented in Javadocs.

D
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-25, 00:57
Thanks, Dmitriy. Are the jars available in a Maven repository?

Thanks,
Gayatri
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-25, 01:33
I have checked the SequenceFileLoader from elephantbird, and it seems to be a different SequenceFileLoader from the one in piggybank. Is there any reason for that?
-
Re: Regarding Sequence File Loader in piggybank
Dmitriy Ryaboy 2011-10-25, 01:49
Correct -- it's completely rewritten.

We haven't published EB to a public Maven repo, though I believe we did add a "maven-install" ant target to publish to your local Maven repo.

D
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-25, 01:52
That's great, thanks, I'll check it out. Is Thrift a dependency for building?
-
Re: Regarding Sequence File Loader in piggybank
Dmitriy Ryaboy 2011-10-25, 03:33
You can compile it with "ant -Dnothrift=true".

There's also a "-Dnoprotobuf=true" option, but I just tried it, and it seems we do require protobufs in one place that's not excluded when we skip protocol buffers, so you still need protoc version 2.3.

D
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-25, 14:41
Thanks Dmitriy.

I was trying to implement MyClassConverter for my custom class, and I overrode and implemented this method:

    @Override
    public Object bytesToObject(DataByteArray dataByteArray) throws IOException {
        MyClass o = (MyClass) ReflectionUtils.newInstance(MyClass.class, null);
        o.readFields(new DataInputStream(new ByteArrayInputStream(dataByteArray.get())));
        return o;
    }

and my MyClass.readFields is as follows:

    @Override
    public void readFields(DataInput in) throws IOException {
        num = in.readInt();
        list = new ArrayList<String>();
        for (int i = 0; i < 3; i++) {
            list.add(WritableUtils.readString(in));
        }
    }

This puts some weird data in num and list. Any idea what I might be doing wrong?
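One common cause of garbage values like this is a readFields that does not mirror the write side field for field: same order, same count, same encoding. The message above does not show the write method, so this is only a guess at the kind of check that helps. The sketch below is a stand-in that avoids Hadoop entirely: `RoundTripRecord` is a hypothetical class, and `writeUTF`/`readUTF` stand in for `WritableUtils.writeString`/`readString`, which use a different byte format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the poster's MyClass; it follows the
// write/readFields contract without depending on Hadoop.
class RoundTripRecord {
    int num;
    List<String> list = new ArrayList<>();

    // Write side: one int, then exactly three strings.
    void write(DataOutput out) throws IOException {
        out.writeInt(num);
        for (int i = 0; i < 3; i++) {
            out.writeUTF(list.get(i)); // stand-in for WritableUtils.writeString
        }
    }

    // Read side: consumes the same fields in the same order and encoding.
    void readFields(DataInput in) throws IOException {
        num = in.readInt();
        list = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            list.add(in.readUTF()); // stand-in for WritableUtils.readString
        }
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    static RoundTripRecord roundTrip(RoundTripRecord original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            RoundTripRecord copy = new RoundTripRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round-trip test of this shape, run against the real class before involving Pig, separates serialization bugs from loader or converter problems.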
-
Re: Regarding Sequence File Loader in piggybank
Dmitriy Ryaboy 2011-10-26, 07:57
What do you expect to see, how did you create it, and what are the weird values?
Any chance your compression settings are different for writing and reading?
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-28, 12:49
Sorry, that was a bug in my writeFields method. It's fixed now, and I am able to load and dump the data.

In SequenceFileLoader I have defined the corresponding key converter and value converter classes.

So, when I run

    raw = load 'in.txt' using SequenceFileLoader;
    dump raw;

it dumps the data, but when I try to project the fields, it gives an error. Do I have to explicitly specify the schema in the load, like this?

    raw = load 'in.txt' using SequenceFileLoader as (t:(a:int, b:chararray,...));
-
Re: Regarding Sequence File Loader in piggybank
Ashutosh Chauhan 2011-10-28, 17:51
Please paste the error that you are getting.

Ashutosh
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-28, 23:09
I get the following error.

My script:

    raw = LOAD 'MycustomData.seq' USING com.twitter.elephantbird.pig.load.SequenceFileLoader(
        '-c com.twitter.elephantbird.pig.load.MyCustomDataConverter',
        '-c com.twitter.elephantbird.pig.load.NullWritableConverter');
    first = FOREACH raw GENERATE $0;
    userIds = FOREACH first GENERATE key.userId;
    dump userIds;

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias userIds
        at org.apache.pig.PigServer.openIterator(PigServer.java:765)
        at com.glassdoor.bigdata.pigUDF.storage.TestSequenceFileLoader.testLoad(TestSequenceFileLoader.java:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at junit.framework.TestCase.runTest(TestCase.java:168)
        at junit.framework.TestCase.runBare(TestCase.java:134)
        at junit.framework.TestResult$1.protect(TestResult.java:110)
        at junit.framework.TestResult.runProtected(TestResult.java:128)
        at junit.framework.TestResult.run(TestResult.java:113)
        at junit.framework.TestCase.run(TestCase.java:124)
        at junit.framework.TestSuite.runTest(TestSuite.java:232)
        at junit.framework.TestSuite.run(TestSuite.java:227)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
    Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:755)
        ... 20 more

Thanks
Gayatri
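ERROR 1066 is a generic front-end message, and the trace above only says that the job failed; the underlying exception is usually found in the logs of the failed job. The thread does not state the cause. One thing that may be worth checking, offered as an assumption rather than a diagnosis, is that the tuples a converter produces have the same number and order of fields as the schema it declares, since a by-name projection such as key.userId is resolved to a position through that schema. The sketch below illustrates the check with plain Java lists; `SchemaCheckSketch` and the field name "name" are hypothetical, and none of this is Pig or Elephant Bird API.

```java
import java.util.List;

class SchemaCheckSketch {
    // Resolves a field name to its position, as a by-name projection would.
    static int indexOf(List<String> schemaFields, String name) {
        int idx = schemaFields.indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("No field named " + name);
        }
        return idx;
    }

    // Projects a field by name, first verifying that the tuple has exactly
    // as many fields as the schema declares.
    static Object project(List<String> schemaFields, List<Object> tuple, String name) {
        if (schemaFields.size() != tuple.size()) {
            throw new IllegalStateException("Schema declares " + schemaFields.size()
                    + " fields but tuple has " + tuple.size());
        }
        return tuple.get(indexOf(schemaFields, name));
    }
}
```

A check like this in a unit test for the converter would surface a schema and tuple mismatch without running a full Pig job.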
-
Re: Regarding Sequence File Loader in piggybank
Gayatri Rao 2011-10-29, 00:49
I implemented the following methods in my MyCustomDataConverter, which extends AbstractWritableConverter<MyCustomData>:

    MyCustomData toWritable(DataByteArray value)
    Object bytesToObject(DataByteArray dataByteArray)
    ResourceFieldSchema getLoadSchema()
    Tuple toTuple(MyCustomData customData, ResourceFieldSchema schema)

Are there any more methods that I need to implement?
|