Pig >> mail # user >> Regarding Sequence File Loader in piggybank


Re: Regarding Sequence File Loader in piggybank
Please paste the error that you are getting.

Ashutosh
On Fri, Oct 28, 2011 at 05:49, Gayatri Rao <[EMAIL PROTECTED]> wrote:

> Sorry, that was a bug in my writeFields method. It's fixed now and I am
> able to load and dump the data.
> In SequenceFileLoader I have defined the corresponding key converter and
> value converter classes.
>
> So, when I say
>   raw = load 'in.txt' using SequenceFileLoader;
>   dump raw;
>
> it dumps the data, but when I try to project the fields, it gives an error.
> Do I have to explicitly specify the schema in the load? Like:
>
>   raw = load 'in.txt' using SequenceFileLoader as (t:(a:int, b:chararray,...));
>
>
>
> On Wed, Oct 26, 2011 at 1:27 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> wrote:
>
> > What do you expect to see, how did you create it, and what are the weird
> > values?
> > Any chance your compression settings are different for writing and
> > reading?
> >
> > On Tue, Oct 25, 2011 at 7:41 AM, Gayatri Rao <[EMAIL PROTECTED]> wrote:
> > > Thanks, Dmitriy.
> > > I was trying to implement MyClassConverter for my custom class,
> > > and I overrode and implemented this method:
> > >
> > >    @Override
> > >    public Object bytesToObject(DataByteArray dataByteArray) throws IOException {
> > >        MyClass o = (MyClass) ReflectionUtils.newInstance(MyClass.class, null);
> > >        o.readFields(new DataInputStream(
> > >                new ByteArrayInputStream(dataByteArray.get())));
> > >        return o;
> > >    }
> > >
> > > and my MyClass.readFields is as follows:
> > >
> > >    @Override
> > >    public void readFields(DataInput in) throws IOException {
> > >        num = in.readInt();
> > >        list = new ArrayList<String>();
> > >        for (int i = 0; i < 3; i++) {
> > >            list.add(WritableUtils.readString(in));
> > >        }
> > >    }
> > >
> > > This puts some weird data in num and list. Any idea what I might be doing
> > > wrong?
> > >
> > >
> > > On Tue, Oct 25, 2011 at 9:03 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > >
> > >> You can compile it with "ant -Dnothrift=true".
> > >>
> > >> There's also a "-Dnoprotobuf=true" option, but I just tried it and it seems
> > >> we do require protobufs in one place that's not excluded when we skip
> > >> protocol buffers, so you still need protoc version 2.3.
> > >>
> > >> D
> > >>
> > >> On Mon, Oct 24, 2011 at 6:52 PM, Gayatri Rao <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > That's great, thanks, I'll check it out. Is Thrift a dependency for
> > >> > building?
> > >> >
> > >> > On Mon, Oct 24, 2011 at 6:49 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> > > Correct -- it's completely rewritten.
> > >> > >
> > >> > > We haven't published EB to a public maven repo, though I believe we did add
> > >> > > a "maven-install" ant target to publish to your local maven repo.
> > >> > >
> > >> > > D
> > >> > >
> > >> > > On Mon, Oct 24, 2011 at 6:33 PM, Gayatri Rao <[EMAIL PROTECTED]> wrote:
> > >> > >
> > >> > > > I have checked the SequenceFileLoader in elephantbird, and it seems to
> > >> > > > be a different implementation from the one in piggybank.
> > >> > > > Is there any reason for that?
> > >> > > >
> > >> > > > On Mon, Oct 24, 2011 at 5:57 PM, Gayatri Rao <[EMAIL PROTECTED]> wrote:
> > >> > > >
> > >> > > > > Thanks, Dmitriy. Are the jars available in a Maven repository?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Gayatri
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Oct 24, 2011 at 11:55 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > >> > > > >
> > >> > > > >> We have a massively improved (well, rewritten from scratch)
> > >> > > > >> SequenceFileLoader in elephantbird. Take a look here:
> > >> > > > >>
> > >> > > > >> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java
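
For anyone who lands on this thread with the same "weird data" symptom: one common cause, and the one reported above, is a write method that does not emit fields in the same order and encoding that readFields expects. Below is a minimal round-trip sketch of that idea, not the poster's actual code. MyRecord, toBytes and fromBytes are hypothetical names. To stay free of Hadoop dependencies it uses the JDK's writeUTF/readUTF as stand-ins for WritableUtils.writeString/readString (the two encodings are not interchangeable on the wire), and it writes an explicit element count where the readFields in the thread hard-codes 3.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MyClass from the thread.
class MyRecord {
    int num;
    List<String> list = new ArrayList<String>();

    // Emits num, then the element count, then each string.
    void write(DataOutput out) throws IOException {
        out.writeInt(num);
        out.writeInt(list.size());
        for (String s : list) {
            out.writeUTF(s); // assumption: JDK stand-in for WritableUtils.writeString
        }
    }

    // Reads the fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        num = in.readInt();
        int count = in.readInt();
        list = new ArrayList<String>(count);
        for (int i = 0; i < count; i++) {
            list.add(in.readUTF());
        }
    }

    // Serializes a record to a byte array via write().
    static byte[] toBytes(MyRecord r) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            r.write(out);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds a record from bytes via readFields().
    static MyRecord fromBytes(byte[] bytes) {
        try {
            MyRecord r = new MyRecord();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyRecord original = new MyRecord();
        original.num = 42;
        original.list.add("a");
        original.list.add("b");
        original.list.add("c");

        MyRecord copy = fromBytes(toBytes(original));
        System.out.println(copy.num + " " + copy.list);
    }
}
```

Running a round trip like this outside Pig is a quick way to tell a serialization bug in the class apart from a loader or compression mismatch: if the copy differs from the original here, the problem is in write/readFields rather than in SequenceFileLoader.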