Pig >> mail # user >> Java UDF and incompatible schema


RE: Java UDF and incompatible schema
Hi Pradeep,
Yes, I implemented the outputSchema method and it fixed that issue.

We are also planning to evaluate storing intermediate and final results in Cassandra.
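[For reference, the `outputSchema` override that resolves this kind of "Incompatable field schema ... infered is ':Unknown'" error looks roughly like the sketch below. This is not the poster's actual code: the field names simply mirror the AS clause in the script further down, and the fragment assumes a class extending `EvalFunc<DataBag>` with `pig.jar` on the classpath.]

```java
// Sketch only: an outputSchema override declaring a bag of 7-field tuples,
// matching the AS clause used in the Pig script. Without this override the
// UDF's schema is null and AS cannot merge with it.
@Override
public Schema outputSchema(Schema input) {
    try {
        Schema tupleSchema = new Schema();
        tupleSchema.add(new Schema.FieldSchema("id", DataType.INTEGER));
        tupleSchema.add(new Schema.FieldSchema("class", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("begin", DataType.INTEGER));
        tupleSchema.add(new Schema.FieldSchema("end", DataType.INTEGER));
        tupleSchema.add(new Schema.FieldSchema("probone", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("probtwo", DataType.CHARARRAY));
        // Wrap the tuple schema in a bag, since exec returns a DataBag.
        return new Schema(new Schema.FieldSchema(null, tupleSchema, DataType.BAG));
    } catch (Exception e) {
        // Fall back to an unknown schema if construction fails.
        return null;
    }
}
```

[The fragment uses `org.apache.pig.impl.logicalLayer.schema.Schema` and `org.apache.pig.data.DataType`; it compiles only inside a Pig UDF class, so it is shown as a method fragment rather than a runnable program.]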
> Date: Mon, 4 Nov 2013 17:08:56 -0800
> Subject: Re: Java UDF and incompatible schema
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> This is most likely because you haven't defined the outputSchema method of
> the UDF. The AS keyword merges the schema generated by the UDF with the
> user specified schema. If the UDF does not override the method and specify
> the output schema, it is considered null and you will not be able to use AS
> to override the schema.
>
> Out of curiosity, if each one of your small files describes a user, is
> there any reason why you can't use a database (e.g. HBase) to store this
> information? It seems like any file based storage may not be the best
> solution given my extremely limited knowledge of your problem domain.
>
>
> On Mon, Nov 4, 2013 at 4:26 PM, Sameer Tilak <[EMAIL PROTECTED]> wrote:
>
> > Hi everyone,
> >
> > I have written my custom parser, and since my files are small I am using
> > a sequence file for efficiency. Each file in the sequence file has info about
> > one user; I parse that file and I would like to get a bag of tuples
> > for every user/file. In my Parser class I have implemented the exec function
> > that will be called for each file/user. I then gather the info and package
> > it as tuples. Each user will generate multiple tuples since the file is
> > quite rich and complex. Is it correct to assume that the relation AU
> > will contain one bag per user?
> >
> > When I execute the following script, I get the following error. Any help
> > with this would be great!
> > ERROR 1031: Incompatable field schema: declared is
> >
> > "bag_0:bag{:tuple(id:int,class:chararray,name:chararray,begin:int,end:int,probone:chararray,probtwo:chararray)}",
> >  infered is ":Unknown"
> >
> >
> > Java UDF code snippet
> >
> > private void PopulateBag()
> > {
> >     for (MyItems item : items)
> >     {
> >         Tuple output = TupleFactory.getInstance().newTuple(7);
> >         output.set(0, item.getId());
> >         output.set(1, item.getClass());
> >         output.set(2, item.getName());
> >         output.set(3, item.Begin());
> >         output.set(4, item.End());
> >         output.set(5, item.Probabilityone());
> >         output.set(6, item.Probtwo());
> >         m_defaultDataBag.add(output);
> >     }
> > }
> >
> > public DefaultDataBag exec(Tuple input) throws IOException {
> >     try {
> >         this.ParseFile((String) input.get(0));
> >         this.PopulateBag();
> >         return m_defaultDataBag;
> >     } catch (Exception e) {
> >         System.err.println("Failed to process the input");
> >         return null;
> >     }
> > }
> >
> >
> > Pig Script
> >
> > REGISTER
> > /users/p529444/software/pig-0.11.1/contrib/piggybank/java/piggybank.jar;
> > REGISTER /users/p529444/software/pig-0.11.1/parser.jar;
> >
> > DEFINE SequenceFileLoader
> > org.apache.pig.piggybank.storage.SequenceFileLoader();
> >
> > A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray,
> > value: chararray);
> > DESCRIBE A;
> > STORE A into '/scratch/A';
> >
> > AU = FOREACH A GENERATE parser.Parser(key) AS {(id: int, class: chararray,
> > name: chararray, begin: int, end: int, probone: chararray, probtwo:
> > chararray)};
> >