Java UDF and incompatible schema
Hi everyone,

I have written my own custom parser, and since my files are small I am using a sequence file for efficiency. Each file in the sequence file has info about one user; I parse that file and would like to get a bag of tuples for every user/file. In my Parser class I have implemented the exec function, which will be called for each file/user. I then gather the info and package it as tuples. Each user will generate multiple tuples since the file is quite rich and complex. Is it correct to assume that the relation AU will contain one bag per user?

When I execute the script below, I get the following error. Any help with this would be great!
ERROR 1031: Incompatable field schema: declared is
"bag_0:bag{:tuple(id:int,class:chararray,name:chararray,begin:int,end:int,probone:chararray,probtwo:chararray)}",
 infered is ":Unknown"
Java UDF code snippet

import java.io.IOException;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DefaultDataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Snippet from my Parser UDF class; ParseFile(), the items collection and
// m_defaultDataBag are defined elsewhere in the same class, and MyItems is
// my own item type.

// Builds one 7-field tuple per parsed item and adds it to the bag.
private void PopulateBag() throws ExecException
{
    for (MyItems item : items)
    {
        Tuple output = TupleFactory.getInstance().newTuple(7);
        output.set(0, item.getId());
        output.set(1, item.getClass());
        output.set(2, item.getName());
        output.set(3, item.Begin());
        output.set(4, item.End());
        output.set(5, item.Probabilityone());
        output.set(6, item.Probtwo());
        m_defaultDataBag.add(output);
    }
}

// Called by Pig once per input tuple, i.e. once per file/user.
public DefaultDataBag exec(Tuple input) throws IOException
{
    try
    {
        this.ParseFile((String) input.get(0));
        this.PopulateBag();
        return m_defaultDataBag;
    }
    catch (Exception e)
    {
        System.err.println("Failed to process the input\n");
        return null;
    }
}
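
If I understand the error correctly, the "declared" schema comes from my AS clause and the 'infered is ":Unknown"' part means Pig could not work out the UDF's own output schema, presumably because I do not override outputSchema() in the Parser class. A rough sketch of what such an override might look like for this bag of 7-field tuples (field names copied from the AS clause, the bag alias "result" is just a placeholder; I have not verified that this makes the error go away):

import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Sketch only: declare the bag-of-tuples schema so Pig does not have to
// infer it as ":Unknown". Field names mirror the AS clause in the script.
@Override
public Schema outputSchema(Schema input) {
    try {
        Schema tupleSchema = new Schema();
        tupleSchema.add(new Schema.FieldSchema("id", DataType.INTEGER));
        tupleSchema.add(new Schema.FieldSchema("class", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("begin", DataType.INTEGER));
        tupleSchema.add(new Schema.FieldSchema("end", DataType.INTEGER));
        tupleSchema.add(new Schema.FieldSchema("probone", DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema("probtwo", DataType.CHARARRAY));
        // "result" is only a placeholder alias for the bag.
        return new Schema(new Schema.FieldSchema("result", tupleSchema, DataType.BAG));
    } catch (Exception e) {
        // Fall back to Pig's default behaviour (schema unknown).
        return null;
    }
}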
Pig Script

REGISTER /users/p529444/software/pig-0.11.1/contrib/piggybank/java/piggybank.jar;
REGISTER /users/p529444/software/pig-0.11.1/parser.jar;

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);
DESCRIBE A;
STORE A into '/scratch/A';

AU = FOREACH A GENERATE parser.Parser(key) AS {(id: int, class: chararray, name: chararray, begin: int, end: int, probone: chararray, probtwo: chararray)};
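
On the one-bag-per-user assumption: since exec() is called once per input tuple, each record of A should produce exactly one bag in AU. A throwaway way I could check this outside of Pig (ParserSmokeTest is just a hypothetical test class, the path is a placeholder, and I am assuming Parser has the usual public no-argument constructor) would be something like:

import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class ParserSmokeTest {
    public static void main(String[] args) throws Exception {
        // Build the same kind of single-field tuple Pig would pass to exec().
        Tuple input = TupleFactory.getInstance().newTuple(1);
        input.set(0, "/scratch/sample-user-file");   // placeholder path

        parser.Parser udf = new parser.Parser();
        DataBag bag = udf.exec(input);               // one call -> one bag

        System.out.println("tuples in bag: " + (bag == null ? 0 : bag.size()));
        if (bag != null) {
            for (Tuple t : bag) {
                System.out.println(t);
            }
        }
    }
}

Calling it twice with different paths would also show whether m_defaultDataBag is re-initialised between calls; if it is not, tuples from earlier files would accumulate across users.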
     