Pig >> mail # user >> IOException appearing during dump but not illustrate


IOException appearing during dump but not illustrate
Hi,

I've recently gotten stumped by a problem where my attempts to DUMP the
relation produced by a GROUP command give the following error (even
though ILLUSTRATE on the same relation works fine):

java.io.IOException: Type mismatch in key from map: expected
org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
.
.
.

For a little background: the relation that's failing is called y5, and
it's produced by the following string of commands (in grunt):

y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;
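One thing I've been wondering about (untested sketch, so take it with a grain of salt): since argMap's values are untyped, uid gets the default bytearray type, so maybe casting it to chararray before the GROUP would make the declared key type match what the UDF actually puts in the map:

```
y3 = foreach y2 generate (chararray)argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;
```

I haven't confirmed that this addresses the underlying mismatch, though.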

And to give an idea of what sort of data is involved, ILLUSTRATE y4 yields:

-----------------------------------------------------------------------------------------------------
| y1     | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)})                        |
-----------------------------------------------------------------------------------------------------
|        | 1265950806     | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)} |
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
| y2     | timestamp: int | argMap: map                                                       |
-----------------------------------------------------------------------------------------------
|        | 1265950806     | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313} |
-----------------------------------------------------------------------------------------------
--------------------------------------------
| y3     | uid: bytearray | timestamp: int |
--------------------------------------------
|        | 1381688313     | 1265950806     |
--------------------------------------------
--------------------------------------------
| y4     | uid: bytearray | timestamp: int |
--------------------------------------------
|        | 1381688313     | 1265950806     |
--------------------------------------------

The same problem also occurred when the FILTER command was omitted. The
relevant chunk of code in myudfs.httpArgParse is:

    StringTokenizer tok = new StringTokenizer((String) pair, "=", false);
    if (tok.hasMoreTokens()) {
        String oKey = tok.nextToken();
        if (tok.hasMoreTokens()) {
            // key=value pair: store the value under the key
            Object oValue = tok.nextToken();
            output.put(oKey, oValue);
        } else {
            // bare key with no value
            output.put(oKey, null);
        }
    }
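For what it's worth, here's the same tokenizing logic as a standalone sketch (plain Java, no Pig types; the class and method names are just for illustration). Note that the values end up as plain java.lang.String objects, which I suspect is where the NullableText in the error is coming from:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class ArgParseSketch {
    // Standalone version of the tokenizing in httpArgParse: split one
    // "key=value" pair on '=' and record it, with null for a bare key.
    public static Map<String, Object> parsePair(Object pair) {
        Map<String, Object> output = new HashMap<String, Object>();
        StringTokenizer tok = new StringTokenizer((String) pair, "=", false);
        if (tok.hasMoreTokens()) {
            String oKey = tok.nextToken();
            if (tok.hasMoreTokens()) {
                // the value comes back as a plain java.lang.String
                output.put(oKey, tok.nextToken());
            } else {
                // bare key, e.g. "flag" with no '='
                output.put(oKey, null);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(parsePair("s=1381688313")); // prints {s=1381688313}
    }
}
```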

If anyone has any insight into how I could get this to work, that'd
really help me out.

Thanks,
Kris

P.S. For those who remember my earlier post about getting httpArgParse
to compile, I took the advice to ditch the InternalMap in favour of a
HashMap<String,Object>.

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3