Pig, mail # user - IOException appearing during dump but not illustrate
Kris Coward 2010-12-08, 21:53
Hi,

I've recently been stumped by a problem where my attempt to DUMP the
relation produced by a GROUP command gives the following error (though
running ILLUSTRATE on the same relation works fine):

java.io.IOException: Type mismatch in key from map: expected
org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
.
.
.

For a little background, the relation that's failing is called y5, and
is produced by the following sequence of commands (in grunt):

y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;

and to get an idea what sort of data is involved, ILLUSTRATE y4 yields:

-----------------------------------------------------------------------------------------------------
| y1     | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)})                        |
-----------------------------------------------------------------------------------------------------
|        | 1265950806     | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)} |
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
| y2     | timestamp: int | argMap: map                                                       |
-----------------------------------------------------------------------------------------------
|        | 1265950806     | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313} |
-----------------------------------------------------------------------------------------------
--------------------------------------------
| y3     | uid: bytearray | timestamp: int |
--------------------------------------------
|        | 1381688313     | 1265950806     |
--------------------------------------------
--------------------------------------------
| y4     | uid: bytearray | timestamp: int |
--------------------------------------------
|        | 1381688313     | 1265950806     |
--------------------------------------------

The same problem also occurred when the FILTER command was omitted.
The relevant chunk of code in myudfs.httpArgParse is:

    // Split a single "key=value" pair on '=' and add it to the output map.
    StringTokenizer tok = new StringTokenizer((String) pair, "=", false);
    if (tok.hasMoreTokens()) {
        String oKey = tok.nextToken();
        if (tok.hasMoreTokens()) {
            Object oValue = tok.nextToken();  // value is a java.lang.String
            output.put(oKey, oValue);
        } else {
            output.put(oKey, null);           // key with no '=value' part
        }
    }
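For reference, here's a minimal standalone sketch of that tokenizing logic (the class and method names are mine, not from the actual UDF). One thing it makes visible: the values placed into the map are plain Java Strings, even though the ILLUSTRATE output above shows uid with a bytearray schema; that type difference may be relevant to the NullableBytesWritable/NullableText mismatch in the error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class ArgParseSketch {
    // Split a single "key=value" token on '=' and add it to the map.
    // A bare key (no '=') is stored with a null value.
    static void putPair(Map<String, Object> output, String pair) {
        StringTokenizer tok = new StringTokenizer(pair, "=", false);
        if (tok.hasMoreTokens()) {
            String key = tok.nextToken();
            if (tok.hasMoreTokens()) {
                output.put(key, tok.nextToken());  // value is a java.lang.String
            } else {
                output.put(key, null);             // bare key, e.g. "flag"
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> output = new HashMap<>();
        // Sample query string shaped like the illustrated y1 data.
        for (String pair : "s=1381688313&ts=1265950805&flag".split("&")) {
            putPair(output, pair);
        }
        System.out.println(output.get("s"));   // 1381688313
        System.out.println(output.get("flag") == null
                && output.containsKey("flag"));  // true
    }
}
```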

If anyone has any insight how I could get this to work, that'd really
help me out.

Thanks,
Kris

P.S. For those who remember my earlier post about getting httpArgParse
to compile, I took the advice to ditch the InternalMap in favour of a
HashMap<String,Object>.

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3