Re: Types and SequenceFiles
Harsh J 2013-05-31, 03:17
Hi Jens,

Please read this old thread at http://search-hadoop.com/m/WHvZDCfVsD,
which covers the issue, the solution, and more.
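
The short version: FileInputFormat.addInputPath() only registers the input
path; it does not set the input format. Since your job never calls
setInputFormatClass(), it falls back to the default TextInputFormat, which
hands the mapper LongWritable byte offsets and Text lines. That explains
both the ClassCastException and why map() runs once per line instead of
once per appended record. Also note that SequenceFileAsBinaryInputFormat,
had it been set, would give you BytesWritable/BytesWritable pairs; plain
SequenceFileInputFormat is what matches your IntWritable/BytesWritable
mapper. A rough sketch of a working driver (class names follow your
snippets; the map() body is just a placeholder):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class NoOfMoves {

      // Input types must match what the writer appended:
      // IntWritable keys, BytesWritable values.
      public static class NoOfMovesMapper
          extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> {
        @Override
        protected void map(IntWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
          // Placeholder body. Note that getBytes() returns the backing
          // array; only the first getLength() bytes belong to this record.
          context.write(key, new IntWritable(value.getLength()));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "no-of-moves");
        job.setJarByClass(NoOfMoves.class);
        job.setMapperClass(NoOfMovesMapper.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);

        // The crucial line: without it the job defaults to TextInputFormat.
        job.setInputFormatClass(SequenceFileInputFormat.class);

        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

With the input format set, each append() in your writer corresponds to
exactly one map() invocation.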

On Fri, May 31, 2013 at 1:39 AM, Jens Scheidtmann
<[EMAIL PROTECTED]> wrote:
> Dear list,
>
> I have created a sequence file like this:
>
>     seqWriter = SequenceFile.createWriter(fs, getConf(), new Path(hdfsPath),
>         IntWritable.class, BytesWritable.class,
>         SequenceFile.CompressionType.NONE);
>     seqWriter.append(new IntWritable(index++), new BytesWritable(buf));
>
> (where buf is a byte array.)
>
> Now, when reading the same sequence file in a MapReduce job, I specify
> the mapper like this:
>
>     public static class NoOfMovesMapper
>         extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> {
>
> and configure the SequenceFile as:
>
>     SequenceFileAsBinaryInputFormat.addInputPath(jobConf, new Path(args[i]));
>
> This job fails with:
>
>     java.lang.ClassCastException: org.apache.hadoop.io.LongWritable
>         cannot be cast to org.apache.hadoop.io.IntWritable
>         at org.gostats.hadoop.NoOfMoves$NoOfMovesMapper.map(NoOfMoves.java:1)
>
> I have to specify the mapper as
>
>      extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
>
> to read the sequence file. But then the number of records, and hence of
> map() invocations, is much larger than I would expect. I thought I would
> get as many invocations of map() as there are records in the sequence file.
>
> What am I doing wrong?
>
> Thanks in advance,
>
> Jens
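
To double-check what is actually in the file, you can also count records
with a plain SequenceFile.Reader outside of MapReduce; a minimal sketch
(the class name and argument handling are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class CountRecords {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // The reader exposes the key/value classes recorded in the file
        // header, which confirms the types actually on disk.
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf);
        System.out.println("key class:   " + reader.getKeyClassName());
        System.out.println("value class: " + reader.getValueClassName());

        IntWritable key = new IntWritable();
        BytesWritable value = new BytesWritable();
        long records = 0;
        while (reader.next(key, value)) {
          records++;
        }
        reader.close();
        System.out.println("records: " + records);
      }
    }

If that prints IntWritable/BytesWritable and the record count you expect,
the writer side is fine and the problem is purely the job configuration.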

--
Harsh J