MapReduce >> mail # user >> Re: Types and SequenceFiles


Re: Types and SequenceFiles
Hi Jens,

Please read this old thread at http://search-hadoop.com/m/WHvZDCfVsD
which covers the issue, the solution and more.
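
A likely explanation, assuming the driver never sets an input format class explicitly: addInputPath is a static method declared on FileInputFormat, so calling it as SequenceFileAsBinaryInputFormat.addInputPath(...) only registers the path. It does not select SequenceFileAsBinaryInputFormat. The job then falls back to its default, TextInputFormat, whose keys are LongWritable byte offsets and whose values are Text lines. That matches the ClassCastException. The usual fix is to set the format explicitly: job.setInputFormatClass(SequenceFileInputFormat.class) with the new mapreduce API, or jobConf.setInputFormat(SequenceFileInputFormat.class) with the old mapred API. Also note that SequenceFileInputFormat, not SequenceFileAsBinaryInputFormat, is the one that hands the mapper deserialized IntWritable/BytesWritable pairs. The binary variant passes the raw key and value bytes as BytesWritable. The sketch below uses hypothetical mock classes (MockJob, MockFileInputFormat, MockSequenceFileAsBinaryInputFormat; these are not Hadoop classes) to show only the static-method behaviour:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT Hadoop classes. They model only two facts:
// (1) a job uses a text input format unless one is set explicitly, and
// (2) addInputPath is a static method declared on the FileInputFormat base
//     class, so calling it through a subclass name does not select that subclass.
class MockJob {
    String inputFormat = "TextInputFormat";        // default when nothing is set
    final List<String> inputPaths = new ArrayList<>();

    // Stand-in for Job.setInputFormatClass(...) in the real API.
    void setInputFormatClass(String name) {
        inputFormat = name;
    }
}

class MockFileInputFormat {
    // Records the path and nothing else; the input format is left untouched.
    static void addInputPath(MockJob job, String path) {
        job.inputPaths.add(path);
    }
}

class MockSequenceFileAsBinaryInputFormat extends MockFileInputFormat { }

class StaticCallDemo {
    public static void main(String[] args) {
        MockJob job = new MockJob();
        // Same shape as the call in the original message; "/data/moves.seq" is
        // a hypothetical path. The call resolves to the base-class static method.
        MockSequenceFileAsBinaryInputFormat.addInputPath(job, "/data/moves.seq");
        System.out.println(job.inputFormat);       // TextInputFormat

        // The fix: select the input format explicitly.
        job.setInputFormatClass("SequenceFileInputFormat");
        System.out.println(job.inputFormat);       // SequenceFileInputFormat
    }
}
```

With the real API the change is a single setInputFormatClass (or setInputFormat) call in the driver. Assuming the driver and the mapper both use the same (new) API, the mapper can then keep its Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> signature.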

On Fri, May 31, 2013 at 1:39 AM, Jens Scheidtmann
<[EMAIL PROTECTED]> wrote:
> Dear list,
>
> I have created a sequence file like this:
>
>     seqWriter = SequenceFile.createWriter(fs, getConf(), new Path(hdfsPath),
>         IntWritable.class, BytesWritable.class, SequenceFile.CompressionType.NONE);
>     seqWriter.append(new IntWritable(index++), new BytesWritable(buf));
>
> (where buf is a byte array).
>
> Now, when reading the same sequence file in a map reduce job, I specify the
> mapper like this:
>
>     public static class NoOfMovesMapper
>         extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> {
>
> and configure the SequenceFile as:
>
>     SequenceFileAsBinaryInputFormat.addInputPath(jobConf, new Path(args[i]));
>
> This job fails with:
>
>     java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
>         at org.gostats.hadoop.NoOfMoves$NoOfMovesMapper.map(NoOfMoves.java:1)
>
> I have to specify the mapper as
>
>      extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
>
> to read the sequence file. But then the number of records and of map
> invocations is much larger than I would expect. I thought I would get as
> many map invocations as there are records in the sequence file.
>
> What am I doing wrong? Where am I wrong?
>
> Thanks in advance,
>
> Jens
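
On the inflated map() count: assuming the job really is reading the file through TextInputFormat, every '\n' or '\r' byte in the binary data (and in the sequence file's header and sync markers) ends a "line". One sequence-file record can therefore turn into several map() calls, each keyed by a byte offset. The hypothetical TextSplitDemo class below is a simplified model of that splitting rule (LF, CR or CR+LF ends a line), not Hadoop's actual LineRecordReader:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how a text input format splits bytes:
// '\n', '\r' or "\r\n" ends a record, and each record is keyed by the byte
// offset where it starts (like the LongWritable keys of TextInputFormat).
// This is NOT Hadoop's LineRecordReader.
class TextSplitDemo {

    /** Returns the starting byte offset of each "line" record in data. */
    static List<Long> recordOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                offsets.add((long) start);          // one record per terminator
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;                            // treat "\r\n" as one terminator
                }
                start = i + 1;
            }
            i++;
        }
        if (start < data.length) {
            offsets.add((long) start);              // trailing bytes with no terminator
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Stands in for ONE BytesWritable value that happens to contain
        // newline (10) and carriage-return (13) bytes.
        byte[] oneValue = {1, 2, 10, 3, 13, 10, 4, 13, 5};
        List<Long> offsets = recordOffsets(oneValue);
        System.out.println("records seen as text: " + offsets.size()); // 4
        System.out.println("keys (byte offsets): " + offsets);         // [0, 3, 6, 8]
    }
}
```

For the sample value, the model reports four text records at offsets 0, 3, 6 and 8, although the bytes stand for a single sequence-file record.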

--
Harsh J