-
old problem: mapper output as sequence file
Shi Yu 2011-09-19, 19:19
Hi,
I am stuck again on what is probably a very simple problem: I can't generate the map output in sequence file format. I always get this error:

java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:985)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at edu.uchicago.naivetagger.lzocheck.MapperSequenceCompression$Map.map(MapperSequenceCompression.java:29)
    at edu.uchicago.naivetagger.lzocheck.MapperSequenceCompression$Map.map(MapperSequenceCompression.java:27)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Why is Hadoop always trying to cast the Text class of my key into LongWritable? I noticed that the old API might have some problems processing the index file and the sequence file in mapper output, but I am using the 0.20.2 API, so I guess that is not the issue. I guess I missed something naive here, but it has taken me a long time to figure out. Thanks for any suggestions.
Here is my complete code. It is a mapper-only job, and I ran it on random input because it simply emits a static <"key","a"> pair as <Text, Text> output.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;

public class MapperSequenceCompression extends Configured implements Tool {

    public static class Map extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context)
                throws IOException, InterruptedException {
            context.write(new Text("key"), new Text("a"));
        }
    }

    public int run(String[] args) throws IOException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        Job job = new Job(conf, "MyJob");

        job.setJarByClass(MapperSequenceCompression.class);
        job.setMapperClass(MapperSequenceCompression.Map.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(job, in);
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(job, out);

        job.setNumReduceTasks(0);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
        try {
            System.exit(job.waitForCompletion(true) ? 0 : 1);
            return 0;
        } catch (Throwable e) {
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MapperSequenceCompression(), args);
        System.exit(exitCode);
    }
}
-
Re: old problem: mapper output as sequence file
Brock Noland 2011-09-19, 20:15
Hi,
On Mon, Sep 19, 2011 at 3:19 PM, Shi Yu <[EMAIL PROTECTED]> wrote:
> I am stuck again in a probably very simple problem. I couldn't generate the
> map output in sequence file format. I always get this error:
> java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable

No worries.

> job.setMapOutputKeyClass(Text.class);
> job.setMapOutputValueClass(Text.class);
You are running a map only job, so I think you want:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
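Why this change helps, as far as the stack trace suggests: with zero reduce tasks, the mapper's records go straight to the output format's writer, which checks each key against the job's output key class. That class appears to default to LongWritable when it is never set, and the map-output setting is not consulted on this path. The following is only a toy model in plain Java, not Hadoop code: ToyJob, ToyWriter, and tryWrite are hypothetical names, and String and Long stand in for Text and LongWritable.

```java
import java.io.IOException;

public class KeyClassCheckSketch {

    /** Toy stand-in for a sequence-file writer; not Hadoop's implementation. */
    static final class ToyWriter {
        private final Class<?> keyClass;

        ToyWriter(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        /** Rejects any key whose runtime class differs from the declared key class. */
        void append(Object key, Object value) throws IOException {
            if (key.getClass() != keyClass) {
                throw new IOException("wrong key class: " + key.getClass().getName()
                        + " is not " + keyClass);
            }
        }
    }

    /** Toy stand-in for the job configuration (hypothetical, for illustration). */
    static final class ToyJob {
        // Assumption: the default mirrors Hadoop's LongWritable default; Long stands in for it.
        private Class<?> outputKeyClass = Long.class;
        // Stored but never read by the final output writer in this model.
        private Class<?> mapOutputKeyClass;

        void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }
        void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }

        /** In a map-only job the mapper writes through the final output writer. */
        ToyWriter newOutputWriter() { return new ToyWriter(outputKeyClass); }
    }

    /** Writes one <"key","a"> record and reports "ok" or the error message. */
    static String tryWrite(ToyJob job) {
        try {
            job.newOutputWriter().append("key", "a");
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToyJob broken = new ToyJob();
        broken.setMapOutputKeyClass(String.class); // only the map-output class is set
        System.out.println(tryWrite(broken));
        // prints: wrong key class: java.lang.String is not class java.lang.Long

        ToyJob fixed = new ToyJob();
        fixed.setOutputKeyClass(String.class);     // the job output class is set
        System.out.println(tryWrite(fixed));
        // prints: ok
    }
}
```

The first message has the same shape as the exception in the original post, which is consistent with the writer being configured from the job output key class.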
But I also recommend adding @Override on your map method because it's easy to accidentally not override your superclass method.
@Override
public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {

Brock
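To illustrate the @Override advice with a small sketch that does not depend on Hadoop: BaseMapper below is a hypothetical stand-in for a framework base class whose default map is a pass-through. If the subclass method's parameter types do not match, Java treats it as a separate overload and the framework keeps calling the default, with no error. With @Override on such a method, the compiler would reject it instead.

```java
public class OverrideSketch {

    /** Hypothetical framework base class with a default map implementation. */
    static class BaseMapper<K, V> {
        protected String map(K key, V value) {
            return "default";
        }

        /** The framework entry point always calls map(K, V). */
        final String run(K key, V value) {
            return map(key, value);
        }
    }

    /** Wrong key type (Integer, not Long): this is an overload, not an override. */
    static class AccidentalOverload extends BaseMapper<Long, String> {
        // Adding @Override here would turn this silent mistake into a compile error.
        public String map(Integer key, String value) {
            return "custom";
        }
    }

    /** Matching signature: the compiler confirms this overrides the base method. */
    static class RealOverride extends BaseMapper<Long, String> {
        @Override
        public String map(Long key, String value) {
            return "custom";
        }
    }

    public static void main(String[] args) {
        System.out.println(new AccidentalOverload().run(1L, "x")); // prints: default
        System.out.println(new RealOverride().run(1L, "x"));       // prints: custom
    }
}
```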
-
Re: old problem: mapper output as sequence file
Shi Yu 2011-09-19, 20:28
Oh that's brilliant. Thanks a lot Brock!

On 9/19/2011 3:15 PM, Brock Noland wrote:
> Hi,
>
> On Mon, Sep 19, 2011 at 3:19 PM, Shi Yu<[EMAIL PROTECTED]> wrote:
>> I am stuck again in a probably very simple problem. I couldn't generate the
>> map output in sequence file format. I always get this error:
>> java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable
>
> No worries.
>
>> job.setMapOutputKeyClass(Text.class);
>> job.setMapOutputValueClass(Text.class);
> You are running a map only job, so I think you want:
>
> job.setOutputKeyClass(Text.class);
> job.setOutputValueClass(Text.class);
Yes, that was the exact reason.
> But I also recommend adding @Override on your map method because it's
> easy to accidentally not override your superclass method.
>
> @Override
> public void map(LongWritable key, Text value, Context context)
> throws IOException, InterruptedException{
>
> Brock