The problem in your second question can be solved by using
SequenceFileOutputFormat as the output format of the first job and
SequenceFileInputFormat as the input format of the second job.
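A SequenceFile stores each key and value in binary by calling their Writable write(DataOutput) methods. Its header records the key and value class names, so the reader can rebuild each key with readFields(DataInput). That is why the second job's mapper would receive ActiveDayKey objects rather than LongWritable offsets. In the driver, with the org.apache.hadoop.mapreduce API, this amounts to calling job1.setOutputFormatClass(SequenceFileOutputFormat.class) on the first job and job2.setInputFormatClass(SequenceFileInputFormat.class) on the second. With the old mapred API, the equivalents are JobConf.setOutputFormat and JobConf.setInputFormat. The names job1 and job2 are placeholders.

The sketch below is only an illustration of the binary round trip, written with plain java.io so it runs without Hadoop. DemoActiveDayKey is a hypothetical stand-in whose fields are invented. It mirrors the write/readFields methods that Writable declares but does not implement the real Hadoop interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for ActiveDayKey; userId and day are invented fields.
// The real class would implement Hadoop's WritableComparable, which declares
// write(DataOutput) and readFields(DataInput) with these same signatures.
class DemoActiveDayKey implements Comparable<DemoActiveDayKey> {
    long userId;
    int day;

    // No-arg constructor: a deserializer creates an empty key, then fills it.
    DemoActiveDayKey() {}

    DemoActiveDayKey(long userId, int day) {
        this.userId = userId;
        this.day = day;
    }

    // Writes fixed-width binary fields instead of a toString() rendering.
    public void write(DataOutput out) throws IOException {
        out.writeLong(userId);
        out.writeInt(day);
    }

    // Reads the fields back in exactly the order write() emitted them.
    public void readFields(DataInput in) throws IOException {
        userId = in.readLong();
        day = in.readInt();
    }

    @Override
    public int compareTo(DemoActiveDayKey o) {
        int c = Long.compare(userId, o.userId);
        return c != 0 ? c : Integer.compare(day, o.day);
    }

    @Override
    public String toString() {
        return userId + ":" + day;
    }
}

public class SequenceFileRoundTrip {
    // Serializes a key through write(), as a binary container format would.
    static byte[] serialize(DemoActiveDayKey key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            key.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a key by creating an empty instance and calling readFields().
    static DemoActiveDayKey deserialize(byte[] data) {
        DemoActiveDayKey key = new DemoActiveDayKey();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            key.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    public static void main(String[] args) {
        DemoActiveDayKey original = new DemoActiveDayKey(42L, 15505);
        byte[] data = serialize(original);
        DemoActiveDayKey copy = deserialize(data);
        // prints "12 bytes -> 42:15505" (8-byte long + 4-byte int)
        System.out.println(data.length + " bytes -> " + copy);
    }
}
```

The point of the design is that no text parsing is involved. The same write/readFields pair handles both directions, so the key's layout is defined in exactly one class.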
On Thu, Jun 14, 2012 at 11:11 PM, Michael Parker <[EMAIL PROTECTED]> wrote:
> Hi all,
> One more question. I have two jobs to run serially using a JobControl.
> The key-value types for the outputs of the reducer of the first job
> are <ActiveDayKey, Text>, where ActiveDayKey is a class that
> implements WritableComparable. And so the key-value types for the
> inputs to the mapper of the second job are <ActiveDayKey, Text>. I'm
> noticing two things:
> First, in the output of the reducer from the first job, each
> ActiveDayKey object is being written as a string using its toString
> method. Since it implements WritableComparable and already knows
> how to serialize itself using write(DataOutput), is there any way to
> exploit that to write it in binary format? Otherwise, do I need to
> write a subclass of FileOutputFormat?
> Second, the second job fails with "java.lang.ClassCastException:
> org.apache.hadoop.io.LongWritable cannot be cast to
> co.adhoclabs.LogProcessor$ActiveDayKey." I'm assuming this is because
> by default the key type is LongWritable, holding the byte offset of
> each line, and here I want to ignore that offset and use the ActiveDayKey
> written on the line itself as the key. Again, since ActiveDayKey knows how to deserialize
> itself using readFields(DataInput), is there any way to exploit that
> to read it from the line in binary format? Do I need to write a
> subclass of FileInputFormat?
> Assuming I need to write subclasses of FileOutputFormat and
> FileInputFormat classes, what's a good example of this? The terasort