Mark Kerzner 2012-06-18, 14:19
John Armstrong 2012-06-18, 14:24
Mark Kerzner 2012-06-18, 14:40
John Armstrong 2012-06-18, 14:53
Re: Do I have to sort?
Mark Kerzner 2012-06-18, 15:12
Thank you for the great instructions!
On Mon, Jun 18, 2012 at 9:53 AM, John Armstrong <[EMAIL PROTECTED]> wrote:
> On 06/18/2012 10:40 AM, Mark Kerzner wrote:
>> That sounds very interesting, and I may implement such a workflow, but
>> can I write back to HDFS from the mapper? In the reducer it is a standard
>> context.write(), but the mapper has a different context.
> Both Mapper.Context and Reducer.Context descend from
> TaskInputOutputContext, which is where the write() method is defined, so
> they're both outputting their data in the same way.
> If you don't have a Reducer -- only Mappers and fully parallel data
> processing -- then when you configure your job, set the number of
> reducers to zero. The mapper context then knows that mapper output is the
> last step, so it uses the specified OutputFormat to write out the data,
> just as your reducer context currently does with reducer output.
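The setup described in the quoted reply can be sketched as a minimal map-only job. This is an untested sketch against the Hadoop `org.apache.hadoop.mapreduce` ("new") API of that era; the class names `MapOnlyJob` and `PassThroughMapper` and the use of `TextOutputFormat` are illustrative assumptions, not something from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MapOnlyJob {

  // Mapper.Context descends from TaskInputOutputContext, so the mapper
  // calls the same write() a reducer would; with zero reducers, this
  // output goes straight through the OutputFormat to HDFS.
  public static class PassThroughMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(key.toString()), value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "map-only example");
    job.setJarByClass(MapOnlyJob.class);
    job.setMapperClass(PassThroughMapper.class);

    // The key step from the thread: no reducers, so mapper output is final.
    job.setNumReduceTasks(0);

    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With `setNumReduceTasks(0)` there is no shuffle or sort at all, so each mapper's `context.write()` calls are handed directly to the configured OutputFormat, producing one output file per map task.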
Minh Nguyen 2012-06-18, 14:48