Assume I have a large file called *BigData.unsorted* (say 500GB)
consisting of lines of text, and that these lines are in random order.
I understand how to assign a key to each line and that Hadoop will pass the
lines to my reducers in order of that key.
Now assume I want a single file called *BigData.sorted* with the lines in
the order of the keys.
I think I understand how to get files part-00000, part-00001, ..., but not:
1) How I get just the lines from the reducer, not the keys
2) How I make the reducer generate a file with the name that I want
3) How, without using a single reducer instance, I get a single output file,
or whether a single reducer is the right choice for this task.
Also, it would be very nice if the output of the reducer were compressed,
say BigData.sorted.gz.
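
To make the desired end result concrete, here is a minimal plain-Java sketch of
the behaviour being asked for, at toy scale and without any Hadoop classes. It
is only an illustration under stated assumptions, not a Hadoop solution: the
class name SortedOutputSketch, the helper names, and the choice of key (the
text before the first tab) are all hypothetical, and the in-memory sort merely
stands in for the ordering Hadoop provides between map and reduce.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SortedOutputSketch {

    // Hypothetical key function: the key is the text before the first tab,
    // or the whole line when there is no tab.
    public static String keyOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // Stand-in for the key ordering Hadoop provides; in-memory, so only
    // suitable for tiny inputs, not for a 500GB file.
    public static List<String> sortByKey(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing(SortedOutputSketch::keyOf));
        return sorted;
    }

    // Writes only the lines (no separate key column) to one gzip file.
    public static void writeGzip(List<String> lines, Path out) throws IOException {
        try (BufferedWriter w = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    // Reads the gzip file back, one entry per line, to check the result.
    public static List<String> readGzip(Path in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(in)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> unsorted = Arrays.asList("b\tsecond", "c\tthird", "a\tfirst");
        Path out = Files.createTempFile("BigData.sorted", ".gz");
        writeGzip(sortByKey(unsorted), out);
        System.out.println(readGzip(out));
    }
}
```

The point of the sketch is the shape of the output: one compressed file, with a
name of my choosing, containing the original lines in key order and nothing
else. The question is how to get the same from a Hadoop job at 500GB.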
Steven M. Lewis PhD
Institute for Systems Biology