Re: how to write outputs sequentially?
Luca Pireddu 2011-03-22, 16:03
On March 22, 2011 16:54:34 Shi Yu wrote:
> I guess you need to define a Partitioner to send hashed keys to different
> reducers (sorry, I am still using the old API so probably there is
> something new in the trunk release). Basically you try to segment the
> keys into different zones, 0-10, 11-20, ...
>
> maybe check the hashCode() function and see how to categorize these zones?
>
> Shi
>
> On 3/22/2011 9:24 AM, JunYoung Kim wrote:
> > hi,
> >
> > I run almost 60 reduce tasks for a single job.
> >
> > if the outputs of a job are from part00 to part 59,
> >
> > is there a way to write rows sequentially by sorted keys?
> >
> > currently my outputs are like this.
> >
> > part00)
> > 1
> > 10
> > 12
> > 14
> >
> > part 01)
> > 2
> > 4
> > 6
> > 11
> > 13
> >
> > part 02)
> > 3
> > 5
> > 7
> > 8
> > 9
> >
> > but, my aim is to get the following results.
> >
> > part00)
> > 1
> > 2
> > 3
> > 4
> > 5
> >
> > part01)
> > 6
> > 7
> > 8
> > 9
> > 10
> >
> > part02)
> > 11
> > 12
> > 13
> > 14
> > 15
> >
> > is hadoop able to support this kind of thing?
> >
> > thanks

You can look at TeraSort in the examples to see how to do this. There's even a
short write-up by Owen O'Malley about it here:
http://sortbenchmark.org/YahooHadoop.pdf

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452
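
P.S. To make the idea concrete, here is a minimal sketch of range partitioning, the general approach TeraSort relies on: pick sorted split points and send each key to the reducer whose range contains it, so that part00 holds the smallest keys, part01 the next range, and so on. This is plain Java with no Hadoop dependency. The class name RangePartitionSketch, the method partitionFor and the split points {6, 11} are made up for illustration and chosen to match the 15-key example in the question. In a real job the same logic would sit inside a Partitioner's getPartition(), and the split points would normally come from sampling the input rather than being hard-coded.

```java
import java.util.Arrays;

public class RangePartitionSketch {

    /**
     * Hypothetical helper: returns the index of the partition for a key.
     * splitPoints must be sorted ascending; splitPoints[i] is the smallest
     * key that belongs to partition i + 1. Keys below splitPoints[0] go to
     * partition 0.
     */
    static int partitionFor(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact hit at index i: the key starts partition i + 1.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the insertion
        // point equals the number of split points smaller than the key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Assumed split points for three reducers: 1-5, 6-10, 11-15.
        int[] splitPoints = {6, 11};
        for (int key = 1; key <= 15; key++) {
            System.out.printf("key %d -> part%02d%n",
                    key, partitionFor(key, splitPoints));
        }
    }
}
```

Since each reducer already receives its keys in sorted order, reading part00, part01, part02 one after another then gives a globally sorted sequence. Hadoop also ships a TotalOrderPartitioner that, as far as I know, works along these lines, so it is worth checking before writing your own.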