Hadoop, mail # user - how to write outputs sequentially?


Re: how to write outputs sequentially?
Luca Pireddu 2011-03-22, 16:03
On March 22, 2011 16:54:34 Shi Yu wrote:
> I guess you need to define a Partitioner to send hashed keys to different
> reducers (sorry, I am still using the old API, so there is probably
> something new in the trunk release). Basically, you try to segment the
> keys into different zones: 0-10, 11-20, ...
>
> Maybe check the hashCode() function and see how to categorize these zones?
>
> Shi
>
> On 3/22/2011 9:24 AM, JunYoung Kim wrote:
> > hi,
> >
> > I run almost 60 reduce tasks for a single job, so the outputs of
> > the job run from part00 to part59.
> >
> > Is there a way to write rows sequentially by sorted keys?
> >
> > Currently my outputs look like this:
> >
> > part00)
> > 1
> > 10
> > 12
> > 14
> >
> > part01)
> > 2
> > 4
> > 6
> > 11
> > 13
> >
> > part02)
> > 3
> > 5
> > 7
> > 8
> > 9
> >
> > But my aim is to get the following results:
> >
> > part00)
> > 1
> > 2
> > 3
> > 4
> > 5
> >
> > part01)
> > 6
> > 7
> > 8
> > 9
> > 10
> >
> > part02)
> > 11
> > 12
> > 13
> > 14
> > 15
> >
> > Is Hadoop able to support this kind of output?
> >
> > thanks
You can look at TeraSort in the examples to see how to do this. There's even
a short write-up by Owen O'Malley about it here:
http://sortbenchmark.org/YahooHadoop.pdf
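
As a rough illustration of the idea behind TeraSort (pick split points from a
sample of the keys, then send each key to the reducer whose range contains
it), here is a minimal sketch in plain Java with no Hadoop dependency. The
class RangePartitionSketch and its methods are hypothetical names made up for
this example, and integer keys are an assumption. In a real job the lookup
would sit inside your Partitioner's getPartition method, and the sample would
come from the job's input.

```java
import java.util.Arrays;

// Hypothetical sketch of range partitioning; not part of the Hadoop API.
final class RangePartitionSketch {
    // Sorted boundaries: partition i receives keys below splitPoints[i]
    // (and at or above splitPoints[i - 1]); the last partition gets the rest.
    private final int[] splitPoints;

    RangePartitionSketch(int[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    // Choose numPartitions - 1 evenly spaced split points from a key sample.
    // Assumes the sample is non-empty and roughly representative of the data.
    static RangePartitionSketch fromSample(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int index = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[index];
        }
        return new RangePartitionSketch(splits);
    }

    // Return the partition (reducer) number for a key.
    int getPartition(int key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact match on a boundary: the key starts the next partition.
        // No match: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is the partition number.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With a sample of the keys 1 to 15 and three partitions, fromSample picks 6 and
11 as split points, so keys 1-5 go to partition 0, keys 6-10 to partition 1,
and keys 11-15 to partition 2. Because each reducer sorts its own keys,
reading the part files in order then gives one globally sorted sequence.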

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452