Hadoop user mailing list

Re: how to write outputs sequentially?
On March 22, 2011 16:54:34 Shi Yu wrote:
> I guess you need to define a Partitioner to send hashed keys to different
> reducers (sorry, I am still using the old API so probably there is
> something new in the trunk release).  Basically you try to segment the
> keys into different zones, 0-10, 11-20, ...
>
> Maybe check the hashCode() function and see how to map keys into these zones?
>
> Shi
>
> On 3/22/2011 9:24 AM, JunYoung Kim wrote:
> > Hi,
> >
> > I run almost 60 reduce tasks for a single job.
> >
> > If the outputs of a job are part00 to part59,
> >
> > is there a way to write the rows sequentially, sorted by key across the files?
> >
> > Currently my outputs look like this:
> >
> > part00)
> > 1
> > 10
> > 12
> > 14
> >
> > part01)
> > 2
> > 4
> > 6
> > 11
> > 13
> >
> > part02)
> > 3
> > 5
> > 7
> > 8
> > 9
> >
> > But my aim is to get the following results:
> >
> > part00)
> > 1
> > 2
> > 3
> > 4
> > 5
> >
> > part01)
> > 6
> > 7
> > 8
> > 9
> > 10
> >
> > part02)
> > 11
> > 12
> > 13
> > 14
> > 15
> >
> > Is Hadoop able to support this kind of output?
> >
> > Thanks
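To see why the asker's rows come out interleaved, and how Shi's "zones" idea fixes it, here is a minimal Java sketch. It assumes integer keys whose hashCode() is the value itself (as for java.lang.Integer), 3 reducers, and a key range of 1..15 known in advance; `PartitionDemo` and its methods are hypothetical names, not Hadoop code. `hashPartition` uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, while `rangePartition` is the zone-based alternative. In a real job, the range logic would live in the `getPartition(key, value, numPartitions)` method of a `Partitioner` subclass.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical demo class (not part of Hadoop) that mimics the logic of a
 * Partitioner's getPartition() for integer keys.
 */
final class PartitionDemo {

    // Same formula as Hadoop's default HashPartitioner:
    // non-negative hash modulo the number of reduce tasks.
    static int hashPartition(int key, int numPartitions) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
    }

    // Range partitioning: split [minKey, maxKey] into numPartitions
    // contiguous zones of (roughly) equal width, so every key in
    // partition i is smaller than every key in partition i + 1.
    static int rangePartition(int key, int minKey, int maxKey, int numPartitions) {
        long span = (long) maxKey - minKey + 1;
        long offset = (long) key - minKey;
        int p = (int) (offset * numPartitions / span);
        // Clamp keys outside the assumed range into the first or last zone.
        return Math.max(0, Math.min(numPartitions - 1, p));
    }

    // Group keys 1..n by partition, like the part files of a job.
    // (Hadoop sorts keys within each reducer, so each list is already sorted.)
    static List<List<Integer>> assign(int n, int numPartitions, boolean byRange) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int key = 1; key <= n; key++) {
            int p = byRange ? rangePartition(key, 1, n, numPartitions)
                            : hashPartition(key, numPartitions);
            parts.get(p).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println("hash:  " + assign(15, 3, false));
        System.out.println("range: " + assign(15, 3, true));
    }
}
```

Running `main` prints `hash:  [[3, 6, 9, 12, 15], [1, 4, 7, 10, 13], [2, 5, 8, 11, 14]]` followed by `range: [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]`. Fixed-width zones only keep the reducers balanced when keys are spread evenly over a range known ahead of time; otherwise the split points have to be chosen from the data itself.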
You can look at TeraSort in the examples to see how to do this. There's even
a short write-up by Owen O'Malley about it here:
http://sortbenchmark.org/YahooHadoop.pdf
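Roughly, TeraSort samples the input keys, sorts the sample, and takes N - 1 evenly spaced keys from it as split points for N reducers. Each key is then routed to the reducer whose range contains it, so reading the part files in order gives globally sorted output. The sketch below illustrates that idea with plain Java ints and a binary search over the split points. It is not TeraSort's code (the write-up describes a trie over the sampled keys), and `TotalOrderDemo` is a hypothetical name. Hadoop also ships a general-purpose `TotalOrderPartitioner`, used together with `InputSampler`, for the same purpose.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hadoop code) of sampling-based total-order
 * partitioning in the spirit of TeraSort: sample keys, choose split
 * points, then route each key to the range it falls in.
 */
final class TotalOrderDemo {

    // Choose numPartitions - 1 split points, evenly spaced through the
    // sorted sample. Assumes a non-empty sample and numPartitions >= 1.
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return splits;
    }

    // Partition i receives keys k with splits[i - 1] <= k < splits[i];
    // keys below splits[0] go to partition 0, keys at or above the last
    // split point go to the last partition.
    static int getPartition(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // binarySearch returns (-(insertion point) - 1) when key is absent;
        // the insertion point equals the number of split points below key.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Use every key 1..15 as the "sample" so the result is deterministic;
        // a real job would sample a small fraction of the input instead.
        int[] sample = new int[15];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = i + 1;
        }
        int[] splits = splitPoints(sample, 3);
        System.out.println("split points: " + Arrays.toString(splits));
        for (int key = 1; key <= 15; key++) {
            System.out.println(key + " -> part " + getPartition(key, splits));
        }
    }
}
```

With this sample the split points are 6 and 11, so keys 1 to 5, 6 to 10, and 11 to 15 land in parts 0, 1, and 2, which is the layout the asker wanted. Because the split points come from the data rather than from a fixed range, skewed key distributions still yield roughly equal-sized partitions.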

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452