NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop >> mail # dev >> Hadoop - Distributed sorting


Re: Hadoop - Distributed sorting
Hi
  Steps to do this:
 1) Map: defines the key/value pair for each number.
 2) Combiner: sorts locally over each chunk of the data set.
 3) Reducer: sorts globally over the whole data set and writes the sorted
    output.

Note: set the combiner and the reducer to the same class.
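
The three steps can be sketched in plain Java, without any Hadoop classes. This is only a rough simulation under stated assumptions: the class name DistributedSortSketch and the methods map, sumByKey and run are hypothetical, and the "sorting" is done by counting each number and keeping the counts ordered by key, as in the example in this message. A single function, sumByKey, plays both the combiner and the reducer role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the three steps; no Hadoop classes are used.
// All names here are hypothetical and chosen only for illustration.
class DistributedSortSketch {

    // Step 1 (map): emit a (number, 1) pair for every input number.
    static List<Map.Entry<Integer, Integer>> map(int[] chunk) {
        List<Map.Entry<Integer, Integer>> pairs = new ArrayList<>();
        for (int n : chunk) {
            pairs.add(Map.entry(n, 1));
        }
        return pairs;
    }

    // Steps 2 and 3 share this function, mirroring "combiner and reducer
    // as the same class": it sums the values per key and keeps the keys
    // in ascending order.
    static TreeMap<Integer, Integer> sumByKey(Iterable<Map.Entry<Integer, Integer>> pairs) {
        TreeMap<Integer, Integer> totals = new TreeMap<>();
        for (Map.Entry<Integer, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    static TreeMap<Integer, Integer> run(int[][] chunks) {
        List<Map.Entry<Integer, Integer>> combined = new ArrayList<>();
        for (int[] chunk : chunks) {
            // Combiner: local aggregation over one chunk.
            combined.addAll(sumByKey(map(chunk)).entrySet());
        }
        // Reducer: global aggregation over all partial results.
        return sumByKey(combined);
    }

    public static void main(String[] args) {
        int[][] chunks = { {150, 101, 150}, {199, 101, 120} };
        System.out.println(run(chunks)); // prints {101=2, 120=1, 150=2, 199=1}
    }
}
```

Summing counts is associative and commutative, which is why the same function can safely run once per chunk and once more over the partial results.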

Example:
  Let us assume that our data set (integers) is constrained to the range 100 to
200, and that we have 5 files, each containing 1000 random integers between 100
and 200 (so a total of 5000 integers). We read each file into a Map, and then
in the Reduce phase we produce a final Map that contains the count of every
integer. If we now sort the keys of the final Map and output them into a list
data structure of the form <Integer, Count>, we have sorted all the data (see
figure below).

Aside: in Java you don't even have to come up with that data structure
yourself. If you use a TreeMap
<http://java.sun.com/javase/6/docs/api/index.html?java/util/TreeMap.html>
in the final Reduce phase, all the keys (i.e. the data) are already sorted,
as long as the key type (e.g. String, Integer) implements the Comparable
<http://java.sun.com/javase/6/docs/api/index.html?java/lang/Comparable.html>
interface. Hadoop <http://hadoop.apache.org/> has something similar called
WritableComparable
<http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparable.html>,
and I am using a TreeMap that takes Strings as keys in Reducer
<http://code.google.com/p/dalalstreet/source/browse/trunk/MapReduce/src/org/karticks/mapreduce/Reducer.java>.
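
As a rough illustration of the example above, the following sketch counts the integers from 5 simulated "files" in a TreeMap and then expands the <Integer, Count> entries back into a sorted list. The class name TreeMapCountSort, the methods count and expand, and the randomly generated input are all assumptions made for this sketch; it is not the code behind the Reducer link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical names throughout; the input data is generated, not real.
class TreeMapCountSort {

    // Count how often each integer occurs across all files.
    // TreeMap keeps its keys in ascending order because Integer is Comparable.
    static TreeMap<Integer, Integer> count(int[][] files) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int[] file : files) {
            for (int n : file) {
                counts.merge(n, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Expand the ordered <Integer, Count> entries into a sorted list.
    static List<Integer> expand(TreeMap<Integer, Integer> counts) {
        List<Integer> sorted = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                sorted.add(e.getKey());
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        int[][] files = new int[5][1000];
        for (int[] file : files) {
            for (int i = 0; i < file.length; i++) {
                file[i] = 100 + rnd.nextInt(101); // integer in [100, 200]
            }
        }
        List<Integer> sorted = expand(count(files));
        System.out.println(sorted.size()); // prints 5000
    }
}
```

If the keys were Strings instead, TreeMap would order them lexicographically; for this data set that happens to match numeric order, because every value between 100 and 200 has exactly three digits.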
Thanks
   Samir
On Tue, May 15, 2012 at 11:31 PM, @dataElGrande <[EMAIL PROTECTED]> wrote:

>
> Check out Pentaho's how-tos when dealing with Hadoop, NoSQL, or anything
> big data related. http://wiki.pentaho.com/display/BAD/How+To%27s
>
>
> madhu_sushmi wrote:
> >
> > Hi,
> > I need to implement distributed sorting using Hadoop. I am quite new to
> > Hadoop and I am getting confused. If I want to implement merge sort, what
> > should my Map and Reduce be doing? Should all the sorting happen on the
> > reduce side?
> >
> > Please help. This is an urgent requirement. Please guide me.
> >
> > Thanks,
> > Madhu
> >
>
> --
> View this message in context:
> http://old.nabble.com/Hadoop---Distributed-sorting-tp32876784p33849704.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>
>