Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> question for understanding partitioning


Copy link to this message
-
Re: question for understanding partitioning
1) you need not have 26 reducers but you want 26 partitions - you might send
to
    int reducer = character % Math.min(26,nreducers);  // this insures that
all items with A go to a single reducer but with less than 26 reducers some
reducers will get other characters -
   The scheme is inefficient as some letters are much more common than
others so some reducers will get a lot more data - you also cannot use more
than 26 reducers with this scheme.

   It is a good idea to think carefully about why you want all items with A
to go to a single reducer

On Tue, Jan 18, 2011 at 12:09 PM, Mapred Learn <[EMAIL PROTECTED]>wrote:

> hi,
> I have a basic question. How does partitioning work ?
>
> Following is a scenario I created to put up my question.
>
> i) A parttition function is defined as partitioning map-output based on
> aphabetical sorting of the key i.e. a partition for keys starting with 'a',
> partition for keys starting with 'b'... partition for keys starting with
> 'z'. So, it means each map may have atmost 26 partitions ?
>
> ii) What input will Reducer get ? Reducer will get first partition
> (partition starting with 'a') of all the maps as it's input ? Does it mean
> we will need 26 reduce tasks ?
>
> Any inputs/documents/examples on this are appreciated. I am bit confused by
> this.
>
> Thanks in advance
>

--
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB