MapReduce >> mail # user >> Map output files and partitions.

Pedro Sá da Costa 2012-12-14, 07:15
Harsh J 2012-12-14, 07:29
Re: Map output files and partitions.
Hello Pedro,

       The first part of your question is very well covered by Harsh.

For the second part, the assignment of records to partitions is governed by
the getPartition() method of the Partitioner interface; the number of
partitions equals the number of reduce tasks. The default behavior
(HashPartitioner) assigns partitions based on the hash of the key. You can
provide your own implementation of getPartition() to write a custom
Partitioner.
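
To make the hashing idea concrete, here is a minimal sketch in plain Java. It
does not use the Hadoop classes: SimplePartitioner is a hypothetical stand-in
that only mirrors the shape of getPartition(key, value, numPartitions), and
FirstLetterPartitioner is an invented example of a custom rule. The hash rule
shown follows the formula used by the default HashPartitioner.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's Partitioner contract, so that the
// sketch compiles without Hadoop on the classpath.
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Hash-based rule: mask off the sign bit so the result is never negative,
// then take the remainder by the number of partitions (reduce tasks).
class HashSketchPartitioner<K, V> implements SimplePartitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Invented custom rule: keys starting with the same letter (ignoring case)
// always land in the same partition.
class FirstLetterPartitioner implements SimplePartitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (key.isEmpty()) {
            return 0; // arbitrary choice for empty keys
        }
        return Character.toLowerCase(key.charAt(0)) % numPartitions;
    }
}

class PartitionerSketch {
    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed value, one partition per reducer
        SimplePartitioner<String, Integer> byHash = new HashSketchPartitioner<>();
        SimplePartitioner<String, Integer> byLetter = new FirstLetterPartitioner();
        List<String> keys = Arrays.asList("apple", "Avocado", "banana", "cherry");
        for (String key : keys) {
            System.out.println(key
                + " -> hash partition " + byHash.getPartition(key, 1, numReduceTasks)
                + ", first-letter partition " + byLetter.getPartition(key, 1, numReduceTasks));
        }
    }
}
```

In a real job you would extend or implement Hadoop's own Partitioner instead
and register it with setPartitionerClass() on the job.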


    Mohammad Tariq

On Fri, Dec 14, 2012 at 12:59 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Map output files, by which you perhaps mean intermediate data files
> for temporary K/V persistence, are stored in IFiles. They use neither
> text nor sequence files (historically, though, they did use sequence
> files at some point).
> You can read the IFile's sources at
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
> for more technical details on it. It is very similar to SequenceFiles
> in some ways.
> On Fri, Dec 14, 2012 at 12:45 PM, Pedro Sá da Costa <[EMAIL PROTECTED]>
> wrote:
> > Hi,
> >
> > There are only 2 types of map output files, Sequence and Text files. If
> > those files are going to be used as input to several reduce tasks,
> > they need to be partitioned into blocks. Are there any SEPARATOR bits
> > that delimit each partition? Can I read a specific partition of a map
> > output file? Is there an API for that?
> >
> > --
> > Best regards,
> --
> Harsh J