MapReduce dev mailing list: Understanding the MapOutput


Re: Understanding the MapOutput
Hi Pedro,

The format is called IFile. Check out the source (org.apache.hadoop.mapred.IFile)
for more info on the format; it's fairly simple. The start of each partition is
recorded in a separate index file next to the output file.
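A minimal Python sketch of walking one partition's segment, assuming the uncompressed IFile layout of the Hadoop 0.20/1.x era (if `mapred.compress.map.output` is enabled, the segment is compressed and this does not apply). Under that assumption each record is a VInt key length, a VInt value length, then the raw serialized key and value bytes; two VInts of -1 mark the end of the segment, and a 4-byte big-endian CRC32 over everything before it is appended. Keys and values are left in their Writable serialization (a Text key, for instance, carries its own VInt length prefix inside those bytes). `read_vlong` and `read_ifile_segment` are hypothetical helper names, not Hadoop APIs.

```python
import struct
import zlib
from typing import List, Tuple

EOF_MARKER = -1  # assumed: IFile writes two VInts of -1 after the last record


def read_vlong(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Hadoop WritableUtils-style VInt/VLong at buf[pos].

    Returns (value, position just after the encoded number).
    """
    first = struct.unpack_from(">b", buf, pos)[0]  # first byte, read as signed
    pos += 1
    if first >= -112:
        return first, pos  # values in [-112, 127] fit in the first byte
    if first < -120:
        size, negative = -(first + 120), True  # negative: one's complement stored
    else:
        size, negative = -(first + 112), False
    value = int.from_bytes(buf[pos:pos + size], "big")  # big-endian magnitude
    return (value ^ -1 if negative else value), pos + size


def read_ifile_segment(segment: bytes) -> List[Tuple[bytes, bytes]]:
    """Split one uncompressed IFile segment into raw (key, value) byte pairs.

    Assumes the last 4 bytes are a big-endian CRC32 of everything before them.
    """
    body = segment[:-4]
    (stored_crc,) = struct.unpack(">I", segment[-4:])
    if zlib.crc32(body) & 0xFFFFFFFF != stored_crc:
        raise ValueError("IFile checksum mismatch")
    records = []
    pos = 0
    while True:
        if pos >= len(body):
            raise ValueError("segment ended without an EOF marker")
        key_len, pos = read_vlong(body, pos)
        val_len, pos = read_vlong(body, pos)
        if key_len == EOF_MARKER and val_len == EOF_MARKER:
            return records
        key, pos = body[pos:pos + key_len], pos + key_len
        value, pos = body[pos:pos + val_len], pos + val_len
        records.append((key, value))
```

For example, `read_ifile_segment(b"\xff\xff\xff\xff\x00\x00")` returns `[]`: `FF FF` is the EOF marker and `FF FF 00 00` is the CRC32 of those two bytes. Those are exactly the first six bytes of the dump quoted below, so under these assumptions its first partition is empty, and the trailing `<FF><FF><DF>|<8E><B7>` is presumably the second partition's EOF marker plus checksum. The `^@`, `^N` and `^M` sequences are caret notation for the bytes 0x00, 0x0E and 0x0D.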

I don't think you'll find significant docs on this format, since it's
internal to MapReduce; the code is your best resource.

-Todd

On Fri, Nov 4, 2011 at 8:37 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm trying to understand the structure of the map output file. Here's an
> example of a map output file that contains 2 partitions:
>
> [code]
> <FF><FF><FF><FF>^@^@716banana banana apple banana carrot carrot apple
> banana 0apple carrot carrot carrot banana carrot carrot 5^N4carrot apple
> carrot apple apple carrot banana apple ^Mbanana apple <FF><FF><DF>|<8E><B7>
> [/code]
>
> 1 - I would like to understand what the non-printable characters (for
> example <FF>, ^@, ^N and ^M) mean.
>
> 2 - What type of file is a map output? Is it written with
> SequenceFileOutputFormat or TextOutputFormat?
>
> 3 - I have a small program, running independently of MapReduce, whose
> goal is to digest each partition and compute its hash. How do I know where
> each partition starts?
>
>
> --
> Thanks,
> PSC
>
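For question 3, a minimal sketch that uses the index file to locate each partition, assuming the Hadoop 0.20/1.x index layout (the SpillRecord class): one 24-byte entry per partition holding three big-endian longs (start offset, raw length, on-disk part length), followed by a checksum. We assume the part length is the partition's on-disk size including its trailing CRC32, so each hash covers exactly that partition's bytes in the output file. `read_index`, `digest_partitions` and the choice of SHA-1 are illustrative, not Hadoop APIs.

```python
import hashlib
import struct
from typing import List, Tuple

INDEX_RECORD_LEN = 24  # assumed: three 8-byte big-endian longs per partition


def read_index(index_bytes: bytes) -> List[Tuple[int, int, int]]:
    """Return (start_offset, raw_length, part_length) for each partition.

    Trailing bytes shorter than one entry (the index checksum) are ignored.
    """
    count = len(index_bytes) // INDEX_RECORD_LEN
    return [struct.unpack_from(">qqq", index_bytes, i * INDEX_RECORD_LEN)
            for i in range(count)]


def digest_partitions(out_path: str, index_path: str) -> List[str]:
    """Hash each partition's on-disk bytes, in partition order."""
    with open(index_path, "rb") as f:
        entries = read_index(f.read())
    digests = []
    with open(out_path, "rb") as f:
        for start, _raw_len, part_len in entries:
            f.seek(start)  # jump to where this partition begins
            digests.append(hashlib.sha1(f.read(part_len)).hexdigest())
    return digests
```

A call would look like `digest_partitions("file.out", "file.out.index")`; those are the names we believe the map task uses in its local output directory, but check the actual paths on your cluster.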

--
Todd Lipcon
Software Engineer, Cloudera