MapReduce >> mail # dev >> Understanding the MapOutput


Re: Understanding the MapOutput
On Fri, Nov 4, 2011 at 10:04 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
> 1 - I think that IFile.Reader can only read the whole map output file. I
> want to read a partition of the map output. How can I do that? How do I set
> the size of a partition in the I

Look at the code for MapOutputServlet - it uses the index mechanism to
find a particular partition.
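
As a rough illustration of that index lookup outside of MR, here is a
minimal standalone sketch. It rests on assumptions that should be checked
against the source of your Hadoop version (SpillRecord and IndexRecord are
the classes to read): that the index file holds three big-endian longs per
partition (start offset, raw length, on-disk length) followed by an 8-byte
checksum. The class name PartitionDigest and its methods are hypothetical,
and MD5 is only a placeholder for whatever hash you need.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class PartitionDigest {
    // ASSUMPTION: each index record is three 8-byte big-endian longs.
    static final int RECORD_LENGTH = 24;

    /** Returns one {startOffset, rawLength, partLength} triple per partition. */
    static List<long[]> readIndex(String indexPath) throws IOException {
        // ASSUMPTION: a trailing 8-byte checksum follows the records; integer
        // division drops it because it is shorter than one record.
        int partitions = (int) (new File(indexPath).length() / RECORD_LENGTH);
        List<long[]> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(indexPath)))) {
            for (int i = 0; i < partitions; i++) {
                long start = in.readLong();
                long rawLength = in.readLong();
                long partLength = in.readLong();
                entries.add(new long[] {start, rawLength, partLength});
            }
        }
        return entries;
    }

    /** Hex MD5 of the on-disk bytes in [start, start + length) of the output file. */
    static String digestPartition(String outputPath, long start, long length)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (RandomAccessFile f = new RandomAccessFile(outputPath, "r")) {
            f.seek(start);
            byte[] buf = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = f.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("partition extends past end of file");
                }
                md.update(buf, 0, n);
                remaining -= n;
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Usage: java PartitionDigest <map output file> <index file>
    public static void main(String[] args) throws Exception {
        List<long[]> index = readIndex(args[1]);
        for (int p = 0; p < index.size(); p++) {
            long[] e = index.get(p);
            // e[2] is the on-disk length, which is what gets hashed here.
            System.out.println("partition " + p + ": " + digestPartition(args[0], e[0], e[2]));
        }
    }
}
```

Note that this hashes the bytes exactly as stored on disk, so if map output
compression is enabled the digest covers the compressed segment, not the
decoded key/value records.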

>
> 2 - I know that map output is composed of blocks. What is the size of a
> block? Is it 64MB by default?

Nope, it doesn't use blocks. That's HDFS you're thinking of.

-Todd

> 2011/11/4 Todd Lipcon <[EMAIL PROTECTED]>
>
>> Hi Pedro,
>>
>> The format is called IFile. Check out the source for more info on the
>> format - it's fairly simple. The partition starts are recorded in a
>> separate index file next to the output file.
>>
>> I don't think you'll find significant docs on this format since it's
>> MR-internal - the code is your best resource.
>>
>> -Todd
>>
>> On Fri, Nov 4, 2011 at 8:37 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > I'm trying to understand the structure of the map output file. Here's an
>> > example of a map output file that contains 2 partitions:
>> >
>> > [code]
>> > <FF><FF><FF><FF>^@^@716banana banana apple banana carrot carrot apple
>> > banana 0apple carrot carrot carrot banana carrot carrot 5^N4carrot apple
>> > carrot apple apple carrot banana apple ^Mbanana apple
>> > <FF><FF><DF>|<8E><B7>
>> > [/code]
>> >
>> > 1 - I would like to understand what the ASCII character parts are. What
>> > do they mean?
>> >
>> > 2 - What type of file is a map output? Is it a SequenceFileOutputFormat
>> > or a TextOutputFormat?
>> >
>> > 3 - I have a small program that runs independently of MR, whose goal is
>> > to digest each partition and produce the corresponding hash. How do I
>> > know where each partition starts?
>> >
>> >
>> > --
>> > Thanks,
>> > PSC
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Thanks,
>

--
Todd Lipcon
Software Engineer, Cloudera