Re: Understanding the MapOutput
On Fri, Nov 4, 2011 at 10:04 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
> 1- I think that IFile.Reader can only read the whole map output file. I
> want to read a partition of the map output. How can I do that? How do I set
> the size of a partition in the I

Look at the code for MapOutputServlet - it uses the index mechanism to
find a particular partition.
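
For anyone following along, here is a rough Python sketch of that index lookup, written for a standalone tool that digests each partition. It is not the MapOutputServlet code. It assumes the index file holds one 24-byte record per partition (three big-endian longs: start offset, raw length, on-disk length) followed by an 8-byte CRC32 of those records, which is how I read SpillRecord in the Hadoop source of this era; check that against the version you run. The function names and the choice of SHA-1 are made up for illustration.

```python
import hashlib
import struct
import zlib

# Assumed layout of one index record: start offset, raw length, on-disk length.
RECORD = struct.Struct(">qqq")


def read_index(index_bytes):
    """Return a list of (start, raw_len, part_len) tuples, one per partition."""
    # Assumption: the last 8 bytes hold a CRC32 of the records, stored as a long.
    body_len = len(index_bytes) - 8
    if body_len < 0 or body_len % RECORD.size != 0:
        raise ValueError("unexpected index length")
    body = index_bytes[:body_len]
    (stored_crc,) = struct.unpack(">q", index_bytes[body_len:])
    if (zlib.crc32(body) & 0xFFFFFFFF) != stored_crc:
        raise ValueError("index checksum mismatch")
    return [RECORD.unpack_from(body, i) for i in range(0, body_len, RECORD.size)]


def digest_partitions(output_path, index_path):
    """Hash the on-disk bytes of each partition, in partition order."""
    with open(index_path, "rb") as f:
        entries = read_index(f.read())
    digests = []
    with open(output_path, "rb") as out:
        for start, _raw_len, part_len in entries:
            out.seek(start)
            data = out.read(part_len)
            if len(data) != part_len:
                raise ValueError("map output is shorter than the index says")
            digests.append(hashlib.sha1(data).hexdigest())
    return digests
```

Calling digest_partitions with the paths of the map output file and its index file returns one hex digest per partition. The digest covers the partition's bytes exactly as stored on disk, so it will differ between compressed and uncompressed map output.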

>
> 2 - I know that map output is composed of blocks. What is the size of a
> block? Is it 64MB by default?

Nope, it doesn't use blocks. That's HDFS you're thinking of.

-Todd

> 2011/11/4 Todd Lipcon <[EMAIL PROTECTED]>
>
>> Hi Pedro,
>>
>> The format is called IFile. Check out the source for more info on the
>> format - it's fairly simple. The partition starts are recorded in a
>> separate index file next to the output file.
>>
>> I don't think you'll find significant docs on this format since it's
>> MR-internal - the code is your best resource.
>>
>> -Todd
>>
>> On Fri, Nov 4, 2011 at 8:37 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > I'm trying to understand the structure of the map output file. Here's an
>> > example of a map output file that contains 2 partitions:
>> >
>> > [code]
>> > <FF><FF><FF><FF>^@^@716banana banana apple banana carrot carrot apple
>> > banana 0apple carrot carrot carrot banana carrot carrot 5^N4carrot apple
>> > carrot apple apple carrot banana apple ^Mbanana apple
>> > <FF><FF><DF>|<8E><B7>
>> > [/code]
>> >
>> > 1 - I would like to understand what the ASCII character parts are. What
>> > do they mean?
>> >
>> > 2 - What type of file is a map output? Is it written with
>> > SequenceFileOutputFormat or TextOutputFormat?
>> >
>> > 3 - I have a small program that runs independently of MR. Its goal is
>> > to digest each partition and produce the corresponding hash. How do I
>> > know where each partition starts?
>> >
>> >
>> > --
>> > Thanks,
>> > PSC
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Thanks,
>

--
Todd Lipcon
Software Engineer, Cloudera