MapReduce, mail # user - Re: Map Shuffle Bytes


Re: Map Shuffle Bytes
Harsh J 2012-12-26, 11:50
This isn't called a 'shuffle' (it is a plain remote read), which is why
your original question was confusing. Thanks for clarifying!

In that case, you could count the bytes coming in from the relevant
record reader. For example, the LineRecordReader behind TextInputFormat
emits a LongWritable key that holds the current byte offset in the file,
which you could use as a simple, progressing counter of bytes read so far.
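
A minimal sketch of that idea, under some assumptions: the reader is
line-oriented and hands map() the byte offset of each record as its key.
The class name OffsetByteTracker and its methods are hypothetical, and it
is written in plain Java so it runs without Hadoop on the classpath. The
count is approximate, since the line terminator of the last record seen is
not included.

```java
// Hypothetical helper: turns record offsets into "bytes read so far".
final class OffsetByteTracker {
    private final long splitStart; // byte offset at which this input split begins
    private long bytesRead = 0;    // bytes consumed up to the end of the last record seen

    OffsetByteTracker(long splitStart) {
        this.splitStart = splitStart;
    }

    // Call once per record. keyOffset is the record's byte offset in the file,
    // recordLength its length in bytes without the line terminator.
    // Returns how many new bytes this record accounts for, i.e. the amount
    // by which a job counter should be incremented.
    long record(long keyOffset, long recordLength) {
        long endOfRecord = (keyOffset - splitStart) + recordLength;
        long delta = Math.max(0L, endOfRecord - bytesRead);
        bytesRead += delta;
        return delta;
    }

    long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) {
        OffsetByteTracker tracker = new OffsetByteTracker(0L);
        tracker.record(0L, 5L); // "hello" at offset 0
        tracker.record(6L, 3L); // "foo" at offset 6, after the newline
        System.out.println(tracker.getBytesRead()); // prints 9
    }
}
```

Inside a Mapper<LongWritable, Text, ...>, the returned delta could be fed
to a user-defined counter, for example
context.getCounter("JoinInput", "BYTES_READ").increment(tracker.record(key.get(), value.getLength())),
where the group and counter names are made up for illustration. With
CompositeInputFormat the map key is the join key rather than an offset, so
the same bookkeeping would probably have to live in a wrapper around the
underlying record readers.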

On Wed, Dec 26, 2012 at 5:16 PM, Eduard Skaley <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I mean TO the mappers. I'm using the CompositeInputFormat for my application
> to compute map-side joins.
> I want to join two datasets, A and B. One is stored on node 1 and the other
> on node 2.
> For example if the join will be computed on node 2 then the inputsplit of
> the dataset which is stored on node 1 has to be transferred to node 2.
> I want to count the bytes which are shuffled (transferred) TO the mapper of
> node 2.
>
>> Hi,
>>
>> What do you mean by "shuffled bytes [to] the mappers"? If you mean
>> "from", it is "Reduce shuffle bytes" you look for; otherwise, you may
>> be looking for the per-map counter of "Map output bytes".
>>
>> Per-partition counters can be constructed on the user side if needed,
>> by pre-computing the partition before emit (using the same
>> partitioner) and counting up the bytes of your objects for its
>> counter.
>>
>> On Tue, Dec 25, 2012 at 6:03 PM, Eduard Skaley <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hello guys,
>>>
>>> I need a counter for bytes shuffled to the mappers.
>>> Is there an existing one, or should I define one myself?
>>> How can I implement such a counter?
>>>
>>> Thank you and happy Christmas time,
>>> Eduard
>>
>>
>>
>

--
Harsh J