
MapReduce, mail # user - Why LineRecordWriter.write(..) is synchronized


Re: Why LineRecordWriter.write(..) is synchronized
Niels Basjes 2013-08-08, 13:37
I would say yes, make this a Jira.
The actual change can go in one of two directions (as proposed by Jay): put
synchronization in all implementations OR take it out of all implementations.
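To make the second option concrete, here is a minimal sketch. It assumes a hypothetical `SimpleRecordWriter` interface, a simplified stand-in and not Hadoop's actual `RecordWriter` class. The line writer itself takes no lock. A caller that really does share one writer across threads wraps it in a synchronizing decorator, much as `java.util.Collections.synchronizedList` does for lists. `PlainLineWriter` and `SynchronizedRecordWriter` are illustrative names only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical, simplified stand-in for a record writer (NOT Hadoop's RecordWriter).
interface SimpleRecordWriter<K, V> {
    void write(K key, V value) throws IOException;
    void close() throws IOException;
}

// Unsynchronized writer loosely modeled on LineRecordWriter: emits "key<TAB>value\n".
class PlainLineWriter<K, V> implements SimpleRecordWriter<K, V> {
    private final Writer out;

    PlainLineWriter(Writer out) { this.out = out; }

    @Override
    public void write(K key, V value) throws IOException {
        // Separate appends: if two threads call this concurrently, their
        // keys, tabs and values can interleave into mixed-up lines.
        out.write(String.valueOf(key));
        out.write('\t');
        out.write(String.valueOf(value) + "\n");
    }

    @Override
    public void close() throws IOException { out.close(); }
}

// Decorator that adds locking only where a caller opts in to sharing the writer.
class SynchronizedRecordWriter<K, V> implements SimpleRecordWriter<K, V> {
    private final SimpleRecordWriter<K, V> delegate;

    SynchronizedRecordWriter(SimpleRecordWriter<K, V> delegate) { this.delegate = delegate; }

    @Override
    public synchronized void write(K key, V value) throws IOException {
        delegate.write(key, value);
    }

    @Override
    public synchronized void close() throws IOException { delegate.close(); }
}
```

With this split, a single-threaded task never acquires a monitor, while a multithreaded caller opts in explicitly, e.g. `new SynchronizedRecordWriter<String, Integer>(new PlainLineWriter<String, Integer>(out))`. The first option would instead mark `write` as `synchronized` directly in every implementation.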

I think the first thing to determine is why the synchronization was put
into LineRecordWriter in the first place.

https://github.com/apache/hadoop-common/blame/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java

The oldest commit I have been able to find is from 2009-05-18, for
HADOOP-4687, and it only moves code around (i.e. this code is even older
than that).

Niels

On Thu, Aug 8, 2013 at 2:21 PM, Sathwik B P <[EMAIL PROTECTED]> wrote:

> Hi Harsh,
>
> Do you want me to raise a Jira on this?
>
> regards,
> sathwik
>
>
> On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>
>> Then is this a bug? Synchronization in the absence of any race condition is
>> normally considered "bad".
>>
>> In any case I'd like to know why this writer is synchronized whereas the
>> other ones are not. That is, I think, the point at issue: either the other
>> writers should be synchronized or else this one shouldn't be. Consistency
>> across the writer implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments.
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> While we don't fork threads by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking whether it is necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <[EMAIL PROTECTED]> wrote:
>>
>>> It's not about Hadoop-forked threads; we may create a line record writer
>>> ourselves and then call this writer concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where Hadoop forks multiple threads that use a single
>>>> RecordWriter?
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Because we may use multiple threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementation that defines write as synchronized.
>>>>>> Is there any specific reason for this?
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>
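The scenario Azuryy and Harsh describe can be sketched in plain Java: several threads emit records through one shared writer, for example map threads writing to the same output. `SharedLineWriter` and `ConcurrentWriteDemo` are hypothetical names, and a `StringWriter` stands in for the task's output stream. This does not describe how Hadoop itself wires threads to writers. The point is only that a `synchronized` write keeps each `key<TAB>value<NEWLINE>` record intact when the separate appends that build it would otherwise interleave.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical line writer loosely modeled on LineRecordWriter: one record = "key\tvalue\n".
class SharedLineWriter {
    private final Writer out;

    SharedLineWriter(Writer out) { this.out = out; }

    // synchronized makes the four appends below one atomic record; without it,
    // another thread's append could land between them and produce mixed lines.
    synchronized void write(Object key, Object value) throws IOException {
        out.write(String.valueOf(key));
        out.write('\t');
        out.write(String.valueOf(value));
        out.write('\n');
    }
}

class ConcurrentWriteDemo {
    // Starts `threads` workers that all write through the same SharedLineWriter
    // and returns everything they wrote once all of them have finished.
    static String run(int threads, int recordsPerThread) throws Exception {
        StringWriter buf = new StringWriter();
        SharedLineWriter writer = new SharedLineWriter(buf);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread th = new Thread(() -> {
                try {
                    for (int i = 0; i < recordsPerThread; i++) {
                        writer.write("thread-" + id, i);
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers.add(th);
            th.start();
        }
        for (Thread th : workers) {
            th.join(); // wait so the returned output is complete
        }
        return buf.toString();
    }
}
```

Every line of the result has the form `thread-N<TAB>i`. The order of lines across threads is not deterministic, but no line is torn. Whether that lock belongs inside the writer or in the code that shares it is exactly the consistency question raised above.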
--
Best regards,

Niels Basjes