HDFS >> mail # user >> Re: Why LineRecordWriter.write(..) is synchronized

Re: Why LineRecordWriter.write(..) is synchronized
I suppose I should have been clearer. There's no problem out of the box
if people stick to the libraries we offer :)

Yes, the LRW (LineRecordWriter) write call was marked synchronized over
8 years ago [1] to support multi-threaded maps, but the framework has
changed much since then. The MultithreadedMapper/etc. API we offer now
automatically shields devs from having to think about output thread
safety [2].
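
To make the "synchronize over the collector" idea in [2] concrete, here
is a minimal, self-contained sketch of the pattern. It does not use any
Hadoop classes: UnsafeWriter, SynchronizedCollector and run(..) are
hypothetical names made up for illustration, and the real
MultithreadedMapper code may well differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizedCollectorSketch {

    /** Hypothetical stand-in for a record writer whose write() is NOT thread-safe. */
    static class UnsafeWriter {
        private final List<String> lines = new ArrayList<>();

        void write(String key, String value) {
            lines.add(key + "\t" + value);
        }

        int size() {
            return lines.size();
        }
    }

    /** Hypothetical wrapper handed to worker threads; it serializes all writes. */
    static class SynchronizedCollector {
        private final UnsafeWriter out;

        SynchronizedCollector(UnsafeWriter out) {
            this.out = out;
        }

        void collect(String key, String value) {
            // Every worker takes the same lock before touching the shared writer.
            synchronized (out) {
                out.write(key, value);
            }
        }
    }

    /** Starts the given number of worker threads and returns how many records were written. */
    static int run(int threads, int recordsPerThread) {
        UnsafeWriter writer = new UnsafeWriter();
        SynchronizedCollector collector = new SynchronizedCollector(writer);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    collector.collect("t" + id, "record-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        // Safe to read without the lock: join() has completed for every worker.
        return writer.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

Because the lock is taken in the wrapper, the writer underneath does not
need a synchronized write of its own as long as every thread goes
through the wrapper.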

I can imagine a problem only if a user writes their own unsafe
multithreaded task. I suppose we could document that in the
Mapper/MapRunner and Reducer APIs.
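
As an illustration of that user-written case, the sketch below shares
one writer between threads the user started themselves. The writer emits
key, tab, value and newline as separate stream writes, so the
synchronized keyword on write(..) is what keeps each record on its own
line. LineWriter and countWellFormedLines(..) are hypothetical names for
this example only, not Hadoop APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedLineWriterSketch {

    /** Hypothetical line writer: one record is four separate writes to the stream. */
    static class LineWriter {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Without synchronized, pieces of different records could interleave.
        synchronized void write(String key, String value) {
            writeBytes(key);
            writeBytes("\t");
            writeBytes(value);
            writeBytes("\n");
        }

        private void writeBytes(String s) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }

        synchronized String contents() {
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Writes from several user threads and counts lines of the form key<TAB>value. */
    static int countWellFormedLines(int threads, int recordsPerThread) {
        LineWriter writer = new LineWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final String key = "k" + t;
            pool.execute(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    writer.write(key, "v" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        int wellFormed = 0;
        for (String line : writer.contents().split("\n")) {
            if (line.matches("k\\d+\tv\\d+")) {
                wellFormed++;
            }
        }
        return wellFormed;
    }

    public static void main(String[] args) {
        System.out.println(countWellFormedLines(4, 500)); // prints 2000
    }
}
```

A user task that shares an unsynchronized writer in the same way would
have to add its own locking around each write call.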

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
Commit added a synchronized to the write call.
[2] - MultithreadedMapper/etc. synchronize over the collector -

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> The sequence writer is also synchronized; I don't think this is bad.
> If you call the HDFS API to write concurrently, then it's necessary.
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <[EMAIL PROTECTED]> wrote:
>> Then is this a bug? Synchronization in the absence of any race condition is
>> normally considered "bad".
>> In any case I'd like to know why this writer is synchronized whereas the
>> other ones are not. That is, I think, the point at issue: either the other
>> writers should be synchronized or else this one shouldn't be. Consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments.
>> On Aug 8, 2013, at 6:50 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking whether it is necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <[EMAIL PROTECTED]> wrote:
>>> It's not Hadoop-forked threads; we may create a line record writer, then
>>> call this writer concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where Hadoop forks multiple threads to use a single
>>>> RecordWriter?
>>>> regards,
>>>> sathwik
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>>>>> Because we may use multiple threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>>>>>> Hi,
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementation that defines write as synchronized.
>>>>>> Is there any specific reason for this?
>>>>>> regards,
>>>>>> sathwik

Harsh J