MapReduce user list - Why LineRecordWriter.write(..) is synchronized


Re: Why LineRecordWriter.write(..) is synchronized
Harsh J 2013-08-11, 12:59
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
performance too much, but it would have to be marked as an incompatible
change, and we would have to add a note about the lack of thread safety
to the javadoc of the base Mapper/Reducer classes.
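
For context, the method under discussion looks roughly like this in
TextOutputFormat.LineRecordWriter. This is a sketch of the trunk sources
of that era, not a verbatim copy; out, keyValueSeparator, newline, and
writeObject are fields and a helper of the enclosing class:

    public synchronized void write(K key, V value) throws IOException {
      // Treat null and NullWritable keys/values as absent.
      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);              // key bytes
      }
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator);  // typically a tab
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);              // record terminator
    }

Dropping the synchronized keyword is exactly the incompatible change under
discussion: single-threaded tasks would save the (usually small) cost of
acquiring an uncontended monitor, while any user code relying on the
implicit thread safety would break.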

On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> Does it make any sense to keep the method in LRW synchronized? Isn't it
> creating unnecessary overhead for non-multi-threaded implementations?
>
> regards,
> sathwik
>
>
> On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> I suppose I should have been clearer. There's no problem out of the box
>> if people stick to the libraries we offer :)
>>
>> Yes, the LRW was marked synchronized at some point over 8 years ago [1]
>> in support of multi-threaded maps, but the framework has changed much
>> since then. The MultithreadedMapper/etc. API we offer now automatically
>> shields devs from having to think about output thread safety [2].
>>
>> I can imagine there can only be a problem if a user writes their own
>> unsafe multi-threaded task. I suppose we could document that in the
>> Mapper/MapRunner and Reducer APIs.
>>
>> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
>> The commit added the synchronized keyword to the write call.
>> [2] - MultithreadedMapper/etc. synchronize over the collector -
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
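
(The relevant piece of MultithreadedMapper is its internal record writer,
which routes every write from the worker threads through the single real
context under one lock. A sketch close to, but not verbatim from, the
linked source; outer is the real Mapper.Context owned by the framework:

    private class SubMapRecordWriter extends RecordWriter<K2, V2> {
      @Override
      public void write(K2 key, V2 value)
          throws IOException, InterruptedException {
        // Every sub-mapper thread funnels into the one real output,
        // serialized on the outer context's monitor.
        synchronized (outer) {
          outer.write(key, value);
        }
      }

      @Override
      public void close(TaskAttemptContext context) {
        // Nothing to close here; the real writer belongs to the outer
        // context and is closed by the framework.
      }
    }

With this in place, the framework's multi-threaded path would stay safe
even if LineRecordWriter.write itself were not synchronized.)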
>>
>> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>> > The SequenceFile writer is also synchronized; I don't think this is bad.
>> >
>> > If you call the HDFS API to write concurrently, then it's necessary.
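
(To make the hazard concrete, here is a minimal, Hadoop-free Java demo of
what can go wrong when two threads share one writer without a common lock.
The class name and record format are made up for illustration:

    import java.io.StringWriter;
    import java.util.Arrays;

    public class TornRecordDemo {
      public static void main(String[] args) throws InterruptedException {
        StringWriter out = new StringWriter(); // shared, no common lock

        Runnable task = () -> {
          for (int i = 0; i < 1000; i++) {
            // A record is emitted in two calls, analogous to writing
            // key + separator and then value + newline. Each call is
            // atomic on its own, but the pair is not, so halves of
            // different records can interleave.
            out.write("key\t");
            out.write("value\n");
          }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        long torn = Arrays.stream(out.toString().split("\n"))
            .filter(line -> !line.equals("key\tvalue"))
            .count();
        // Typically prints a nonzero count; making the whole record
        // emission synchronized would bring it to zero.
        System.out.println("torn records: " + torn);
      }
    }

A synchronized write(key, value) makes the underlying writes atomic as a
pair, which is the property the keyword buys LineRecordWriter.)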
>> >
>> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Then is this a bug? Synchronization in the absence of any race
>> >> condition is normally considered "bad".
>> >>
>> >> In any case, I'd like to know why this writer is synchronized whereas
>> >> the others are not. That is, I think, the point at issue: either the
>> >> other writers should be synchronized, or else this one shouldn't be.
>> >> Consistency across the write implementations is probably desirable so
>> >> that changes to output formats or record writers don't lead to bugs in
>> >> multi-threaded environments.
>> >>
>> >> On Aug 8, 2013, at 6:50 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >>
>> >> While we don't fork by default, we do provide a MultithreadedMapper
>> >> implementation that would require such synchronization. But if you are
>> >> asking whether it is necessary, then perhaps the answer is no.
>> >>
>> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> It's not Hadoop-forked threads; we may create a line record writer
>> >>> and then call this writer concurrently.
>> >>>
>> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> Thanks for your reply.
>> >>>> May I know where Hadoop forks multiple threads to use a single
>> >>>> RecordWriter?
>> >>>>
>> >>>> regards,
>> >>>> sathwik
>> >>>>
>> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>> >>>>>
>> >>>>> Because we may use multiple threads to write to a single file.
>> >>>>>
>> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
>> >>>>>> other RecordWriter implementation that defines write as
>> >>>>>> synchronized. Is there any specific reason for this?
>> >>>>>>
>> >>>>>> regards,
>> >>>>>> sathwik
>> >>>>
>> >>>>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J