Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Why LineRecordWriter.write(..) is synchronized


Copy link to this message
-
Re: Why LineRecordWriter.write(..) is synchronized
I expect the impact on the IO speed to be almost 0 because waiting for a
single disk seek is longer than many thousands of calls to a synchronized
method.

Niels
On Aug 11, 2013 3:00 PM, "Harsh J" <[EMAIL PROTECTED]> wrote:

> Yes, I feel we could discuss this over a JIRA to remove it if it hurts
> perf. too much, but it would have to be a marked incompatible change,
> and we have to add a note about the lack of thread safety in the
> javadoc of base Mapper/Reducer classes.
>
> On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <[EMAIL PROTECTED]> wrote:
> > Hi Harsh,
> >
> > Does it make any sense to keep the method in LRW still synchronized.
> Isn't
> > it creating unnecessary overhead for non multi threaded implementations.
> >
> > regards,
> > sathwik
> >
> >
> > On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> I suppose I should have been clearer. There's no problem out of box if
> >> people stick to the libraries we offer :)
> >>
> >> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> >> in support for multi-threaded maps, but the framework has changed much
> >> since then. The MultithreadedMapper/etc. API we offer now
> >> automatically shields the devs away from having to think of output
> >> thread safety [2].
> >>
> >> I can imagine there can only be a problem if a user writes their own
> >> unsafe multi threaded task. I suppose we could document that in the
> >> Mapper/MapRunner and Reducer APIs.
> >>
> >> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> >> Commit added a synchronized to the write call.
> >> [2] - MultiThreadedMapper/etc. synchronize over the collector -
> >>
> >>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
> >>
> >> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> >> > sequence writer is also synchronized, I dont think this is bad.
> >> >
> >> > if you call HDFS api to write concurrently, then its necessary.
> >> >
> >> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> Then is this a bug?  Synchronization in absence of any race condition
> >> >> is
> >> >> normally considered "bad".
> >> >>
> >> >> In any case id like to know why this writer is synchronized whereas
> the
> >> >> other one are not.. That is, I think, then point at issue: either
> other
> >> >> writers should be synchronized or else this one shouldn't be -
> >> >> consistency
> >> >> across the write implementations is probably desirable so that
> changes
> >> >> to
> >> >> output formats or record writers don't lead to bugs in multithreaded
> >> >> environments .
> >> >>
> >> >> On Aug 8, 2013, at 6:50 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> >> implementation that would require such synchronization. But if you
> are
> >> >> asking is it necessary, then perhaps the answer is no.
> >> >>
> >> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <[EMAIL PROTECTED]> wrote:
> >> >>>
> >> >>> its not hadoop forked threads, we may create a line record writer,
> >> >>> then
> >> >>> call this writer concurrently.
> >> >>>
> >> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>> Thanks for your reply.
> >> >>>> May I know where does hadoop fork multiple threads to use a single
> >> >>>> RecordWriter.
> >> >>>>
> >> >>>> regards,
> >> >>>> sathwik
> >> >>>>
> >> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <[EMAIL PROTECTED]>
> wrote:
> >> >>>>>
> >> >>>>> because we may use multi-threads to write a single file.
> >> >>>>>
> >> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
> >> >>>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
> >> >>>>