MapReduce >> mail # user >> Why LineRecordWriter.write(..) is synchronized


Sathwik B P 2013-08-08, 06:53
Azuryy Yu 2013-08-08, 07:06
Azuryy Yu 2013-08-08, 10:12
Harsh J 2013-08-08, 10:50
Niels Basjes 2013-08-08, 11:00
Jay Vyas 2013-08-08, 11:53
Sathwik B P 2013-08-08, 12:21
Niels Basjes 2013-08-08, 13:37
Azuryy Yu 2013-08-08, 14:22

Re: Why LineRecordWriter.write(..) is synchronized
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
perf. too much, but it would have to be marked as an incompatible change,
and we would have to add a note about the lack of thread safety in the
javadoc of the base Mapper/Reducer classes.
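
For readers following the thread, here is a minimal sketch of what a synchronized write protects against. It is not Hadoop's LineRecordWriter: SketchLineWriter and SharedWriterDemo are hypothetical names, and the key/TAB/value/newline layout is an assumption made for illustration. The point is only that a record emitted as several separate stream writes can be torn when threads share one writer, unless the whole record is written under a lock.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a line-oriented record writer; NOT Hadoop's class.
class SketchLineWriter {
    private final OutputStream out;

    SketchLineWriter(OutputStream out) {
        this.out = out;
    }

    // One record is four separate stream writes. The method-level lock makes
    // the four writes atomic with respect to other callers of write().
    synchronized void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));
        out.write('\t');
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }
}

class SharedWriterDemo {
    // Starts `threads` workers that all write to ONE shared writer and
    // returns the lines that ended up in the in-memory sink.
    static String[] run(int threads, int recordsPerThread) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        SketchLineWriter writer = new SketchLineWriter(sink);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < recordsPerThread; i++) {
                        writer.write("thread" + id, "record" + i);
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return sink.toString("UTF-8").split("\n");
    }

    public static void main(String[] args) throws Exception {
        String[] lines = run(4, 1000);
        System.out.println("lines written: " + lines.length);
    }
}
```

In a single-threaded task the lock is never contended, which is the overhead being questioned here; dropping it would shift responsibility for this kind of locking to anyone who shares a writer across threads.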

On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> Does it make any sense to keep the method in LRW synchronized? Isn't
> it creating unnecessary overhead for non-multithreaded implementations?
>
> regards,
> sathwik
>
>
> On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> I suppose I should have been clearer. There's no problem out of the box if
>> people stick to the libraries we offer :)
>>
>> Yes, the LRW was marked synchronized at some point over 8 years ago [1]
>> in support of multi-threaded maps, but the framework has changed much
>> since then. The MultithreadedMapper/etc. API we offer now
>> automatically shields devs from having to think about output
>> thread safety [2].
>>
>> I imagine there can only be a problem if a user writes their own
>> unsafe multi-threaded task. I suppose we could document that in the
>> Mapper/MapRunner and Reducer APIs.
>>
>> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
>> Commit added a synchronized to the write call.
>> [2] - MultithreadedMapper/etc. synchronize over the collector -
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>>
>> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>> > The sequence writer is also synchronized; I don't think this is bad.
>> >
>> > If you call the HDFS API to write concurrently, then it's necessary.
>> >
>> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Then is this a bug? Synchronization in the absence of any race condition
>> >> is normally considered "bad".
>> >>
>> >> In any case I'd like to know why this writer is synchronized whereas the
>> >> other ones are not. That is, I think, the point at issue: either the other
>> >> writers should be synchronized or else this one shouldn't be. Consistency
>> >> across the write implementations is probably desirable so that changes to
>> >> output formats or record writers don't lead to bugs in multithreaded
>> >> environments.
>> >>
>> >> On Aug 8, 2013, at 6:50 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >>
>> >> While we don't fork by default, we do provide a MultithreadedMapper
>> >> implementation that would require such synchronization. But if you are
>> >> asking whether it is necessary, then perhaps the answer is no.
>> >>
>> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> It's not Hadoop-forked threads; we may create a line record writer,
>> >>> then call this writer concurrently.
>> >>>
>> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> Thanks for your reply.
>> >>>> May I know where Hadoop forks multiple threads to use a single
>> >>>> RecordWriter?
>> >>>>
>> >>>> regards,
>> >>>> sathwik
>> >>>>
>> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>> >>>>>
>> >>>>> Because we may use multiple threads to write a single file.
>> >>>>>
>> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <[EMAIL PROTECTED]> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>> >>>>>> RecordWriter implementation that defines write as synchronized.
>> >>>>>> Is there any specific reason for this?
>> >>>>>>
>> >>>>>> regards,
>> >>>>>> sathwik
>> >>>>
>> >>>>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
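
As an illustration of reference [2] in the quoted message ("synchronize over the collector"), the following is a rough sketch of that locking pattern in plain Java. It does not reproduce the MultithreadedMapper source: UnsafeCollector, LockingWriter and CollectorLockDemo are hypothetical names, and the collector here is just an in-memory list. Each worker thread gets its own thin writer, and every write takes the shared collector's monitor before delegating, so the underlying sink does not need to be synchronized itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately unsynchronized sink standing in for a task's
// shared output collector. ArrayList is not thread-safe on its own.
class UnsafeCollector {
    final List<String> records = new ArrayList<>();

    void collect(String key, String value) {
        records.add(key + "\t" + value);
    }
}

// Per-thread facade: every worker owns one instance, all instances share the
// same collector and lock on it before delegating.
class LockingWriter {
    private final UnsafeCollector outer;

    LockingWriter(UnsafeCollector outer) {
        this.outer = outer;
    }

    void write(String key, String value) {
        synchronized (outer) {
            outer.collect(key, value);
        }
    }
}

class CollectorLockDemo {
    // Runs `threads` workers, each with its own LockingWriter over one shared
    // collector, and returns how many records were collected.
    static int run(int threads, int perThread) throws InterruptedException {
        UnsafeCollector collector = new UnsafeCollector();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final LockingWriter writer = new LockingWriter(collector);
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    writer.write("t" + id, "v" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // join() guarantees the workers' writes are visible to this thread.
        return collector.records.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("records collected: " + run(4, 1000));
    }
}
```

With the lock held by the multithreaded wrapper, a synchronized keyword on the underlying writer is redundant for this code path; it would only matter for user code that shares a writer across threads without such a wrapper.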
Niels Basjes 2013-08-11, 16:02