Steve Loughran 2013-04-23, 18:44
Re: Compatibility in Apache Hadoop
On 23 April 2013 09:00, Steve Loughran <[EMAIL PROTECTED]> wrote:

>
>
> On 22 April 2013 18:32, Eli Collins <[EMAIL PROTECTED]> wrote:
>
>>
>>
>
>> However, if a change made FileSystem#close three times slower, this is
>> perhaps a smaller semantic change (e.g. it doesn't change what exceptions
>> get thrown) but probably much less tolerable for end users.
>>
>
> You know that the blobstores all buffer their data so that
>
>    1. flush() is a no-op
>    2. the write takes place on close()
>
> #1 changes durability expectations, while #2 means the time to close() is
> O(data)*O(latency); P(fail) scales with time and distance, and as lots of
> code swallows exceptions on close, those failures may even go unnoticed.
>
>
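To make the two quoted points concrete, here is a minimal sketch of a stream that buffers everything and only uploads on close(). BlobUploader, BufferOnCloseStream and CloseFailureDemo are hypothetical names made up for illustration; this is not the Swift or Hadoop code, only an assumption about the general shape of such a stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for a blobstore client; not a real Hadoop or Swift API. */
interface BlobUploader {
    void upload(byte[] data) throws IOException;
}

/** Buffers all writes locally; nothing reaches the store until close(). */
class BufferOnCloseStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final BlobUploader uploader;
    private boolean closed;

    BufferOnCloseStream(BlobUploader uploader) {
        this.uploader = uploader;
    }

    @Override
    public void write(int b) {
        buffer.write(b); // local only, so it cannot fail on the network
    }

    @Override
    public void flush() {
        // Point 1: a no-op, so flush() gives no durability guarantee.
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        // Point 2: the whole upload, and every network failure, happens here.
        uploader.upload(buffer.toByteArray());
    }
}

class CloseFailureDemo {
    /** Returns true only if close() completed, i.e. the data was uploaded. */
    static boolean writeCarefully(OutputStream out, byte[] data) {
        try {
            out.write(data);
            out.flush();
            out.close();
            return true;
        } catch (IOException e) {
            // An empty catch block here would hide the lost write entirely.
            return false;
        }
    }

    public static void main(String[] args) {
        BlobUploader failing = data -> {
            throw new IOException("simulated network failure");
        };
        boolean ok = writeCarefully(new BufferOnCloseStream(failing), new byte[] {1, 2, 3});
        System.out.println("write succeeded: " + ok);
    }
}
```

In this sketch write() and flush() always succeed, so code that checks only those calls and ignores what close() throws would never see the failure.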
For the curious, there are some tests that I plan to get into Bigtop that
not only generate various large files but also collect stats on the duration
of operations. On a remote blobstore, it's close() that takes most of the
time, even for only a few MB of data:

2013-04-23 11:23:21,911 [main] INFO  tools.DataGenerator (?:call(?)) -
Generating 100000 lines of data
2013-04-23 11:23:22,122 [main] DEBUG snative.SwiftNativeOutputStream
(SwiftNativeOutputStream.java:uploadOnClose(146)) - Closing write of file
/tmp/data/massive/csv/data-0014.csv;
localfile=target/build/test/output-4786965937321057354.tmp of length 3301583
2013-04-23 11:23:23,437 [main] INFO  generate.GenerateManyCSVFilesTest
(?:call(?)) - Total time = 0:02:031; create time=0:00:505; write time
=0:00:210; close time = 0:01:316 partitions=0
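In the run above, close() accounts for about 1.3s of the 2.0s total. As a rough sketch of how per-phase timings like these could be collected (not the actual test code), the following times create, write and close separately. PhaseTimings, StreamPhaseTimer and SlowCloseStream are hypothetical names, and the sleep in close() is only a stand-in for a remote upload.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Durations of the three phases, in nanoseconds. */
class PhaseTimings {
    final long createNanos;
    final long writeNanos;
    final long closeNanos;

    PhaseTimings(long createNanos, long writeNanos, long closeNanos) {
        this.createNanos = createNanos;
        this.writeNanos = writeNanos;
        this.closeNanos = closeNanos;
    }

    long totalNanos() {
        return createNanos + writeNanos + closeNanos;
    }
}

class StreamPhaseTimer {
    /** Times stream creation, the write, and close() as separate phases. */
    static PhaseTimings time(Supplier<OutputStream> factory, byte[] data) {
        try {
            long t0 = System.nanoTime();
            OutputStream out = factory.get();
            long t1 = System.nanoTime();
            out.write(data);
            long t2 = System.nanoTime();
            out.close();
            long t3 = System.nanoTime();
            return new PhaseTimings(t1 - t0, t2 - t1, t3 - t2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical stream: writes are buffered, close() simulates a slow upload. */
class SlowCloseStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final long uploadMillis;

    SlowCloseStream(long uploadMillis) {
        this.uploadMillis = uploadMillis;
    }

    @Override
    public void write(int b) {
        buffer.write(b);
    }

    @Override
    public void close() throws IOException {
        try {
            Thread.sleep(uploadMillis); // stand-in for the remote PUT
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted during simulated upload", e);
        }
    }
}

class PhaseTimerDemo {
    public static void main(String[] args) {
        PhaseTimings t = StreamPhaseTimer.time(() -> new SlowCloseStream(100), new byte[1 << 20]);
        System.out.printf("create=%dus write=%dus close=%dus%n",
                t.createNanos / 1_000, t.writeNanos / 1_000, t.closeNanos / 1_000);
    }
}
```

Timing close() as its own phase is the point of the sketch: a single figure around the whole write path would hide where the time goes.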