Are there any rough numbers one can give me regarding the latency of creating, writing, and closing a small HDFS-based file? Does replication have a big impact? I am trying to decide whether to communicate some modestly sized (~200KB) information via HDFS files or go to the trouble of creating a protocol.
Thanks,
John
Small file creation is a well-documented problem and a major bottleneck in HDFS. You can either roll your own protocol or use MapR, which is about 100x faster and 1000x more scalable than HDFS for this particular problem.
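Since no concrete numbers are given in this thread, one option is to measure the create/write/close latency on your own cluster. Below is a minimal sketch of such a timing harness in Python. It is only an illustration: the names `time_small_file_write`, `benchmark`, and `local_open` are made up for this example, and the local filesystem is used as a stand-in so the script runs anywhere. To measure HDFS you would pass in an opener from whatever HDFS client you use, assuming it returns an object with `write()` and `close()`, and build the paths in whatever form that client expects.

```python
import os
import statistics
import tempfile
import time

PAYLOAD_SIZE = 200 * 1024  # ~200KB, the message size mentioned in the question


def time_small_file_write(open_for_write, path, payload):
    """Time one create/write/close cycle; returns three latencies in seconds."""
    t0 = time.perf_counter()
    f = open_for_write(path)   # "create" phase
    t1 = time.perf_counter()
    f.write(payload)           # "write" phase
    t2 = time.perf_counter()
    f.close()                  # "close" phase
    t3 = time.perf_counter()
    return (t1 - t0, t2 - t1, t3 - t2)


def benchmark(open_for_write, directory, payload, runs=20):
    """Write `runs` small files and return the median latency of each phase."""
    samples = []
    for i in range(runs):
        path = os.path.join(directory, f"msg-{i}.bin")
        samples.append(time_small_file_write(open_for_write, path, payload))
    phases = ("create", "write", "close")
    return {
        name: statistics.median(sample[idx] for sample in samples)
        for idx, name in enumerate(phases)
    }


def local_open(path):
    """Stand-in opener using the local filesystem (NOT HDFS)."""
    return open(path, "wb")


if __name__ == "__main__":
    payload = os.urandom(PAYLOAD_SIZE)
    with tempfile.TemporaryDirectory() as directory:
        medians = benchmark(local_open, directory, payload)
        for phase, seconds in medians.items():
            print(f"{phase}: {seconds * 1000:.3f} ms")
```

Timing the three phases separately is deliberate. On HDFS I would expect create to involve a NameNode round trip and close to wait for the write pipeline to acknowledge the data, so running the same harness with different replication factors should show whether replication matters for files of this size. The local numbers printed by the script as written say nothing about HDFS itself.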