NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, CentOS, red hat, debian, puppet labs, java, senseiDB

Hadoop >> mail # user >> hdfs write files in streaming fashion


Re: hdfs write files in streaming fashion
For 1, yes: the data is written to HDFS in chunks if you are using the FileSystem
API. The whole file is not first stored in memory.
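The chunked write can be sketched as a plain copy loop. This is a minimal sketch, assuming the servlet hands over the request body as an `InputStream` (e.g. `request.getInputStream()`) and that the destination is the stream returned by Hadoop's `FileSystem.create(Path)` (an `FSDataOutputStream`). So that it runs without a cluster, it depends only on `java.io`; the class name `ChunkedCopy` and the 64 KB buffer are illustrative choices, not Hadoop settings. It copies raw bytes instead of using a `BufferedWriter`, because a `Writer` pushes the body through a character encoding, which can corrupt binary uploads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a stream in fixed-size chunks, one buffer in memory at a time. */
final class ChunkedCopy {
    // Illustrative buffer size (64 KB); not a Hadoop configuration value.
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;

    private ChunkedCopy() {}

    /**
     * Reads from {@code in} and writes to {@code out} chunk by chunk.
     * Returns the number of bytes copied. Neither stream is closed here.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; each pass moves at most bufferSize bytes.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

In the servlet this would be used as `try (FSDataOutputStream out = fs.create(path)) { ChunkedCopy.copy(request.getInputStream(), out, ChunkedCopy.DEFAULT_BUFFER_SIZE); }`, where `fs` is a Hadoop `FileSystem` and `path` a `Path`. The HDFS client adds its own bounded buffering underneath, but the application never holds more than one buffer of the upload.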

For 2, I think you shouldn't rely on an exception or on 'not closing' the
writer to clean up the partially written file anyway. It is neither a safe nor
a recommended practice. You should do the cleanup yourself: free the resources
and explicitly delete anything you don't want to keep, for better visibility,
control and robustness.
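The explicit cleanup can be sketched as follows. On HDFS an abandoned, unclosed file is not removed automatically: once the writer's lease expires, lease recovery closes the file with whatever data had been written, so the partial file stays behind unless it is deleted. This is a minimal sketch under stated assumptions: the destination is hidden behind a hypothetical `UploadTarget` interface (in Hadoop its methods would map to `FileSystem.create(Path)` and `FileSystem.delete(Path, false)`), an in-memory `InMemoryTarget` stands in so it runs with the JDK alone, and the rejection decision comes from a hypothetical per-chunk `acceptChunk` predicate.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical abstraction over the destination file system (e.g. HDFS). */
interface UploadTarget {
    OutputStream create(String name) throws IOException; // cf. FileSystem.create(Path)
    boolean delete(String name) throws IOException;      // cf. FileSystem.delete(Path, false)
}

/** In-memory stand-in so the sketch runs without a cluster. */
final class InMemoryTarget implements UploadTarget {
    final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    public OutputStream create(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        files.put(name, out);
        return out;
    }

    public boolean delete(String name) {
        return files.remove(name) != null;
    }
}

final class StreamingUpload {
    private StreamingUpload() {}

    /**
     * Streams {@code in} to {@code name}. If {@code acceptChunk} rejects a chunk, or an
     * exception is thrown, the partial file is deleted explicitly. Returns true if kept.
     */
    static boolean upload(UploadTarget target, String name, InputStream in,
                          Predicate<byte[]> acceptChunk) throws IOException {
        boolean keep = false;
        try (OutputStream out = target.create(name)) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                if (!acceptChunk.test(Arrays.copyOf(buffer, n))) {
                    return false; // rejected halfway; the finally block removes the file
                }
                out.write(buffer, 0, n);
            }
            keep = true;
            return true;
        } finally {
            // try-with-resources has already closed the stream when this runs.
            if (!keep) {
                target.delete(name);
            }
        }
    }
}
```

Backing this with HDFS would mean implementing `UploadTarget` over a Hadoop `FileSystem`, with `create` returning `fs.create(new Path(name))` and `delete` calling `fs.delete(new Path(name), false)`. Closing before deleting keeps the client from holding an open stream on a file that no longer exists.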

Regards,
Shahab
On Mon, Aug 19, 2013 at 3:38 PM, Adeel Qureshi <[EMAIL PROTECTED]> wrote:

> I have a servlet that receives files in a streaming fashion. Our original
> design was to receive each file in the /tmp directory and then move it to
> HDFS via an external process, but that seems to add an additional (maybe
> unnecessary) step. My question is: if I receive files in a servlet as a POST
> request (the file is in the body of the request) and I open a BufferedWriter
> on HDFS, then
>
> 1. are the files really written in a streaming fashion, such that nothing
> is held in memory? These are huge files, so keeping them in memory and
> then sending the whole file to HDFS at the end won't make sense.
>
> 2. if for some reason we decide halfway through the file to reject it and
> not move it to HDFS, do we have to remove the file ourselves, since it was
> being streamed? Or will it be cleaned up automatically by the file system
> simply because the write stream isn't closed or some exception is thrown?
>
> Thanks
> Adeel
>