Re: Best way to write files to hdfs (from a Python app)
Hi Bjoern,

To give you an example of how this may be done: HUE, under the covers, pipes
your data to 'bin/hadoop fs -Dhadoop.job.ugi=user,group put - path'.
(That's from memory, but it's approximately right; the full Python code is at
http://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/hadoopfs.py#L692
)
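That piping approach is easy to reproduce in your own Python code. Here is a minimal sketch of it, assuming the 'hadoop' binary is on the PATH; the helper name put_to_hdfs and its defaults are my own illustration, not HUE's actual API (see the link above for the real thing):

```python
# Sketch: stream bytes into HDFS by piping them to the hadoop CLI,
# similar to what HUE does under the covers. The -Dhadoop.job.ugi
# option matches the command quoted above; check your Hadoop version,
# as this option has changed across releases.
import subprocess

def put_to_hdfs(data, hdfs_path, user="webapp", group="supergroup",
                hadoop_cmd=("hadoop",)):
    """Pipe `data` (bytes) to `hadoop fs ... -put - <hdfs_path>`.

    `hadoop_cmd` is overridable so the piping logic can be exercised
    without a Hadoop installation.
    """
    cmd = list(hadoop_cmd) + [
        "fs",
        "-Dhadoop.job.ugi=%s,%s" % (user, group),
        "-put", "-", hdfs_path,   # "-" tells the CLI to read from stdin
    ]
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.communicate(data)
    if proc.returncode != 0:
        raise IOError("hadoop fs -put exited with %d" % proc.returncode)
```

The upside over option 1 in your mail is that nothing is staged on the web server's disk: the bytes go straight from the request into the child process's stdin.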

Cheers,

-- Philip

On Mon, Aug 9, 2010 at 9:18 AM, Bjoern Schiessle <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am developing a web application with Django (Python) that needs to
> access an HBase database and store large files to HDFS.
>
> What is the best way to write files to HDFS from my Django app?
> Basically I thought of two options, but maybe you know a better one:
>
> 1. First store the file on the local file system and then move it to HDFS
> via the Thrift interface. (Downside: the web application server always
> needs enough free disk space.)
>
> 2. Use hdfs-fuse to mount the HDFS file system and write the file directly
> to HDFS. (Downside: I don't know how well hdfs-fuse is supported, and I'm
> not sure it is a good idea to mount the file system and run large
> operations on it.)
>
> Since I'm new to HDFS and Hadoop in general, I'm not sure which way is the
> best and least error-prone.
>
> What would be your recommendation?
>
> Thanks a lot!
> Björn
>
>