HDFS >> mail # user >> Best way to write files to hdfs (from a Python app)

Re: Best way to write files to hdfs (from a Python app)
Has anyone tried using SWIG to wrap libhdfs?

I spent some time today doing this, and it seems like it could be a
great solution, but it's also a fair amount of work (especially having
never used SWIG before). If this seems generally worthwhile I could
finish it up.
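
For what it's worth, the interface file itself can stay small, since SWIG can consume the libhdfs header directly. This is only a sketch, assuming hdfs.h from the Hadoop C API is on the include path; the module name pyhdfs is made up:

```
/* hdfs.i -- hypothetical SWIG interface for libhdfs.
   Assumes hdfs.h (the Hadoop C API header) is on the include path;
   the module name "pyhdfs" is arbitrary. */
%module pyhdfs

%{
#include "hdfs.h"
%}

/* Let SWIG generate Python wrappers for everything declared in
   hdfs.h (hdfsConnect, hdfsOpenFile, hdfsWrite, hdfsCloseFile, ...). */
%include "hdfs.h"
```

Building it would be roughly `swig -python hdfs.i`, then compiling the generated wrapper against libhdfs; the real work is in typemaps for buffers and error handling, which this sketch omits.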

Or is the thrift interface the API to use? Is anyone successfully using it?

I'm primarily interested in building some filesystem management and
reporting tools, so being slower than the Java interface is not a
problem. I'd prefer not to parse the command-line tool output,
though :)


On Tue, Aug 10, 2010 at 9:39 AM, Philip Zeyliger <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 10, 2010 at 5:06 AM, Bjoern Schiessle <[EMAIL PROTECTED]>
> wrote:
>> Hi Philip,
>> On Mon, 9 Aug 2010 16:35:07 -0700 Philip Zeyliger wrote:
>> > To give you an example of how this may be done, HUE, under the covers,
>> > pipes your data to 'bin/hadoop fs -Dhadoop.job.ugi=user,group put -
>> > path'. (That's from memory, but it's approximately right; the full
>> > python code is at
>> >
>> > http://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/hadoopfs.py#L692
>> > )
>> Thank you! If I understand it correctly this only works if my python app
>> runs on the same server as hadoop, right?
> It works only if your python app has network connectivity to your namenode.
>  You can access an explicitly specified HDFS by passing
> -Dfs.default.name=hdfs://<namenode>:<namenode_port>/ .  (The default is read
> from hadoop-site.xml (or perhaps hdfs-site.xml), and, I think, defaults to
> file:///).
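
To make Philip's piping approach concrete, here is a rough sketch in Python. It just shells out to the hadoop CLI as described above; the namenode URL, user/group values, and helper names are all placeholders I made up for illustration:

```python
# Hypothetical sketch: stream bytes into HDFS by piping them to
# `bin/hadoop fs ... -put - <dest>`, as described in the thread.
# The namenode URL, user/group, and function names are placeholders.
import subprocess

def build_put_command(dest_path, namenode, user, group):
    """Build the argv for a streaming put against an explicit namenode."""
    return [
        "bin/hadoop", "fs",
        "-Dfs.default.name=" + namenode,
        "-Dhadoop.job.ugi=%s,%s" % (user, group),
        "-put", "-", dest_path,   # "-" makes the CLI read from stdin
    ]

def put_to_hdfs(data, dest_path, namenode="hdfs://namenode:8020/",
                user="hdfs", group="supergroup"):
    """Pipe `data` (bytes) into HDFS at `dest_path` via the hadoop CLI."""
    cmd = build_put_command(dest_path, namenode, user, group)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.communicate(data)
    return proc.returncode
```

This avoids parsing any command-line output for the write path itself (the exit code signals success), though listing and reporting would still need either output parsing or one of the native interfaces.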