HDFS >> mail # user >> Best way to write files to hdfs (from a Python app)


Re: Best way to write files to hdfs (from a Python app)
Has anyone tried using SWIG to wrap libhdfs?

I spent some time today doing this, and it seems like it could be a
great solution, but it's also a fair amount of work (especially having
never used SWIG before). If this seems generally worthwhile I could
finish it up.
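For reference, a thin libhdfs binding doesn't strictly require SWIG; the same thing can be sketched with ctypes. The C function names below (`hdfsConnect`, `hdfsOpenFile`, `hdfsWrite`, `hdfsCloseFile`, `hdfsDisconnect`) are the real libhdfs API, but the library path, the helper names, and the exact argument widths are assumptions, and libhdfs itself needs CLASSPATH pointing at the Hadoop jars to work at all:

```python
import ctypes

def load_libhdfs(path="libhdfs.so"):
    """Load libhdfs and declare the few signatures we use.
    (Pointer results must be declared, or ctypes truncates them to int.)"""
    lib = ctypes.CDLL(path)
    lib.hdfsConnect.restype = ctypes.c_void_p
    lib.hdfsConnect.argtypes = [ctypes.c_char_p, ctypes.c_uint16]
    lib.hdfsOpenFile.restype = ctypes.c_void_p
    lib.hdfsOpenFile.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                 ctypes.c_int, ctypes.c_int,
                                 ctypes.c_short, ctypes.c_int32]
    lib.hdfsWrite.restype = ctypes.c_int32
    lib.hdfsWrite.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                              ctypes.c_char_p, ctypes.c_int32]
    lib.hdfsCloseFile.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
    lib.hdfsDisconnect.argtypes = [ctypes.c_void_p]
    return lib

O_WRONLY = 1  # POSIX write-only open flag, which libhdfs reuses

def hdfs_write_bytes(lib, host, port, dest_path, data):
    """Connect, create dest_path, write `data` (bytes), and clean up.
    Returns the number of bytes written."""
    fs = lib.hdfsConnect(host.encode(), port)
    if not fs:
        raise IOError("cannot connect to hdfs://%s:%d" % (host, port))
    try:
        f = lib.hdfsOpenFile(fs, dest_path.encode(), O_WRONLY, 0, 0, 0)
        if not f:
            raise IOError("cannot open %s for writing" % dest_path)
        try:
            written = lib.hdfsWrite(fs, f, data, len(data))
            if written != len(data):
                raise IOError("short write to %s" % dest_path)
        finally:
            lib.hdfsCloseFile(fs, f)
    finally:
        lib.hdfsDisconnect(fs)
    return written
```

Taking the library handle as a parameter keeps the write logic testable without a running cluster.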

Or is the thrift interface the API to use? Is anyone successfully using it?

I'm primarily interested in building some filesystem management +
reporting tools, so being slower than the Java interface is not
problematic. I'd prefer not to parse the command-line tool output,
though :)

--travis

On Tue, Aug 10, 2010 at 9:39 AM, Philip Zeyliger <[EMAIL PROTECTED]> wrote:
>
>
> On Tue, Aug 10, 2010 at 5:06 AM, Bjoern Schiessle <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi Philip,
>>
>> On Mon, 9 Aug 2010 16:35:07 -0700 Philip Zeyliger wrote:
>> > To give you an example of how this may be done, HUE, under the covers,
>> > pipes your data to 'bin/hadoop fs -Dhadoop.job.ugi=user,group put -
>> > path'. (That's from memory, but it's approximately right; the full
>> > python code is at
>> >
>> > http://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/hadoopfs.py#L692
>> > )
>>
>> Thank you! If I understand it correctly this only works if my python app
>> runs on the same server as hadoop, right?
>
> It works only if your python app has network connectivity to your namenode.
>  You can access an explicitly specified HDFS by passing
> -Dfs.default.name=hdfs://<namenode>:<namenode_port>/ .  (The default is read
> from hadoop-site.xml (or perhaps hdfs-site.xml), and, I think, defaults to
> file:///).
>
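Philip's description above can be sketched in a few lines of Python: build the hadoop argv with the two -D properties and pipe the data in on stdin. He quotes the command layout from memory, so the exact flag order and property names may differ across Hadoop versions, and the helper names here are hypothetical:

```python
import subprocess

def hadoop_put_command(user, group, namenode, namenode_port, dest_path,
                       hadoop_bin="bin/hadoop"):
    """Build the argv HUE-style: `hadoop fs put` reading from stdin ("-"),
    with the ugi and target filesystem passed as -D properties."""
    return [
        hadoop_bin, "fs",
        "-Dhadoop.job.ugi=%s,%s" % (user, group),
        "-Dfs.default.name=hdfs://%s:%d/" % (namenode, namenode_port),
        "-put", "-", dest_path,
    ]

def put_stream(data, **kwargs):
    """Feed `data` (bytes) to the put command on stdin."""
    cmd = hadoop_put_command(**kwargs)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.communicate(data)
    if proc.returncode != 0:
        raise IOError("hadoop fs put exited with %d" % proc.returncode)
```

Separating command construction from execution also makes the -D plumbing easy to check without a cluster.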