Hadoop >> mail # user >> Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?


Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?
On 09/05/2011 05:42 PM, Sofia Georgiakaki wrote:
> Good evening,
>
> this topic seems very interesting. To be sure I understood the case:
> do you mean that I can write a simple Java program and access a file
> stored in HDFS from within the Java application?
>
> Assuming that I have e.g. 10 files of size 30GB each stored on HDFS
> on a cluster of 15 nodes, how can I run a Java program that accesses
> these files and reads some blocks from them? Is it possible to do it
> without copying the files via -copyToLocal?
>
> If yes, could anyone give some general directions on the form of such
> Java code, and on how to run such a program?
>
> Thank you in advance,
> Sofia

You certainly can access a file on HDFS through a simple Java program.
You can also access your files with an even simpler Python program using
the Pydoop HDFS module (http://pydoop.sf.net/).  Here's a simple Python
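script to print a file:

    import pydoop.hdfs as py_hdfs

    # 'default' and port 0 select the file system named in the Hadoop configuration
    fs = py_hdfs.hdfs('default', 0)

    # iterate over the file line by line, without copying it to local disk
    for line in fs.open_file("myfile", 'r'):
        print(line)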
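
The question also asks about reading only some blocks of a large file rather than all of it. As a rough sketch of that idea, the helper below reads a byte range from any seekable binary file-like object. The name read_range and its parameters are hypothetical, and the in-memory buffer is only a stand-in for a real file. It assumes that the handle you get back from HDFS exposes seek() and read() like an ordinary Python file, which you should check against the Pydoop documentation for your version.

```python
import io


def read_range(f, offset, length, chunk_size=64 * 1024):
    """Read up to `length` bytes starting at `offset`.

    `f` is assumed to be a seekable binary file-like object
    (hypothetical helper, not part of Pydoop).
    """
    f.seek(offset)
    parts = []
    remaining = length
    while remaining > 0:
        # never ask for more than is still needed
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:
            break  # reached end of file before `length` bytes
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


# In-memory stand-in for a file handle; replace with a real handle in practice.
demo = io.BytesIO(b"0123456789abcdef")
print(read_range(demo, 4, 6))
```

Reading in bounded chunks keeps memory use predictable even when the requested range is large, which matters for files in the 30GB range.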

--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452