Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?

On 09/05/2011 05:42 PM, Sofia Georgiakaki wrote:
> Good evening,
> this topic seems very interesting. To be sure I understood the case -
> do you mean that I can write a simple Java program and access a file
> stored in HDFS from within the java application?
> Assuming that I have e.g. 10 files of size 30GB each stored on HDFS
> on a cluster of 15 nodes, how can I run a java program that accesses
> these files and reads some blocks from them? Is it possible to do it
> without copying the files via -copyToLocal ?
> If yes, could anyone give some general directions on the form
> of such Java code, and on how to run such a program?
> Thank you in advance,
> Sofia

You can certainly access a file on HDFS from a simple Java program through
Hadoop's FileSystem API, without copying it locally with -copyToLocal first.
You can also access your files with an even simpler Python program using
the Pydoop HDFS module (http://pydoop.sf.net/). Here's a simple Python
script that prints a file:
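import pydoop.hdfs as py_hdfs

# 'default', 0: connect to the default filesystem named in the Hadoop configuration
fs = py_hdfs.hdfs('default', 0)

# read "myfile" line by line straight from HDFS, no local copy needed
f = fs.open_file("myfile", 'r')
for line in f:
    print line,  # trailing comma: each line already ends with a newline
f.close()
fs.close()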
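Since the question also asked about reading only some blocks of a large file without -copyToLocal, here is a minimal sketch of that part. It assumes the object returned by fs.open_file supports seek() and read() like an ordinary Python file; read_range and read_block are hypothetical helper names, not part of Pydoop, and block_size must be supplied by the caller (it should match the block size the file was written with).

```python
def read_range(f, offset, length, chunk_size=1 << 20):
    """Read up to `length` bytes starting at byte `offset` of file object `f`.

    Assumption: `f` is any seekable file-like object with seek()/read(),
    e.g. (hypothetically) the object returned by Pydoop's fs.open_file().
    Loops because read() may return fewer bytes than requested; stops at EOF.
    """
    f.seek(offset)
    parts = []
    remaining = length
    while remaining > 0:
        data = f.read(min(chunk_size, remaining))
        if not data:  # end of file reached before `length` bytes were read
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def read_block(f, block_index, block_size):
    """Read the 0-based `block_index`-th chunk of `block_size` bytes.

    Assumption: `block_size` is provided by the caller; the last block of a
    file may be shorter, and an index past the end returns b"".
    """
    return read_range(f, block_index * block_size, block_size)
```

For example, read_block(fs.open_file("myfile", 'r'), 3, 64 * 1024 * 1024) would return the fourth 64 MB chunk of myfile, assuming the file was written with a 64 MB block size. Seeking first means the program asks HDFS only for that byte range instead of reading the whole file.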

Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452