-
How to enumerate files in the directories? Denim Live 2010-08-25, 07:36
Hello, how can one determine the names of the files in a particular hadoop
directory, programmatically?
-
Re: How to enumerate files in the directories? Steve Lewis 2010-08-25, 16:04
// executeCommand, HDFSFile and MIN__FILE_LENGTH are not defined in this
// message; they appear to be the poster's own helpers.
@Override
public HDFSFile[] getFiles(String directory) {
    // Shell out to the Hadoop CLI and capture its output.
    String result = executeCommand("hadoop fs -ls " + directory);
    String[] items = result.split("\n");
    List<HDFSFile> holder = new ArrayList<HDFSFile>();
    // Start at 1, presumably to skip the "Found N items" header line.
    for (int i = 1; i < items.length; i++) {
        String item = items[i];
        if (item.length() > MIN__FILE_LENGTH) {
            try {
                holder.add(new HDFSFile(item));
            } catch (Exception e) {
                // Lines that cannot be turned into an HDFSFile are skipped.
            }
        }
    }
    HDFSFile[] ret = new HDFSFile[holder.size()];
    holder.toArray(ret);
    return ret;
}

-- Steven M. Lewis PhD Institute for Systems Biology Seattle WA
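Since the helpers used above are not shown, here is a minimal, stdlib-only sketch of what parsing the `hadoop fs -ls` text might look like. It assumes the usual eight-column entry layout (permissions, replication, owner, group, size, date, time, path) preceded by a "Found N items" header. The class and method names (LsOutputParser, parsePaths) are made up for illustration and are not part of Hadoop or of the poster's code.

```java
import java.util.ArrayList;
import java.util.List;

public class LsOutputParser {

    // Assumed column layout of one `hadoop fs -ls` entry:
    // permissions, replication, owner, group, size, date, time, path
    private static final int COLUMNS = 8;

    /** Returns the path column of every entry line in the listing text. */
    public static List<String> parsePaths(String lsOutput) {
        List<String> paths = new ArrayList<String>();
        for (String line : lsOutput.split("\n")) {
            String trimmed = line.trim();
            // Skip blank lines and the "Found N items" header.
            if (trimmed.isEmpty() || trimmed.startsWith("Found ")) {
                continue;
            }
            // Limit the split so a path containing spaces stays in one piece.
            String[] fields = trimmed.split("\\s+", COLUMNS);
            if (fields.length == COLUMNS) {
                paths.add(fields[COLUMNS - 1]);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not captured from a real cluster.
        String sample =
              "Found 2 items\n"
            + "-rw-r--r--   3 hadoop supergroup       1234 2010-08-25 07:36 /dir/part-00000\n"
            + "drwxr-xr-x   - hadoop supergroup          0 2010-08-25 07:36 /dir/sub dir\n";
        for (String path : parsePaths(sample)) {
            System.out.println(path);
        }
    }
}
```

Parsing CLI text like this breaks whenever the output format changes, which is one reason the later replies in this thread go through the FileSystem API instead.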
-
Re: How to enumerate files in the directories? Sudhir Vallamkondu 2010-08-25, 19:28
You should use the FileStatus API to access file metadata. See the example
below.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html

Configuration conf = new Configuration(); // takes default conf
FileSystem fs = FileSystem.get(conf);
Path dir = new Path("/dir");
FileStatus[] stats = fs.listStatus(dir);
for (FileStatus stat : stats) {
    stat.getPath().toUri().getPath(); // path of this entry (file or directory)
    stat.getModificationTime();
    stat.getReplication();
    stat.getBlockSize();
    stat.getOwner();
    stat.getGroup();
    stat.getPermission().toString();
}
-
Re: How to enumerate files in the directories? Raj V 2010-08-25, 21:00
I would use the FileSystem API.
Here is a Q&D (quick-and-dirty) example:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class dirc {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage dirc <directory> ");
            return;
        }
        String dirname = args[0];
        try {
            Configuration conf = new Configuration(true);
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(dirname);
            FileStatus[] fstatus = fs.listStatus(path);
            // Some releases return null for a path that does not exist.
            if (fstatus == null) {
                System.err.println("No such directory: " + dirname);
                return;
            }
            for (FileStatus f : fstatus) {
                System.out.println(f.getPath().toUri().getPath());
            }
        } catch (IOException e) {
            // Report the real failure instead of the usage text.
            System.err.println("Could not list " + dirname + ": " + e.getMessage());
        }
    }
}
-
Re: How to enumerate files in the directories? path2727 2010-10-12, 07:16
I think this might be a better answer to your question. I took a lot of the
code out of the web interface they made:
$HADOOP_HOME/hdfs/src/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
and
$HADOOP_HOME/hdfs/src/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java

import java.io.IOException;
import java.net.InetSocketAddress;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DirectoryListing;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;
import org.apache.hadoop.security.UserGroupInformation;

public class FileTest {

    /**
     * @param server the name of the namenode server, e.g. 'namenode.example.com'
     * @param port   the port of the name node. Mine is 54310 right now.
     *               This is the info port, not the other port that the slaves
     *               connect to.
     * @param dir    the directory you wish to enumerate. I used '/' in this
     *               example.
     */
    public FileTest(String server, String port, String dir) {
        String tDir = validatePath(dir);
        int namenodePort = Integer.parseInt(port);
        if (tDir != null) {
            Configuration conf = new Configuration(true);
            UserGroupInformation ugi = null;
            try {
                ugi = UserGroupInformation.getCurrentUser();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
            InetSocketAddress inet = new InetSocketAddress(server, namenodePort);
            if (ugi != null && inet != null && conf != null) {
                try {
                    DFSClient dfs = getDFSClient(ugi, inet, conf);
                    String target = tDir;
                    final HdfsFileStatus targetStatus = dfs.getFileInfo(target);
                    if (targetStatus.isDir()) {
                        //System.out.println("it is a directory");
                        DirectoryListing thisListing =
                                dfs.listPaths(target, HdfsFileStatus.EMPTY_NAME);
                        if (thisListing == null
                                || thisListing.getPartialListing().length == 0) {
                            System.out.println("Empty directory");
                        } else {
                            //System.out.println("directory not empty");
                            // Only the first batch of entries is printed here.
                            HdfsFileStatus[] files = thisListing.getPartialListing();
                            for (int i = 0; i < files.length; i++) {
                                if (files[i].isDir()) {
                                    System.out.println(" dir " + files[i].getLocalName());
                                } else {
                                    System.out.println(" file " + files[i].getLocalName()
                                            + " " + files[i].getReplication()
                                            + " " + files[i].getBlockSize());
                                }
                            }
                        }
                    } else {
                        System.out.println("it is not a directory");
                    }
                } catch (Exception e) {
                    // Could be IOException or InterruptedException
                    e.printStackTrace();
                }
            } else {
                System.out.println("a requirement is null");
            }
        }
    }

    // Creates the client as the current user, as the JSP helpers do.
    private static DFSClient getDFSClient(final UserGroupInformation user,
                                          final InetSocketAddress addr,
                                          final Configuration conf)
            throws IOException, InterruptedException {
        return user.doAs(new PrivilegedExceptionAction<DFSClient>() {
            public DFSClient run() throws IOException {
                return new DFSClient(addr, conf);
            }
        });
    }

    // Returns the normalized path, or null for a null or empty argument.
    public static String validatePath(String p) {
        return p == null || p.length() == 0 ? null : new Path(p).toUri().getPath();
    }

    public static void main(String[] args) {
        if (args.length == 3 && args[2].contains("/") && args[0].contains(".")) {
            new FileTest(args[0], args[1], args[2]);
        } else {
            System.out.println("Usage: java FileTest <serverName> <nameNodeInfoPort> <dir> ");
            System.out.println("a valid dir must have '/' in the string somewhere");
            System.out.println("a valid server must have '.' in the string somewhere");
        }
    }
}

View this message in context: http://lucene.472066.n3.nabble.com/How-to-enumerate-files-in-the-directories-tp1325738p1685723.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
-
Re: How to enumerate files in the directories? path2727 2010-10-12, 08:22
Configuration conf = new Configuration(true);
conf.set("fs.default.name", "hdfs://<namenode>:<port>");

I noticed that simply adding the conf.set line to a few of the previous
posts solved my problems. I was frustrated because I was trying to use their
examples and they only printed my LOCAL file system. I was running the
examples as 'java <program_name>' and not via '$HADOOP_HOME/bin/hadoop jar'.
Since I was executing them this way, my program wasn't loading the
Configuration object correctly (the cluster's configuration files were not on
the classpath, so fs.default.name kept its local default). Just thought I
would add that here since it caused me frustration.
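The value given to fs.default.name has to be a well-formed hdfs:// URI, and a typo in it tends to surface only later as a confusing connection error. As a rough sketch, the string can be built and checked with plain java.net.URI before it is handed to conf.set. The class and method names (NameNodeUri, build) are invented for this illustration and are not part of Hadoop; the host and port in main are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class NameNodeUri {

    /** Builds "hdfs://host:port", rejecting input that is not a valid URI. */
    public static String build(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        try {
            // scheme, userInfo, host, port, path, query, fragment
            URI uri = new URI("hdfs", null, host, port, null, null, null);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                    "Bad namenode address: " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) {
        // Placeholder address; substitute your own namenode host and port.
        System.out.println(build("namenode.example.com", 54310));
    }
}
```

The returned string would then take the place of "hdfs://<namenode>:<port>" in the conf.set call shown in the post.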