DistributedFileSystem.listStatus()  - Why does it do partial listings then assemble?
Could someone explain why the DistributedFileSystem's listStatus() method does a piecemeal assembly of a directory listing within the method?

Is there a locking issue? What if an element is added to the directory during the operation? What if elements are removed?

It would make sense to me if the FileSystem class's listStatus() method returned an Iterator, so that entries are fetched (and the chatter incurred) only as needed. But I don't understand why you'd want to assemble a giant array of the listing chunk by chunk.
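To make the iterator idea concrete, here is a minimal sketch of what I have in mind. It is not Hadoop code: BatchSource, LazyListingIterator and InMemoryBatchSource are hypothetical names invented for illustration, and plain strings stand in for FileStatus entries. The only point is that the next batch is requested when the caller runs out of entries, not up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a paged listing call; not a Hadoop API. */
interface BatchSource {
    /** Returns up to one batch of names sorted after startAfter; empty when done. */
    List<String> listBatch(String startAfter);
}

/** Iterator that fetches the next batch only when the current one is used up. */
class LazyListingIterator implements Iterator<String> {
    private final BatchSource source;
    private List<String> batch = new ArrayList<String>();
    private int pos = 0;
    private String lastName = "";   // empty name means "start from the beginning"
    private boolean exhausted = false;

    LazyListingIterator(BatchSource source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (pos < batch.size()) return true;
        if (exhausted) return false;
        batch = source.listBatch(lastName);   // one round trip per batch
        pos = 0;
        if (batch.isEmpty()) {
            exhausted = true;
            return false;
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        lastName = batch.get(pos++);
        return lastName;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}

/** In-memory source that serves sorted names in fixed-size batches and counts calls. */
class InMemoryBatchSource implements BatchSource {
    private final List<String> sorted;
    private final int batchSize;
    int calls = 0;

    InMemoryBatchSource(int batchSize, String... names) {
        this.batchSize = batchSize;
        this.sorted = new ArrayList<String>(Arrays.asList(names));
        Collections.sort(this.sorted);
    }

    @Override
    public List<String> listBatch(String startAfter) {
        calls++;
        List<String> out = new ArrayList<String>();
        for (String n : sorted) {
            if (n.compareTo(startAfter) > 0) {
                out.add(n);
                if (out.size() == batchSize) break;
            }
        }
        return out;
    }
}
```

With a batch size of 2 and five entries, a caller that stops after the first two entries would trigger a single listBatch() call in this sketch, where the array-returning style would have to fetch everything first.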
Here's the source of the listStatus() method, and I've linked the entire class below.
---------------------------------

  public FileStatus[] listStatus(Path p) throws IOException {
    String src = getPathName(p);
    
    // fetch the first batch of entries in the directory
    DirectoryListing thisListing = dfs.listPaths(
        src, HdfsFileStatus.EMPTY_NAME);
    
    if (thisListing == null) { // the directory does not exist
      return null;
    }
    
    HdfsFileStatus[] partialListing = thisListing.getPartialListing();
    if (!thisListing.hasMore()) { // got all entries of the directory
      FileStatus[] stats = new FileStatus[partialListing.length];
      for (int i = 0; i < partialListing.length; i++) {
        stats[i] = makeQualified(partialListing[i], p);
      }
      statistics.incrementReadOps(1);
      return stats;
    }
    
    // The directory size is too big that it needs to fetch more
    // estimate the total number of entries in the directory
    int totalNumEntries =
      partialListing.length + thisListing.getRemainingEntries();
    ArrayList<FileStatus> listing =
      new ArrayList<FileStatus>(totalNumEntries);
    // add the first batch of entries to the array list
    for (HdfsFileStatus fileStatus : partialListing) {
      listing.add(makeQualified(fileStatus, p));
    }
    statistics.incrementLargeReadOps(1);

    // now fetch more entries
    do {
      thisListing = dfs.listPaths(src, thisListing.getLastName());
      
      if (thisListing == null) {
        return null; // the directory is deleted
      }
      
      partialListing = thisListing.getPartialListing();
      for (HdfsFileStatus fileStatus : partialListing) {
        listing.add(makeQualified(fileStatus, p));
      }
      statistics.incrementLargeReadOps(1);
    } while (thisListing.hasMore());

    return listing.toArray(new FileStatus[listing.size()]);
  }

--------------------------------------------
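To illustrate the add/remove concern, here is a toy model of the loop above. The one thing taken from the quoted source is that each follow-up call resumes from getLastName(). Everything else is an assumption: ToyDirectory and PagedListingDemo are hypothetical, a sorted in-memory set stands in for the directory, and nothing here claims to show how the NameNode actually behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy directory: a sorted set of names paged by "last name seen". Not HDFS code. */
class ToyDirectory {
    final TreeSet<String> names = new TreeSet<String>();

    /** Returns up to limit names strictly greater than startAfter. */
    List<String> listPaths(String startAfter, int limit) {
        List<String> batch = new ArrayList<String>();
        for (String n : names.tailSet(startAfter, false)) {
            if (batch.size() == limit) break;
            batch.add(n);
        }
        return batch;
    }
}

class PagedListingDemo {
    /** Assembles a listing batch by batch, changing the directory after the first batch. */
    static List<String> listWithConcurrentChange(ToyDirectory dir, int limit) {
        List<String> listing = new ArrayList<String>();
        List<String> batch = dir.listPaths("", limit);
        boolean mutated = false;
        while (!batch.isEmpty()) {
            listing.addAll(batch);
            if (!mutated) {
                // Simulate another client working between two listing calls.
                dir.names.add("aa");    // sorts before the resume point
                dir.names.add("e");     // sorts after the resume point
                dir.names.remove("c");  // not yet returned to the caller
                mutated = true;
            }
            // Resume strictly after the last name already returned.
            batch = dir.listPaths(listing.get(listing.size() - 1), limit);
        }
        return listing;
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        dir.names.addAll(Arrays.asList("a", "b", "c", "d"));
        System.out.println(listWithConcurrentChange(dir, 2)); // prints [a, b, d, e]
    }
}
```

In this model the assembled array matches neither the directory before the change (a, b, c, d) nor after it (a, aa, b, d, e), which is exactly the kind of result I am asking about.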

Ref:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
http://docs.oracle.com/javase/6/docs/api/java/util/Iterator.html
thanks!

-bc
Todd Lipcon 2013-05-02, 16:28
Suresh Srinivas 2013-05-02, 16:34
Steve Loughran 2013-05-03, 17:14