HDFS, mail # dev - DistributedFileSystem.listStatus()  - Why does it do partial listings then assemble?

Re: DistributedFileSystem.listStatus() - Why does it do partial listings then assemble?
Steve Loughran 2013-05-03, 17:14
On 2 May 2013 09:28, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> Hi Brad,
>
> The reasoning is that the NameNode locking is somewhat coarse-grained. In
> older versions of Hadoop, before it worked this way, we found that listing
> large directories (e.g. with 100k+ files) could end up holding the namenode's
> lock for quite a long period of time and starve other clients.
>
> Additionally, I believe there is a second API that does the "on-demand"
> fetching of the next set of files from the listing as well, no?
>

That one is in HDFS v2; it is the only incompatible change between the v1 and
v2 FileSystem class.

It is chatty over long-haul links and hangs against Amazon S3://, an issue for
which there's a patch that replicates but does not fix the problem:
https://issues.apache.org/jira/browse/HADOOP-9410

It is good locally, but I think it needs test coverage for all the other
filesystem clients that ship with Hadoop.

FWIW, blobstores do tend to only support paged lists of their blobs, so the
same build-up-as-you-go-along process works there. We should spell out in
the documentation "changes that occur to the filesystem during the
generation of this list MAY not be reflected in the result, and so MAY
result in a partially incomplete or inconsistent view".
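
To make the build-up-as-you-go-along process concrete, here is a minimal
Java sketch. It is not Hadoop or blobstore code: the Store class, listPage()
and listAll() are hypothetical names invented for illustration, and the
in-memory sorted set merely stands in for a NameNode or blobstore. The point
it tries to show is that the store's lock is held for one bounded page at a
time while the client assembles the full list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Sketch only: a client assembling a full listing from bounded pages. */
final class PagedListingSketch {

    /** Hypothetical stand-in for a NameNode or blobstore; names are kept sorted. */
    static final class Store {
        private final TreeSet<String> names = new TreeSet<>();

        synchronized void add(String name) { names.add(name); }

        synchronized void remove(String name) { names.remove(name); }

        /**
         * Returns up to 'limit' names strictly after 'startAfter'
         * (null means start from the beginning). The lock is held
         * only for the duration of this one page.
         */
        synchronized List<String> listPage(String startAfter, int limit) {
            List<String> page = new ArrayList<>();
            Iterable<String> tail =
                (startAfter == null) ? names : names.tailSet(startAfter, false);
            for (String name : tail) {
                if (page.size() == limit) {
                    break;
                }
                page.add(name);
            }
            return page;
        }
    }

    /** Client side: fetch page after page and assemble the complete list. */
    static List<String> listAll(Store store, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> result = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = store.listPage(cursor, pageSize);
            result.addAll(page);
            if (page.size() < pageSize) {
                break; // a short page means there is nothing left to fetch
            }
            cursor = page.get(page.size() - 1); // resume after the last name seen
        }
        return result;
    }
}
```

In this sketch, other callers can add or remove names between two listPage()
calls, so a name that sorts before the current cursor and is added mid-listing
never shows up in the result. That is the kind of partially incomplete or
inconsistent view the proposed documentation wording is meant to warn about.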

-Steve