Hadoop >> mail # user

Re: How to tune fileSystem.listFiles("/", true) if you want to walk through almost all files
Hi, I found out that it is much faster to use fileSystem.listStatus() and
recurse by hand.

listFiles  = 4021 files in 14.27 s
listStatus = 4021 files in 364.3 ms

So far I have only tested this on localhost. Tomorrow I will check it
against the cluster.

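import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

import com.google.common.base.Stopwatch;

public class Main
{
    static AtomicInteger count = new AtomicInteger();

    static URI uri;
    static FileSystem fileSystem;

    public static void main(final String... args) throws URISyntaxException,
            IOException, InterruptedException
    {
        uri = new URI("/home/christian/Documents");
        fileSystem = FileSystem.get(uri, new Configuration(), "hdfs");

        // Guava stopwatch; newer Guava versions replace this constructor
        // with Stopwatch.createUnstarted().
        Stopwatch stopwatch = new Stopwatch();

        // Variant 1: recursive listFiles() iterator.
        stopwatch.start();
        withListAllAndNext();
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);

        // Variant 2: one listStatus() call per directory, recursing by hand.
        stopwatch.reset();
        count.set(0);
        stopwatch.start();
        blockwiseWithRecursion(fileSystem.listStatus(new Path(uri)));
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);
    }

    // Counts the files in the given listing and descends into each directory.
    private static void blockwiseWithRecursion(FileStatus... listLocatedStatus)
            throws FileNotFoundException, IOException
    {
        for (FileStatus fileStatus : listLocatedStatus)
        {
            if (fileStatus.isDirectory())
                blockwiseWithRecursion(fileSystem.listStatus(fileStatus.getPath()));
            else
                count.incrementAndGet();
        }
    }

    // Counts the files returned by the recursive listFiles() iterator.
    private static void withListAllAndNext() throws FileNotFoundException,
            IOException
    {
        RemoteIterator<LocatedFileStatus> listFiles =
                fileSystem.listFiles(new Path(uri), true);
        // Use hasNext() to end the loop; an IOException aborts the walk.
        while (listFiles.hasNext())
        {
            listFiles.next();
            count.incrementAndGet();
        }
    }
}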
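
The question quoted below also asks about multithreading. Because the manual recursion issues one listStatus() call per directory, the subdirectories could in principle be listed in parallel. Here is a minimal sketch of that idea using only the JDK's ForkJoinPool. Entry, Lister, CountFiles and ParallelWalkDemo are hypothetical names introduced for illustration. Lister stands in for a wrapper around fileSystem.listStatus(), which would also have to deal with IOException. The sketch runs against an in-memory tree, not a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical stand-in for a FileStatus: a path plus a directory flag. */
final class Entry {
    final String path;
    final boolean directory;

    Entry(String path, boolean directory) {
        this.path = path;
        this.directory = directory;
    }
}

/** Hypothetical stand-in for one fileSystem.listStatus(dir) call. */
interface Lister {
    List<Entry> list(String dir);
}

/** Counts the files below one directory; each subdirectory becomes a subtask. */
final class CountFiles extends RecursiveTask<Long> {
    private final Lister lister;
    private final String dir;

    CountFiles(Lister lister, String dir) {
        this.lister = lister;
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        long files = 0;
        List<CountFiles> subdirs = new ArrayList<>();
        for (Entry entry : lister.list(dir)) {
            if (entry.directory) {
                subdirs.add(new CountFiles(lister, entry.path));
            } else {
                files++;
            }
        }
        // invokeAll runs the subtasks (possibly on other pool threads)
        // and returns once all of them have completed.
        for (CountFiles done : invokeAll(subdirs)) {
            files += done.join();
        }
        return files;
    }
}

final class ParallelWalkDemo {
    /** Small in-memory directory tree used in place of a real file system. */
    static Map<String, List<Entry>> sampleTree() {
        Map<String, List<Entry>> tree = new HashMap<>();
        tree.put("/", Arrays.asList(new Entry("/a", true),
                new Entry("/b", true), new Entry("/x.txt", false)));
        tree.put("/a", Arrays.asList(new Entry("/a/1.txt", false),
                new Entry("/a/2.txt", false)));
        tree.put("/b", Arrays.asList(new Entry("/b/c", true)));
        tree.put("/b/c", Arrays.asList(new Entry("/b/c/3.txt", false)));
        return tree;
    }

    static long countFiles(Lister lister, String root, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new CountFiles(lister, root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<Entry>> tree = sampleTree();
        Lister lister = dir -> tree.getOrDefault(dir, Collections.emptyList());
        System.out.println(countFiles(lister, "/", 4) + " files");
    }
}
```

With the in-memory sample tree this prints "4 files". Whether parallel listing actually helps on a cluster depends on how the NameNode copes with concurrent requests, so it would need to be measured there.
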
2013/8/12 Christian Schneider <[EMAIL PROTECTED]>

> Hi, is there a way to tune this?
>
> I walk through the files with:
>
> RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new
> Path(uri), true);
> while(listFiles.hasNext()) {
>     listFiles.next();
> };
>
> I need to get some information about those files, so I would like to scan
> them all.
>
> Is there any way to tune the listFiles.next() call, for example by loading
> a batch of files at once or by multithreading it?
>
> Best Regards,
> Christian.
>