Hadoop user mailing list

Re: How to tune fileSystem.listFiles("/", true) if you want to walk through almost all files
Hi, I found out that it works much faster with fileSystem.listStatus() and
a manual recursion.

listFiles  = 4021 files in 14.27 s
listStatus = 4021 files in 364.3 ms

So far I have only tested it on localhost. Tomorrow I will check it against
the cluster.

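import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Assumption: Stopwatch is the Guava class (the timings above match its toString() format).
import com.google.common.base.Stopwatch;

public class Main
{
    // Number of files seen by the walk that is currently being timed.
    static AtomicInteger count = new AtomicInteger();

    static URI uri;
    static FileSystem fileSystem;

    public static void main(final String... args)
            throws URISyntaxException, IOException, InterruptedException
    {
        uri = new URI("/home/christian/Documents");
        fileSystem = FileSystem.get(uri, new Configuration(), "hdfs");

        // Newer Guava releases replace this constructor with Stopwatch.createUnstarted().
        Stopwatch stopwatch = new Stopwatch();

        // Variant 1: recursive listing through the RemoteIterator from listFiles().
        stopwatch.start();
        withListAllAndNext();
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);

        // Variant 2: listStatus() per directory with a manual recursion.
        stopwatch.reset();
        count.set(0);
        stopwatch.start();
        blockwiseWithRecursion(fileSystem.listStatus(new Path(uri)));
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);
    }

    // Counts plain files and descends into every directory with another listStatus() call.
    private static void blockwiseWithRecursion(FileStatus... statuses) throws IOException
    {
        for (FileStatus fileStatus : statuses)
        {
            if (fileStatus.isDirectory())
                blockwiseWithRecursion(fileSystem.listStatus(fileStatus.getPath()));
            else
                count.incrementAndGet();
        }
    }

    // Counts files by draining the iterator returned by listFiles(path, true).
    private static void withListAllAndNext() throws IOException
    {
        RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new Path(uri), true);

        // hasNext() ends the loop cleanly; an IOException now propagates instead of
        // being printed and retried forever.
        while (listFiles.hasNext())
        {
            listFiles.next();
            count.incrementAndGet();
        }
    }
}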
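
As for the multithreading idea from the original question, one possible direction is to list sub-directories concurrently instead of one after another. The following is only a sketch under stated assumptions, not something measured against HDFS: `ParallelWalkSketch`, `Entry`, `CountTask` and `countFiles` are hypothetical names, and the `lister` function stands in for a call such as `fileSystem.listStatus(path)` (which would additionally need its checked `IOException` wrapped). The demo uses an in-memory tree so that it runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Function;

/** Sketch: count the files below a root by listing sub-directories in parallel. */
class ParallelWalkSketch
{
    /** Hypothetical stand-in for a FileStatus: a path plus a directory flag. */
    static final class Entry
    {
        final String path;
        final boolean directory;

        Entry(String path, boolean directory)
        {
            this.path = path;
            this.directory = directory;
        }
    }

    /** Lists one directory, forks one sub-task per child directory, then sums the results. */
    static final class CountTask extends RecursiveTask<Long>
    {
        private final String dir;
        private final Function<String, List<Entry>> lister;

        CountTask(String dir, Function<String, List<Entry>> lister)
        {
            this.dir = dir;
            this.lister = lister;
        }

        @Override
        protected Long compute()
        {
            long files = 0;
            List<CountTask> subTasks = new ArrayList<>();
            for (Entry entry : lister.apply(dir))
            {
                if (entry.directory)
                {
                    // Schedule the sub-directory so another worker thread can pick it up.
                    CountTask task = new CountTask(entry.path, lister);
                    task.fork();
                    subTasks.add(task);
                }
                else
                    files++;
            }
            // Wait for all forked sub-directories and add their file counts.
            for (CountTask task : subTasks)
                files += task.join();
            return files;
        }
    }

    /** Walks the tree below root with the given number of worker threads. */
    static long countFiles(String root, Function<String, List<Entry>> lister, int threads)
    {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try
        {
            return pool.invoke(new CountTask(root, lister));
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args)
    {
        // In-memory directory tree used instead of a real file system.
        Map<String, List<Entry>> tree = new HashMap<>();
        tree.put("/", Arrays.asList(new Entry("/a", true), new Entry("/x.txt", false)));
        tree.put("/a", Arrays.asList(new Entry("/a/y.txt", false), new Entry("/a/b", true)));
        tree.put("/a/b", Arrays.asList(new Entry("/a/b/z.txt", false)));

        System.out.println(countFiles("/", tree::get, 4)); // prints 3
    }
}
```

Whether this helps on a real cluster is an open question: every listing is still a round trip to the NameNode, so the gain depends on how many of those calls it can serve concurrently.
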
2013/8/12 Christian Schneider <[EMAIL PROTECTED]>

> Hi, is there a way to tune this?
>
> I walk through the files with:
>
> RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new
> Path(uri), true);
> while(listFiles.hasNext()) {
>     listFiles.next();
> }
>
> I need to get some information about those files, therefore I would like
> to scan them all.
>
> Is there any way to tune the listFiles.next() call, such as loading a
> batch of files at once, or multithreading it?
>
> Best Regards,
> Christian.
>