MapReduce, mail # user - Re: How to test Hadoop MapReduce under another File System NOT HDFS

Julien Muller 2013-02-21, 13:26
Ling Kun 2013-02-22, 07:40
Dear Julien Muller and Harsh,
   Thanks very much for all your hints.

   Are there any recommended applications besides wordcount and Terasort?

Ling Kun

On Thu, Feb 21, 2013 at 9:26 PM, Julien Muller <[EMAIL PROTECTED]> wrote:

> Some hints:
> 1) For features, you could start with the unit tests available for hadoop fs.
> For performance, compare the results of various benchmarks.
> 3) I can see at least 2 possible reasons for that. First, your
> filesystem may not support locality, so tasks are not executed on the same
> node as the data.
> Second, a lot happens under the hood, and your fs may lack support for a
> very specific use case.
> You should choose your benchmarks very carefully to make sure they
> actually test what you want to test (i.e., not CPU).
> Julien
> 2013/2/21 Ling Kun <[EMAIL PROTECTED]>
>> Dear all,
>>     I am currently looking into other filesystem implementations, such as
>> Lustre, Gluster, or other NAS with POSIX support, and trying to replace
>> HDFS with one of them.
>>     I have implemented a filesystem class (AFS) that provides an
>> interface to Hadoop MapReduce, similar to RawLocalFileSystem, and
>> examples such as wordcount and terasort work well.
>>    However, I am not sure whether my implementation is correct for all
>> the MapReduce applications that Hadoop MapReduce+Hadoop HDFS can run.
>>    My questions are:
>> 1. How does the Hadoop community run MapReduce regression tests for
>> updates to Hadoop HDFS and Hadoop MapReduce?
>> 2. Besides the MapReduce wordcount and Terasort examples, is there any
>> filesystem interface support that MapReduce applications need and that I
>> might be missing? Since the filesystem has POSIX support, hsync is also
>> supported.
>> 3. According to my tests, MapReduce on my filesystem performs worse than
>> HDFS+MapReduce. Any suggestions or hints for the performance analysis?
>> (Without MapReduce, the filesystem performs better than both HDFS and
>> the local filesystem.)
>> 3.1 The following are the same in the performance comparison:
>> 3.1.1 Architecture: 4 nodes for MR, and a different set of 4 nodes for
>> 3.1.2 Application: the input size and the numbers of mappers and reducers
>> are the same.
>> Thanks.
>> Ling Kun
>> --
>> http://www.lingcc.com
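Julien's locality suspicion can be checked directly. MapReduce learns where an input split lives from the hosts that `FileSystem.getFileBlockLocations()` reports; the base-class default, which as far as I know `RawLocalFileSystem` inherits unchanged, reports a single block on host `localhost`, so a filesystem modeled on it gives the scheduler nothing to place map tasks near. The sketch below is a hypothetical helper, not a Hadoop API: it assumes you have already collected, for each map task, the hosts reported for its split and the host the task actually ran on (for example from the job history), and it computes the data-local fraction.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of Hadoop): measures how many map tasks ran
 * on a host that the filesystem reported as holding their input split.
 */
final class LocalityCheck {

    /** One map task: hosts reported for its split, and the host it ran on. */
    static final class TaskPlacement {
        final Set<String> splitHosts;
        final String taskHost;

        TaskPlacement(Set<String> splitHosts, String taskHost) {
            this.splitHosts = splitHosts;
            this.taskHost = taskHost;
        }
    }

    /** Fraction of tasks whose host appears in their split's host list. */
    static double localFraction(List<TaskPlacement> tasks) {
        if (tasks.isEmpty()) {
            return 0.0;
        }
        long local = tasks.stream()
                .filter(t -> t.splitHosts.contains(t.taskHost))
                .count();
        return (double) local / tasks.size();
    }

    public static void main(String[] args) {
        // Split 1 is reported on node1/node2 and its task ran on node1 (local);
        // split 2 is only reported on "localhost", so no real node matches it.
        List<TaskPlacement> tasks = List.of(
                new TaskPlacement(Set.of("node1", "node2"), "node1"),
                new TaskPlacement(Set.of("localhost"), "node3"));
        System.out.println(localFraction(tasks)); // prints 0.5
    }
}
```

A fraction near zero for the AFS runs but high for the HDFS runs would point to task placement, rather than raw I/O speed, as a likely cause of the slowdown.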
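On choosing benchmarks that measure I/O rather than CPU: one rough way to get a filesystem-only baseline, independent of MapReduce, is to time a large sequential write that ends with an fsync, so the figure reflects the storage rather than just the page cache. The plain-Java sketch below assumes that approach; `WriteThroughput`, the default file name and the sizes are hypothetical choices, and the number it returns is only comparable across runs that use the same settings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical raw-I/O baseline: sequential write throughput of one file. */
final class WriteThroughput {

    /**
     * Writes totalBytes to file in blockSize chunks, forces the data to the
     * device (fsync-like), and returns the observed throughput in MiB/s.
     */
    static double sequentialWriteMiBps(Path file, long totalBytes, int blockSize)
            throws IOException {
        byte[] block = new byte[blockSize];
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (written < totalBytes) {
                int n = (int) Math.min(blockSize, totalBytes - written);
                ByteBuffer buf = ByteBuffer.wrap(block, 0, n);
                while (buf.hasRemaining()) {
                    written += ch.write(buf);
                }
            }
            // Without this, the timing would mostly measure the page cache.
            ch.force(true);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (totalBytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) throws IOException {
        // Example: 256 MiB in 1 MiB blocks to a path on the mount under test.
        Path target = Path.of(args.length > 0 ? args[0] : "throughput-test.dat");
        System.out.printf("%.1f MiB/s%n",
                sequentialWriteMiBps(target, 256L << 20, 1 << 20));
    }
}
```

Running the same measurement from each MapReduce node against the AFS mount and a local disk, and comparing with HDFS's own `TestDFSIO` results, can help separate filesystem speed from job overhead.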
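On hsync coming for free with POSIX support: in Hadoop's `Syncable` contract, `hflush()` makes written data visible to new readers and `hsync()` additionally asks for it to be persisted to the disk device, which on a POSIX mount roughly corresponds to `fsync`. A minimal sketch under that assumption, in plain Java with no Hadoop dependency; the class name is hypothetical, and it only mirrors the `hflush`/`hsync` method names rather than implementing Hadoop's interface.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical output stream for a POSIX-backed Hadoop filesystem. The
 * hflush()/hsync() names mirror Hadoop's Syncable; this class does not
 * implement that interface, to stay free of Hadoop dependencies.
 */
final class PosixSyncableOutput extends OutputStream {
    private final FileChannel channel;

    PosixSyncableOutput(Path path) throws IOException {
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(b, off, len);
        while (buf.hasRemaining()) {
            channel.write(buf);  // unbuffered: each call hands bytes to the OS
        }
    }

    /** Make written bytes visible to readers of the file. */
    public void hflush() throws IOException {
        // Nothing is buffered in this class, so the data is already in the
        // OS; a buffering implementation would drain its buffer here.
        // Visibility on other nodes still depends on the shared
        // filesystem's client-side caching.
    }

    /** Like POSIX fsync: force data and metadata to the storage device. */
    public void hsync() throws IOException {
        hflush();
        channel.force(true);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Whether `hflush()` visibility holds across nodes depends on how the shared filesystem caches on the client side (NFS-style close-to-open consistency, for instance, is weaker than what a reader on another node might expect), so that is worth testing separately from single-node durability.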
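For question 2, one behavior worth checking beyond plain reads and writes is rename: MapReduce's `FileOutputCommitter` writes task output under a `_temporary` directory and renames it into the final output directory on commit, so rename semantics matter to every job that writes output. The sketch below mimics that pattern directly on the mount with plain Java NIO. The layout is simplified (the real committer nests attempt directories), `RenameCheck` and the file names are hypothetical, and it exercises only the mount itself, not the AFS class's own `rename()`, which Hadoop's `FileSystemContractBaseTest` is better placed to cover.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical check of the "write under _temporary, then rename into place"
 * pattern, run against a directory on the filesystem under test.
 */
final class RenameCheck {

    /**
     * Returns true if an atomic rename succeeds, the committed file has the
     * expected content, and the temporary file no longer exists.
     */
    static boolean commitByRename(Path outputDir) throws IOException {
        Path tmpDir = Files.createDirectories(outputDir.resolve("_temporary"));
        Path attempt = tmpDir.resolve("part-00000");
        Path committed = outputDir.resolve("part-00000");
        String payload = "hello\n";
        Files.writeString(attempt, payload);
        try {
            Files.move(attempt, committed, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            return false;  // the mount cannot rename atomically
        }
        return Files.notExists(attempt)
                && Files.readString(committed).equals(payload);
    }

    public static void main(String[] args) throws IOException {
        // Pass a directory on the mount under test; defaults to the current one.
        Path dir = Files.createTempDirectory(
                Path.of(args.length > 0 ? args[0] : "."), "rename-check");
        System.out.println("atomic commit-by-rename: " + commitByRename(dir));
    }
}
```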
Ling Kun 2013-02-22, 07:07