MapReduce, mail # user - Re: How to test Hadoop MapReduce under another File System NOT HDFS


Re: How to test Hadoop MapReduce under another File System NOT HDFS
Ling Kun 2013-02-22, 07:40
Dear Julien Muller and Harsh,
   Thanks very much for all your hints.

   Are there any recommended applications besides wordcount and Terasort?

Thanks
Ling Kun

On Thu, Feb 21, 2013 at 9:26 PM, Julien Muller <[EMAIL PROTECTED]> wrote:

> Some hints:
>
> 1) For features, you could start with the unit tests available with hadoop fs.
> For performance, compare the results of various benchmarks.
>
> 3) I can see at least 2 possible reasons for that. It could be that your
> filesystem does not support locality, so tasks are not executed on the same
> node as the data.
> A lot is done under the hood, so your fs may also fall short in some
> very specific use case.
>
> You should choose your benchmarks very carefully to make sure they
> actually test what you want to test (i.e. that they are not CPU-bound).
>
> Julien
>
> 2013/2/21 Ling Kun <[EMAIL PROTECTED]>
>
>> Dear all,
>>     I am currently looking into some other filesystem implementations, like
>> Lustre, Gluster, or other NAS with POSIX support, and trying to replace
>> HDFS with one of them.
>>
>>     I have implemented a filesystem class (AFS) which provides an
>> interface to Hadoop MapReduce, like that of RawLocalFileSystem, and
>> examples like wordcount and terasort work well.
>>
>>    However, I am not sure whether my implementation is correct for all
>> the MapReduce applications that Hadoop MapReduce+Hadoop HDFS can run.
>>
>>    My questions are:
>> 1. How does the Hadoop community do MapReduce regression testing for any
>> update of Hadoop HDFS and Hadoop MapReduce?
>>
>> 2. Besides the MapReduce wordcount and Terasort examples, are there any
>> filesystem interfaces needed by MapReduce applications that I may be
>> missing? Since the filesystem has POSIX support, hsync is also supported.
>>
>> 3. According to my tests, the performance is worse than that of
>> HDFS+MapReduce. Any suggestions or hints on the performance analysis?
>> (Without MapReduce, the performance of the filesystem is better than HDFS
>> and also the local filesystem.)
>> 3.1 The following are the same for the performance comparison:
>> 3.1.1 architecture: 4 nodes for MR, and another 4 different nodes for
>> HDFS/AFS
>> 3.1.2 application: the input size and the number of mappers and reducers
>> are the same.
>>
>>
>> Thanks.
>>
>> Ling Kun
>>
>> --
>> http://www.lingcc.com
>>
>
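
The hint above about choosing benchmarks that "actually test what you want to test" can be illustrated with a small local check: compare the CPU time a workload consumes with its wall-clock time. A ratio near 1 suggests the workload is CPU-bound and says little about the filesystem; a low ratio suggests it spends most of its time waiting, for example on I/O. This is only a sketch in plain Python. The names cpu_fraction, spin and wait_for_storage are hypothetical, the sleep call merely stands in for time spent waiting on storage, and nothing here measures a real Hadoop job, whose tasks run in separate processes on other nodes.

```python
import time


def cpu_fraction(workload):
    """Run workload() once and return CPU time divided by wall-clock time.

    Only CPU time of the current process is counted, so work done in
    child processes or on other machines is not visible here.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def spin(n=2_000_000):
    """Hypothetical CPU-heavy workload: pure arithmetic, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def wait_for_storage(seconds=0.2):
    """Hypothetical stand-in for a workload that mostly waits on storage."""
    time.sleep(seconds)


if __name__ == "__main__":
    print("spin             cpu fraction:", round(cpu_fraction(spin), 2))
    print("wait_for_storage cpu fraction:", round(cpu_fraction(wait_for_storage), 2))
```

If a candidate benchmark behaves like spin rather than wait_for_storage, swapping the filesystem underneath it is unlikely to change its runtime much, so it is a poor choice for comparing AFS with HDFS.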