HDFS >> mail # user >> why did I achieve such poor performance of HDFS


Re: why did I achieve such poor performance of HDFS
Hi Hao.

Thanks for the observation. While I'll leave commenting on the particular
situation to someone who knows more about HDFS than I do, I would like to
ask you a couple of questions:
   - do you have that particular test in a completely separable form? I.e., is it
automated, and can it be reused easily by someone else?
   - could you share this test with the rest of the community through a JIRA or
some other channel?
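
To make "separable and reusable" concrete, here is a rough sketch of what such
a test could look like. It is only an illustration, not the test Hao ran: the
function names (hadoop_get, measure_get) are hypothetical, the fetch step is
assumed to shell out to the `hadoop fs -get` command, and the throughput unit
assumes 1 KB = 1000 bytes.

```python
import random
import subprocess
import tempfile
import time
from pathlib import Path


def hadoop_get(hdfs_path, local_dir):
    """Copy one HDFS file locally via `hadoop fs -get`; return its size in bytes.

    Assumes the `hadoop` command is on PATH (hypothetical setup).
    """
    # A fresh sub-directory per file avoids clashes between equal file names.
    dest = Path(tempfile.mkdtemp(dir=local_dir)) / Path(hdfs_path).name
    subprocess.run(["hadoop", "fs", "-get", hdfs_path, str(dest)], check=True)
    return dest.stat().st_size


def measure_get(paths, sample_size, fetch=hadoop_get, seed=None):
    """GET `sample_size` randomly chosen files; return (total_bytes, seconds)."""
    rng = random.Random(seed)
    chosen = rng.sample(list(paths), sample_size)  # sampling without repeats
    with tempfile.TemporaryDirectory() as scratch:
        start = time.perf_counter()
        total_bytes = sum(fetch(path, scratch) for path in chosen)
        elapsed = time.perf_counter() - start
    return total_bytes, elapsed


def kb_per_second(total_bytes, seconds):
    """Throughput in KB/s, taking 1 KB = 1000 bytes (an assumption)."""
    return total_bytes / 1000 / seconds
```

Each client machine would run measure_get over its own random sample (for
example 5000 of the 200,000 file paths) and report kb_per_second of the result.
Passing a different `fetch` callable lets the same harness be reused with
another client or a stub.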

Thanks,
   Konstantin (aka Cos)

On 8/3/09 12:59 AM, Hao Gong wrote:
> Hi all,
>
> I have been using HDFS as a distributed storage system for an experiment,
> but in my testing I find that the performance of HDFS is very poor.
>
> I ran two scenarios. 1) Medium-size file test: I PUT 200,000 medium-size
> files (random sizes from 20KB to 20MB) into HDFS, then had 10 clients
> GET 5000 random files simultaneously. The average GET throughput of a
> client was very poor (less than approximately 14000 KBps). 2) Large-size
> file test: I PUT 20,000 large files (random sizes from 250MB to 750MB)
> into HDFS, then had 10 clients GET 100 random files simultaneously. The
> average GET throughput of a client was also very poor (less than
> approximately 12500 KBps).
>
> So I'm puzzled by these experiments: why does HDFS perform so poorly?
> The throughput available to a client is far below the limit of the
> network bandwidth. Is there any parameter I need to change to get
> high performance from HDFS (I used the default parameter values)?
>
> My environment is as follows:
>
> 1) 30 ordinary PCs as HDFS slaves (Core 2 E7200, 4 GB RAM, 1.5 TB HDD)
>
> 2) 10 ordinary PCs as HDFS clients (Core 2 E7200, 4 GB RAM, 1.5 TB HDD)
>
> 3) One ordinary PC as the HDFS master (Core 2 E7200, 4 GB RAM, 1.5 TB HDD)
>
> 4) A 1000 Mbps switch and links in a star network topology
>
> 5) The Hadoop version is 0.20.0; the JRE version is 1.6.0_11
>
> If anybody has researched the performance of HDFS, please contact
> me. Thank you very much.
>
> Best regards,
>
> Hao Gong
>
> Huawei Technologies Co., Ltd
> ***********************************************
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure,
> reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by phone or email immediately and delete it!
> ***********************************************
>
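
For reference, the throughput figures quoted above can be put next to the
link speed with a little arithmetic. This is only a sketch under stated
assumptions: "KBps" is read as kilobytes per second with 1 KB = 1000 bytes,
the "1000M" link is taken as 1000 Mbit/s, and protocol overhead is ignored.
The helper names are made up for illustration.

```python
# Assumptions (not stated in the original message):
#   - "KBps" = kilobytes per second, 1 KB = 1000 bytes
#   - the link carries 1000 Mbit/s; protocol overhead is ignored
LINK_MBIT_PER_S = 1000


def kbps_to_mbit(kilobytes_per_s):
    """Convert kilobytes/s to megabits/s (8 bits per byte)."""
    return kilobytes_per_s * 8 / 1000


def link_utilisation(kilobytes_per_s, link_mbit=LINK_MBIT_PER_S):
    """Fraction of the link's nominal rate used by one client."""
    return kbps_to_mbit(kilobytes_per_s) / link_mbit


reported = {"medium files": 14000, "large files": 12500}
for name, kbps in reported.items():
    print(f"{name}: {kbps_to_mbit(kbps):.0f} Mbit/s, "
          f"{link_utilisation(kbps):.1%} of the link")
# medium files: 112 Mbit/s, 11.2% of the link
# large files: 100 Mbit/s, 10.0% of the link
```

Under these assumptions each client uses roughly a tenth of its gigabit link,
which matches the observation that throughput is far below the network limit.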

--
With best regards,
Konstantin Boudnik (aka Cos)

         Yahoo! Grid Computing
         +1 (408) 349-4049

2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
Attention! Streams of consciousness are disallowed