HDFS >> mail # user >> why did I achieve such poor performance of HDFS


Re: why did I achieve such poor performance of HDFS
And here's some reading you may find useful:
http://www.facebook.com/note.php?note_id=53035052002&ref=mf

On 8/4/09 8:52 AM, Konstantin Boudnik wrote:
> Hi Hao.
>
>
> One more question for you - I should've asked it in my first email, though...
> What is your network speed/throughput on such massive reads WITHOUT HDFS in
> place? While I agree that ~14Kbps isn't much at all, I was wondering what
> the speed of 5000 simultaneous reads from a native file system over
> the same network would be?
>
> Could such a test be arranged in your setup?
>
> One more issue here is that in your first test the size of a file is smaller
> than the default HDFS block size (64MB, I think), which is likely to create
> significant overhead and affect the performance.
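[Editor's note: the block-size overhead mentioned above can be illustrated with a quick calculation. Under fixed-size blocks, a file smaller than the block size still occupies a whole block entry of namenode metadata, so 5000 small files mean 5000 blocks to locate and schedule. A minimal sketch in plain Java, using the 64MB default cited in the thread; the class and method names are made up for illustration.]

```java
// Sketch: per-file block count under HDFS-style fixed-size blocks.
// A 1MB file and a 64MB file each cost one block of namenode metadata,
// so many small files create many blocks to track and schedule.
public class BlockMath {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // default mentioned in the thread

    // Number of blocks a file of the given size occupies (at least one
    // for any non-empty file, regardless of how small it is).
    static long blocksFor(long fileSize) {
        if (fileSize == 0) return 0;
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(blocksFor(1L * 1024 * 1024));   // 1MB file   -> 1 block
        System.out.println(blocksFor(256L * 1024 * 1024)); // 256MB file -> 4 blocks
    }
}
```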
>
> 1) To share your current test, you can simply create a new JIRA under
> https://issues.apache.org/jira/browse/ under 'test', or simply send it to me as
> an attachment and I'll take care of the JIRA side. But I'd love to see the
> results of the other test I've mentioned above if possible.
>
> 2) DFSClient does provide an API for random reads from a file, and this API is
> thread safe. However, my uneducated guess would be that it is likely the
> client's (your) responsibility to 'rebuild' the file from the randomly
> read blocks in the correct order. It is pretty much like any other filesystem
> out there: YOU have to know the sequence of the pieces of your file in order to
> reconstruct them from many concurrent reads.
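[Editor's note: the reassembly described above can be sketched with plain Java NIO, whose `FileChannel.read(ByteBuffer, long position)` is analogous to the `FSDataInputStream.read(long position, byte[] buffer, int offset, int length)` call mentioned later in this thread: each thread issues positioned reads at a known offset, and correct ordering falls out of writing each block into its own slot of the destination array. This is an illustration against a local file, not HDFS, and the class name is made up.]

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class ParallelBlockReader {

    // Read a file with one thread per fixed-size block. Each thread uses
    // positioned reads, so there is no shared file-pointer state and no
    // locking; the caller keeps the order straight by where each block is
    // written into the result array. Assumes the file fits in memory.
    static byte[] readInBlocks(Path file, int blockSize) throws Exception {
        long size = Files.size(file);
        byte[] result = new byte[(int) size];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Thread> threads = new ArrayList<>();
            for (long off = 0; off < size; off += blockSize) {
                final long position = off;
                final int len = (int) Math.min(blockSize, size - off);
                Thread t = new Thread(() -> {
                    try {
                        // Wrap this block's slot of the shared result array.
                        ByteBuffer buf = ByteBuffer.wrap(result, (int) position, len);
                        long filePos = position;
                        while (buf.hasRemaining()) {
                            // Positioned read: thread-safe, analogous to
                            // FSDataInputStream.read(position, buffer, offset, length).
                            int n = ch.read(buf, filePos);
                            if (n < 0) break;
                            filePos += n;
                        }
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) t.join();
        }
        return result;
    }
}
```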
>
> Hope it helps,
>    Konstantin
>
> On 8/3/09 6:49 PM, Hao Gong wrote:
>> Hi Konstantin,
>>
>>     Thank you for your response.
>>     1. Yes. It is automated and can be reused easily by anyone, I think,
>> because I didn't change the HDFS code or parameters except for
>> "hadoop.tmp.dir" and "fs.default.name".
>>     2. Yes. I can share our test with the community. How should I do that?
>>
>>     By the way, I have a little question about HDFS.
>>     1. Is the HDFS client single-threaded or multi-threaded when it transmits
>> the blocks of a given file? For example, if file A is 256MB in size, it is
>> divided into 4 blocks on 4 datanodes. When a client PUTs or GETs this file,
>> is the operation sequential (one block at a time) or simultaneous (the client
>> GETs the 4 blocks from the 4 datanodes at the same time)?
>>     In my client code, I used "FSDataInputStream.read(long position, byte[]
>> buffer, int offset, int length)" to GET the file.
>>
>>     Thanks very much.
>>
>> Best regards,
>> Hao Gong
>> Huawei Technologies Co., Ltd
>> ***********************************************
>> This e-mail and its attachments contain confidential information from
>> HUAWEI, which is intended only for the person or entity whose address is
>> listed above. Any use of the information contained herein in any way
>> (including, but not limited to, total or partial disclosure, reproduction,
>> or dissemination) by persons other than the intended recipient(s) is
>> prohibited. If you receive this e-mail in error, please notify the sender by
>> phone or email immediately and delete it!
>> ***********************************************
>> -----Original Message-----
>> From: Konstantin Boudnik [mailto:[EMAIL PROTECTED]]
>> Sent: August 4, 2009 1:02
>> To: [EMAIL PROTECTED]
>> Subject: Re: why did I achieve such poor performance of HDFS
>>
>> Hi Hao.
>>
>> Thanks for the observation. While I'll leave commenting on the particular
>> situation to someone who knows more about HDFS than me, I would like to
>> ask you a couple of questions:
>>      - do you have that particular test in a completely separable form? I.e.,
>> is it automated, and can it be reused easily by someone else?
>>      - could you share this test with the rest of the community, through a
>> JIRA or otherwise?
>>
>> Thanks,
>>      Konstantin (aka Cos)
>>
>> On 8/3/09 12:59 AM, Hao Gong wrote:
>>> Hi all,
>>>
>>> I have used HDFS as distributed storage system for experiment. But in my

With best regards,
Konstantin Boudnik (aka Cos)

        Yahoo! Grid Computing
        +1 (408) 349-4049

2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
Attention! Streams of consciousness are disallowed