HDFS, mail # user - why did I achieve such poor performance of HDFS

Re: why did I achieve such poor performance of HDFS
Konstantin Boudnik 2009-08-04, 16:06
And here's some reading you can find useful:

On 8/4/09 8:52 AM, Konstantin Boudnik wrote:
> Hi Hao.
> One more question for you. I should have asked it in my first email, though.
> What is your network throughput for such massive reads WITHOUT HDFS in
> place? While I agree that ~14Kbps isn't much at all, I was wondering what
> the speed of 5000 simultaneous reads from a native file system over
> the same network would be.
> Could such a test be conducted in your setup?
> One more issue here is that in your first test the file is smaller
> than the default HDFS block size (64MB, I think), which is likely to create
> significant overhead and affect the performance.
> 1) To share your current test you can simply create a new JIRA under
> https://issues.apache.org/jira/browse/ under 'test', or simply send it to me as
> an attachment and I'll take care of the JIRA stuff. But I'd love to see the
> result of the other test I mentioned above, if possible.
> 2) DFSClient does provide an API for random reads from a file, and this API is
> thread safe. However, my uneducated guess would be that it is likely the
> responsibility of the client (you) to 'rebuild' the file from the randomly
> read blocks in the correct order. It is pretty much like any other filesystem out
> there: YOU have to know the sequence of the pieces of your file in order to
> reconstruct them from many concurrent reads.
> Hope it helps,
>    Konstantin
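
A minimal sketch of the "rebuild in the correct order" idea from point 2, offered as an illustration rather than as how DFSClient works internally. It uses the JDK's `FileChannel.read(ByteBuffer, long)` on a local file as a stand-in for a thread-safe positional read such as `FSDataInputStream.read(long position, byte[] buffer, int offset, int length)`. The class name `ParallelPread`, the method `readInParallel`, and the chunk size and thread count are hypothetical choices made for this example. Each worker writes its chunk at the same offset it read from, so the order is preserved by construction.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: local-file stand-in for concurrent positional reads.
final class ParallelPread {

    /** Reads the whole file with concurrent positional reads and returns the bytes in file order. */
    static byte[] readInParallel(Path file, int chunkSize, int threads) {
        if (chunkSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("chunkSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE - 8) {
                throw new IllegalArgumentException("file too large for a single array");
            }
            byte[] result = new byte[(int) size];
            List<Future<Object>> pending = new ArrayList<>();
            for (long pos = 0; pos < size; pos += chunkSize) {
                final long start = pos;
                final int len = (int) Math.min(chunkSize, size - start);
                pending.add(pool.submit(() -> {
                    // The destination slice starts at the same offset as the source chunk,
                    // so the caller never has to sort the pieces afterwards.
                    ByteBuffer dst = ByteBuffer.wrap(result, (int) start, len);
                    long at = start;
                    while (dst.hasRemaining()) {
                        int n = ch.read(dst, at); // positional read; does not move a shared cursor
                        if (n < 0) {
                            throw new IOException("unexpected end of file at offset " + at);
                        }
                        at += n;
                    }
                    return null;
                }));
            }
            for (Future<Object> f : pending) {
                f.get(); // wait for every chunk and surface any failure
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The key design choice is that the offset is the only bookkeeping the client needs: as long as each read knows its position, the completion order of the threads does not matter.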
> On 8/3/09 6:49 PM, Hao Gong wrote:
>> Hi Konstantin,
>>     Thank you for your response.
>>     1. Yes. It is automated and can be reused easily by anyone, I think,
>> because I didn't change the HDFS code or parameters except for
>> "hadoop.tmp.dir" and "fs.default.name".
>>     2. Yes. I can share our test with the community. How do I do that?
>>     By the way, I have a question about HDFS.
>>     1. Is the HDFS client single-threaded or multi-threaded when it transfers the
>> blocks of a file? For example, suppose file A is 256MB and is divided
>> into 4 blocks on 4 datanodes. When the client PUTs or GETs this file,
>> is the operation sequential (one block after another) or simultaneous (the
>> client GETs the 4 blocks from the 4 datanodes at the same time)?
>>     In the client source, I used "FSDataInputStream.read(long position, byte[]
>> buffer, int offset, int length)" to GET the file.
>>     Thanks very much.
>> Best regards,
>> Hao Gong
>> Huawei Technologies Co., Ltd
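
To make the 256MB example concrete, here is a small sketch of the arithmetic only. It assumes the 64MB block size mentioned elsewhere in this thread, and `BlockLayout` and `blockRanges` are hypothetical names, not part of Hadoop. It lists the (offset, length) pair of each block, which is the kind of information a client would need if it issued one positional read per block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pure arithmetic, no Hadoop dependency.
final class BlockLayout {

    /** Returns one {offset, length} pair per block for a file of the given size. */
    static List<long[]> blockRanges(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long off = 0; off < fileSize; off += blockSize) {
            // The last block may be shorter than blockSize.
            ranges.add(new long[] {off, Math.min(blockSize, fileSize - off)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (long[] r : blockRanges(256 * mb, 64 * mb)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
        }
    }
}
```

Under these assumptions a 256MB file yields four ranges, while a file smaller than one block yields a single short range.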
>> -----Original Message-----
>> From: Konstantin Boudnik [mailto:[EMAIL PROTECTED]]
>> Sent: August 4, 2009 1:02
>> To: [EMAIL PROTECTED]
>> Subject: Re: why did I achieve such poor performance of HDFS
>> Hi Hao.
>> Thanks for the observation. While I'll leave commenting on the
>> particular situation to someone who knows more about HDFS than I do, I would
>> like to ask you a couple of questions:
>>      - do you have that particular test in a completely separable form? I.e.
>> is it automated and can it be reused easily by someone else?
>>      - could you share this test with the rest of the community through a JIRA
>> or otherwise?
>> Thanks,
>>      Konstantin (aka Cos)
>> On 8/3/09 12:59 AM, Hao Gong wrote:
>>> Hi all,
>>> I have used HDFS as distributed storage system for experiment. But in my

With best regards,
Konstantin Boudnik (aka Cos)

        Yahoo! Grid Computing
        +1 (408) 349-4049

2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
Attention! Streams of consciousness are disallowed