HDFS user mailing list - HDFS without Hadoop: Why?


Re: HDFS without Hadoop: Why?
Bharath Mundlapudi 2011-02-05, 06:52
Note that the Namenode keeps other data structures in memory as well, such as the
BlocksMap and the directory tree, so counting only the bytes per file and per block
is not sufficient. It is true, though, that the file and block structures occupy
most of the memory; I would say these two are at the top of the list of high-memory
objects. Konstantin's paper will probably give you a more holistic picture.

Also, quite a few memory optimizations have gone into the Namenode recently. With those
optimizations you can expect more than 60 million files (with one block each)
on a machine with 32 GB of RAM. I am being conservative here, and you can scale your
own estimates from these numbers. The assumption is that the Namenode is running on a
64-bit JVM.
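
As a rough illustration of that figure, here is a minimal back-of-envelope sketch assuming the ~160 bytes per file and ~150 bytes per block that Dhruba quotes later in this thread; the class name and constants below are illustrative only, not actual HDFS internals, and real heap usage also depends on path lengths, the directory tree, the BlocksMap, and the JVM itself:

// Rough Namenode heap estimate from file and block counts.
// The per-object figures are the approximate numbers quoted in this
// thread, not exact HDFS accounting.
public class NamenodeHeapEstimate {
    static final long BYTES_PER_FILE = 160L;   // approx. bytes per file object
    static final long BYTES_PER_BLOCK = 150L;  // approx. bytes per block object

    static long estimateHeapBytes(long files, long blocksPerFile) {
        return files * (BYTES_PER_FILE + blocksPerFile * BYTES_PER_BLOCK);
    }

    public static void main(String[] args) {
        long files = 60000000L;  // 60 million files, one block each
        double gb = estimateHeapBytes(files, 1) / (1024.0 * 1024 * 1024);
        System.out.printf("~%.1f GB of heap for %d files%n", gb, files);
        // Prints roughly 17 GB, which leaves headroom on a 32 GB machine
        // for the directory tree, BlocksMap overhead, and the rest of the JVM.
    }
}

For comparison, the same arithmetic gives roughly 43 GB for the 100-million-file / 200-million-block case quoted below, so the 60 GB recommendation in the Yahoo post leaves room for the other in-memory structures mentioned above.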

-Bharath

From: Stuart Smith <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc:
Sent: Wednesday, February 2, 2011 7:32 PM
Subject: Re: HDFS without Hadoop: Why?

> Stuart - if Dhruba is giving hdfs file and block sizes used by the
namenode, you really cannot get a more authoritative number elsewhere :)

Yes - very true! :)

I spaced out on the name there ... ;)

One more thing - I believe that if you're storing a lot of your smaller files in HBase, you'll end up with far fewer files on HDFS, since several of your smaller files will end up in one HFile.

I'm storing 5-7 million files, with at least 70-80% ending up in HBase. I only have 16 GB of RAM on my namenode, and it's very far from overloading the memory. Off the top of my head, I think it's well under 8 GB of RAM used...
Take care,
  -stu
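
To make the HFile point concrete, here is a minimal sketch of the pattern described above: storing a small file's bytes as a cell value in an HBase table, so it ends up packed into shared HFiles instead of being its own HDFS file. The table name, column family, and row key are made up for the example, and the calls are from the HBase client API of that era:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SmallFileIntoHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical "files" table with a single column family "f".
        HTable table = new HTable(conf, "files");

        byte[] content = Bytes.toBytes("...contents of a small file...");
        Put put = new Put(Bytes.toBytes("docs/readme.txt")); // row key = logical path
        put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), content);
        table.put(put);
        table.close();
        // Many such rows flush and compact into a handful of large HFiles,
        // so the namenode tracks a few big files instead of millions of small ones.
    }
}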

--- On Wed, 2/2/11, Gaurav Sharma <[EMAIL PROTECTED]> wrote:
>From: Gaurav Sharma <[EMAIL PROTECTED]>
>Subject: Re: HDFS without Hadoop: Why?
>To: [EMAIL PROTECTED]
>Date: Wednesday, February 2, 2011, 9:31 PM
>
>
>Stuart - if Dhruba is giving hdfs file and block sizes used by the namenode, you really cannot get a more authoritative number elsewhere :) I would do the back-of-envelope with ~160 bytes/file and ~150 bytes/block.
>
>
>On Wed, Feb 2, 2011 at 9:08 PM, Stuart Smith <[EMAIL PROTECTED]> wrote:
>
>>
>>This is the best coverage I've seen from a source that would know:
>>
>>http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
>>
>>One relevant quote:
>>
>>To store 100 million files (referencing 200 million blocks), a name-node should have at least 60 GB of RAM.
>>
>>But, honestly, if you're just building out your cluster, you'll probably run into a lot of other limits first: hard drive space, regionserver memory, the infamous ulimit/xciever :), etc...
>>
>>Take care,
>>  -stu
>>
>>--- On Wed, 2/2/11, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>>
>>>From: Dhruba Borthakur <[EMAIL PROTECTED]>
>>>
>>>Subject: Re: HDFS without Hadoop: Why?
>>>
>>>To: [EMAIL PROTECTED]
>>>Date: Wednesday, February 2, 2011, 9:00 PM
>>>
>>>
>>>
>>>The Namenode uses around 160 bytes/file and 150 bytes/block in HDFS. This is a very rough calculation.
>>>
>>>
>>>dhruba
>>>
>>>
>>>On Wed, Feb 2, 2011 at 5:11 PM, Dhodapkar, Chinmay <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>>
>>>>
>>>>
>>>>What you describe is pretty much my use case as well. Since I don't know how big the number of files could get, I am trying to figure out if there is a theoretical design limitation in HDFS.
>>>>
>>>>From what I have read, the namenode stores the metadata for all files in RAM. Assuming (in my case) that every file is smaller than the configured block size, there should be a very rough formula for calculating the maximum number of files HDFS can serve based on the RAM configured on the namenode?
>>>>
>>>>Can any of the implementers comment on this? Am I even thinking on the right track…?
>>>>
>>>>Thanks Ian for the haystack link - very informative indeed.
>>>>
>>>>-Chinmay
>>>>
>>>>
>>>>
>>>>From: Stuart Smith [mailto:[EMAIL PROTECTED]]
>>>>
>>>>Sent: Wednesday, February 02, 2011 4:41 PM
>>>>
>>>>To: [EMAIL PROTECTED]