MapReduce >> mail # user >> Using a hard drive instead of

Re: Using a hard drive instead of
This is why memory-mapped files were invented.
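A minimal sketch of the memory-mapped file idea mentioned above, using Python's `mmap` over a throwaway scratch file (purely illustrative, not NameNode code): the OS pages file contents in and out on demand, so the file supports RAM-like random access while still persisting to disk.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map (any seekable regular file works).
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        mm[0:5] = b"hello"   # random-access write, as if into a byte array
        mm.seek(0)
        data = mm.read(5)    # random-access read from the same mapping
        mm.flush()           # push dirty pages back to the file on disk

os.remove(path)
print(data)  # b'hello'
```

The point of the mapping is that frequently touched pages stay cached in RAM while the full data set can be far larger than memory, which is the trade-off being discussed in the thread.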

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
> On Oct 11, 2012, at 21:27, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> Harsh,
> I agree with you about many small files; I was giving this only by way
> of example. However, the hard drive I am talking about can be 1-2 TB in
> size, and that's pretty good; you can't easily get that much memory. In
> addition, it would be more resistant to power failures than RAM. And yes, it
> has the performance of RAM and can accommodate very many threads.
> Mark
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Hi Mark,
>> Note that the NameNode does random memory access to serve any
>> information or mutation request you send to it, and that there can be
>> many concurrent clients. So do you mean a 'very fast hard
>> drive' that's faster than RAM for random access itself? The
>> NameNode does persist its block information to disk for various
>> purposes, but making the NameNode use disk storage
>> completely (rather than having specific parts of it disk-cached) wouldn't
>> make much sense to me. Performance-wise, that would feel like trying
>> to communicate with a process that's swapping.
>> The "too many files" issue is often made to sound like a NameNode
>> issue, but in reality it isn't. HDFS lets you process lots of
>> files really fast, aside from helping store them for long periods; a
>> lot of tiny files only slows such operations down with the overhead
>> of opening and closing each file while reading them all.
>> With a single file or a few large files, all you do is block (data) reads
>> with very few NameNode round trips, which ends up much faster.
>> This is true of local filesystems as well, though few people think of
>> that.
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode. That is, I want the NameNode to store its block
>> > information on this hard drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files
>> > would not be a problem, and warm fail-over is automatic. What would
>> > I need to change in the NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>> --
>> Harsh J
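The per-file overhead Harsh describes can be sketched in plain Python against a local filesystem (not HDFS; all paths and counts here are made up for illustration): reading the same bytes as many tiny files costs one open/close round trip per file, while one large file costs a single open followed by block-sized reads.

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
payload = b"x" * 1024  # one kilobyte per tiny file

# Many tiny files: one open (analogous to a per-file NameNode lookup) per KB.
small_paths = []
for i in range(100):
    p = os.path.join(tmp, f"part-{i:05d}")
    with open(p, "wb") as f:
        f.write(payload)
    small_paths.append(p)

opens_small = 0
total_small = 0
for p in small_paths:
    with open(p, "rb") as f:
        opens_small += 1
        total_small += len(f.read())

# One large file holding the same data: a single open, then sequential reads.
big = os.path.join(tmp, "big")
with open(big, "wb") as f:
    for _ in range(100):
        f.write(payload)

opens_big = 0
total_big = 0
with open(big, "rb") as f:
    opens_big += 1
    while chunk := f.read(4096):  # block reads, no per-file metadata cost
        total_big += len(chunk)

print(opens_small, opens_big, total_small == total_big)  # 100 1 True
```

The same bytes are read in both cases; only the number of metadata operations differs, which is exactly the cost that dominates when HDFS stores millions of tiny files.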

Lance Norskog