MapReduce, mail # user - Re: Using a hard drive instead of
Ravi Prakash 2012-10-13, 03:46
Mark Kerzner 2012-10-14, 04:07
Mark Kerzner 2012-10-12, 03:59
Harsh J 2012-10-12, 04:16
Mark Kerzner 2012-10-12, 04:27
Gaurav Sharma 2012-10-12, 04:34
Re: Using a hard drive instead of
Lance Norskog 2012-10-12, 05:01
This is why memory-mapped files were invented.
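To make the remark concrete, here is a minimal Python sketch (not from the thread, and the file name is made up) of what a memory-mapped read looks like: the file's pages are mapped into the process's address space, so reads go through the page cache rather than explicit read() calls.

```python
import mmap
import os
import tempfile

# Create a small file to map (path and contents are illustrative only).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"hello mmap")

with open(path, "rb") as f:
    # Map the whole file read-only into memory.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    data = mm[:5]  # slicing reads through the mapping, no read() syscall
    mm.close()

print(data)  # b'hello'
```

The appeal for a NameNode-like workload would be that hot pages stay RAM-resident while cold ones live on disk, which is exactly the middle ground between "all in memory" and "all on disk" being debated here.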

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
<[EMAIL PROTECTED]> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
>
>
> On Oct 11, 2012, at 21:27, Mark Kerzner <[EMAIL PROTECTED]> wrote:
>
> Harsh,
>
> I agree with you about many small files, and I was giving this only by way
> of example. However, the hard drive I am talking about can be 1-2 TB in
> size, and that's pretty good; you can't easily get that much memory. In
> addition, it would be more resistant to power failures than RAM. And yes, it
> has the performance of RAM and can accommodate very many threads.
>
> Mark
>
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hi Mark,
>>
>> Note that the NameNode does random memory access to serve back any
>> information or mutation request you send to it, and that there can be
>> many concurrent clients. So do you mean a 'very fast hard
>> drive' that's faster than RAM for random access itself? The
>> NameNode does persist its block information onto disk for various
>> purposes, but to actually make the NameNode use disk storage
>> completely (rather than keeping specific parts of it disk-cached) wouldn't
>> make much sense to me. Performance-wise, that would feel like trying
>> to communicate with a process that's swapping.
>>
>> The too-many-files issue is inflated to sound like a NameNode
>> issue, but in reality it isn't. HDFS lets you process lots of
>> files really fast, aside from helping store them for long periods; a
>> lot of tiny files only slows such operations down with the overhead
>> of opening and closing each file while reading them all.
>> With a single large file, or a few of them, all you do is block (data)
>> reads and very few NameNode communications - ending up much faster.
>> The same holds for local filesystems as well, but not many think of
>> that.
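Harsh's point about per-file overhead can be illustrated with rough arithmetic. The numbers below are assumptions chosen for illustration (1 MB tiny files, the common 128 MB HDFS block size, one metadata round-trip per file open), not measurements from the thread:

```python
# Compare NameNode metadata round-trips needed to read 10 GB of data
# stored as many tiny files versus as one large file.
BLOCK_MB = 128            # assumed HDFS block size
total_mb = 10 * 1024      # 10 GB of data

tiny_files = total_mb     # 10,240 files of 1 MB each
# Each file needs at least one open/metadata round-trip to the NameNode.
tiny_rpcs = tiny_files

large_blocks = total_mb // BLOCK_MB  # 80 blocks in one file
# Block locations for one file can be fetched together; assume one RPC.
large_rpcs = 1

print(tiny_rpcs, large_rpcs)  # 10240 1
```

Four orders of magnitude fewer metadata operations for the same bytes read, which is why the overhead lands on the readers doing open/close work, not on the NameNode's storage format.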
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode.
>> > That is, I want the NameNode to store its blocks information on this
>> > hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are
>> > not a
>> > problem, and warm fail-over is automatic. What would I need to change in
>> > the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>

--
Lance Norskog
[EMAIL PROTECTED]
Colin Patrick McCabe 2012-10-17, 22:37
Michael Segel 2012-10-17, 23:27
Mark Kerzner 2012-10-17, 22:44
Colin McCabe 2012-10-17, 23:21