Re: Using a hard drive instead of
Harsh,

I agree with you about many small files; I gave that only by way of
example. However, the hard drive I am talking about can be 1-2 TB in
size, which is far more than you can easily get as memory. In addition,
it would be more resistant to power failures than RAM. And yes, it has
the performance of RAM and can serve very many concurrent threads.

Mark

On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi Mark,
>
> Note that the NameNode does random memory access to serve any
> information or mutation request you send to it, and that there can be
> a large number of concurrent clients. So do you mean a 'very fast hard
> drive' that's faster than RAM itself for random access? The
> NameNode does persist its block information onto disk for various
> purposes, but making the NameNode use disk storage completely (rather
> than disk-caching specific parts of it) wouldn't make much sense to
> me. Performance-wise, that'd feel like trying to communicate with a
> process that's swapping.
>
> The too-many-files issue is inflated to sound like a NameNode issue,
> but in reality it isn't. HDFS lets you process lots of files really
> fast, aside from helping store them for long periods, and a lot of
> tiny files only slows such operations down with the overhead of
> opening and closing each file in order to read them all.
> With a single or a few large files, all you do is block (data) reads
> and very few NameNode communications, so you end up going much faster.
> This is the same for local filesystems as well, but not many think of
> that.
>
> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Imagine I have a very fast hard drive that I want to use for the
> > NameNode. That is, I want the NameNode to store its blocks information
> > on this hard drive instead of in memory.
> >
> > Why would I do it? Scalability (no federation needed), many files are
> > not a problem, and warm fail-over is automatic. What would I need to
> > change in the NameNode to tell it to use the hard drive?
> >
> > Thank you,
> > Mark
>
>
>
> --
> Harsh J
>