MapReduce >> mail # user >> Re: Hadoop harddrive space usage


Re: Hadoop harddrive space usage
Perfect, thanks. It's what I was looking for.

I have a few nodes, all with 2TB drives, except one with 2x1TB. Which means
that in the end, for Hadoop, it's almost the same thing.

JM

2012/12/28, Robert Molina <[EMAIL PROTECTED]>:
> Hi Jean,
> Hadoop does not factor in the number of disks or directories, but rather mainly
> the allocated free space.  Hadoop will do its best to spread the data evenly
> across the nodes.  For instance, let's say you had 3 datanodes
> (replication factor 1), each with 10GB allocated, but one of the
> nodes splits its 10GB across two directories.  Now if we try to store a file
> that takes up 3 blocks, Hadoop will just place 1 block on each node.
>
> Hope that helps.
>
> Regards,
> Robert
>
> On Fri, Dec 28, 2012 at 9:12 AM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Quick question regarding hard drive space usage.
>>
>> Hadoop will distribute the data evenly on the cluster. So all the
>> nodes are going to receive almost the same quantity of data to store.
>>
>> Now, if on one node I have 2 directories configured, is Hadoop going
>> to assign twice the quantity to this node? Or is each directory going
>> to receive half the load?
>>
>> Thanks,
>>
>> JM
>>
>
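
To make Robert's example concrete, here is a rough sketch in Python. It is a toy model only, not HDFS's actual block placement policy: the DataNode class, the place_blocks function, the directory paths and the 64MB block size are all assumptions made up for illustration. It only shows the idea from the thread, that a node is judged by its total allocated space and not by how many directories that space is split across.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    """Toy datanode: a name plus free space (GB) per data directory."""
    name: str
    dir_free_gb: Dict[str, float]
    blocks: List[str] = field(default_factory=list)

    @property
    def free_gb(self) -> float:
        # What counts for the node is the sum over all its directories.
        return sum(self.dir_free_gb.values())

    def store(self, block_id: str, size_gb: float) -> str:
        # Assumption: inside a node, use the directory with the most free space.
        target = max(self.dir_free_gb, key=self.dir_free_gb.get)
        self.dir_free_gb[target] -= size_gb
        self.blocks.append(block_id)
        return target


def place_blocks(nodes: List[DataNode], block_ids: List[str],
                 size_gb: float) -> Dict[str, str]:
    """Place each block (replication factor 1) on the least-loaded node."""
    placement = {}
    for block_id in block_ids:
        # Fewest blocks first; ties go to the node with the most free space.
        node = min(nodes, key=lambda n: (len(n.blocks), -n.free_gb))
        node.store(block_id, size_gb)
        placement[block_id] = node.name
    return placement


# Three datanodes with 10GB each; node3 splits its 10GB over two directories.
nodes = [
    DataNode("node1", {"/data": 10.0}),
    DataNode("node2", {"/data": 10.0}),
    DataNode("node3", {"/data1": 5.0, "/data2": 5.0}),
]

# A file of 3 blocks, assuming 64MB blocks.
placement = place_blocks(nodes, ["blk_1", "blk_2", "blk_3"], size_gb=0.064)
for block_id, node_name in placement.items():
    print(block_id, "->", node_name)
```

In this model node3 ends up with one block, same as the other two nodes, even though it has two directories. The split only decides which directory inside node3 holds the block.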