Hadoop >> mail # user >> Question about disk space allocation in hadoop


Re: Question about disk space allocation in hadoop
Hi Chris,

Thanks a lot for sharing your knowledge. I'll investigate further and
try these options on my cluster; hopefully one of them turns out to be
a good solution :)

Best Regards,
Carp

2010/6/30 Chris Smith <csmithx+[EMAIL PROTECTED]>:
> Some thoughts on how to restrict the temporary data, but I have only
> tried (a) in anger:
>
> a)    Partition your disks into HDFS and intermediate temp partitions
> of the relevant size.  This gives a fixed separation but is
> difficult/impossible to modify on a busy cluster especially as there
> may be no way of unloading/recovering the data stored in HDFS if you
> make a mistake resizing partitions;
>
> b)      Implement disk quotas and set relevant hard and soft limits on
> the relevant root directories for intermediate space. This gives you
> the flexibility to change the limits when required but as the limits
> are per user/group some thought may be required as to which user/group
> the limits apply to. There may also be a performance impact?
>
> You could combine this with setting “dfs.datanode.du.reserved” value
> in $HADOOP_HOME/conf/hdfs-site.xml for limiting HDFS disk usage.
>
> c)      Implement intermediate data space as a loopback file, see:
> http://wiki.cita.utoronto.ca/mediawiki/index.php/Fake_Fast_Local_Disk
> This example implements a temporary loopback filesystem on an iSCSI
> mounted Lustre filesystem but the principles are the same. There are
> some performance benchmarks linked to in section 3. The intermediate
> temp data space is limited by the size of the loopback file created.
>
> Chris
>
> -----Original Message-----
> From: Yu Li [mailto:[EMAIL PROTECTED]]
> Sent: 30 June 2010 04:11
> To: [EMAIL PROTECTED]
> Subject: Re: Question about disk space allocation in hadoop
>
> Hi all,
>
> Does anybody have experience with this? Any comments/suggestions would
> be highly appreciated. Thanks.
>
> Best Regards,
> Carp
>
> 2010/6/29 Yu Li <[EMAIL PROTECTED]>:
>> Hi all,
>>
>> As we all know, a machine in a Hadoop cluster may be both a datanode
>> and a tasktracker, so one machine may store both MR job intermediate
>> data and HDFS data. My question is: if we have more than one disk per
>> node, say 4 disks, and would like both job intermediate data and HDFS
>> data to be stored on all disks to reduce the I/O time on each single
>> disk, can we draw a line between the space used by the local FS and
>> by HDFS? For example, can we restrict the intermediate temp data to
>> no more than 25% of the space on each disk?
>> Thanks in advance.
>>
>> Best Regards,
>> Carp
>>
>
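
To make the "dfs.datanode.du.reserved" suggestion in the thread concrete, here is a minimal Python sketch that works out a value for the 25% example from the original question and prints the matching hdfs-site.xml property. It assumes the setting is given in bytes and applies per volume; the helper names and the "/" mount point are hypothetical, picked only for illustration. Note that this only keeps HDFS out of the reserved space. It does not by itself stop intermediate data from growing beyond 25%.

```python
import shutil


def reserved_bytes(total_bytes: int, non_dfs_fraction: float) -> int:
    """Bytes to keep free for non-HDFS use on a volume of the given size."""
    if not 0.0 <= non_dfs_fraction <= 1.0:
        raise ValueError("non_dfs_fraction must be between 0 and 1")
    return int(total_bytes * non_dfs_fraction)


def du_reserved_property(value_bytes: int) -> str:
    """Render the hdfs-site.xml property block for the reserved value."""
    return (
        "<property>\n"
        "  <name>dfs.datanode.du.reserved</name>\n"
        f"  <value>{value_bytes}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Hypothetical mount point; replace with one of the datanode's data disks.
    total = shutil.disk_usage("/").total
    print(du_reserved_property(reserved_bytes(total, 0.25)))
```

Since one value covers every volume on the datanode, disks of different sizes would need a compromise figure, for example one computed from the smallest disk.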