HDFS dev mailing list - Block deletion after benchmarks


Re: Block deletion after benchmarks
Harsh J 2013-06-16, 14:28
Eitan,

I don't completely get your question. TestDFSIO is a test that
creates several files for testing the I/O and then deletes them at
the end of the test.
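
(For reference, the benchmark is usually driven roughly like this on a
Hadoop 1.x install - the jar name, file count and sizes below are only
illustrative:

  hadoop jar hadoop-test-1.0.4.jar TestDFSIO -write -nrFiles 10 -fileSize 128
  hadoop jar hadoop-test-1.0.4.jar TestDFSIO -read -nrFiles 10 -fileSize 128
  hadoop jar hadoop-test-1.0.4.jar TestDFSIO -clean

The -clean step, or the test's own end-of-run cleanup, is what issues
the file deletes; from that point on you are looking at HDFS's normal
block invalidation path rather than anything the benchmark does.)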

Block deletion in HDFS is an asynchronous process. File deletions are
instantaneous (a single transaction in the namespace), but the
deletions of the identified blocks are carried out progressively over
DN heartbeats and are throttled (to avoid a storm of deletes from
affecting DN memory usage). You can look at
dfs.namenode.invalidate.work.pct.per.iteration in
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
to make this go faster, but I am not sure I got your question right.
The test just uses the FS APIs; the FS simply has a different data
(not file) deletion behavior - the test isn't responsible for that.
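
If you do want the pending invalidations flushed sooner, a minimal
hdfs-site.xml override would look something like the below - assuming
a release that carries this property (the linked hdfs-default.xml
lists a default of 0.32f); the 0.75f value here is only an example:

  <property>
    <!-- Fraction used to size the per-heartbeat block invalidation work;
         a larger value lets DNs delete pending blocks sooner. -->
    <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
    <value>0.75f</value>
  </property>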

On Tue, Jun 11, 2013 at 4:03 AM, Eitan Rosenfeld <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> In my two-datanode cluster, I require that file operations on the
> underlying filesystem take place in the same order.  Essentially, I wish
> for blocks to be created, written, and/or deleted deterministically across
> datanodes.
>
> However, this is not the case towards the end of the TestDFSIO benchmark.
> Several blocks are deleted, but each datanode performs this deletion at a
> *different time* relative to the last few blocks being written.
>
> What component is initiating the block deletion at the end of the
> benchmark?
>
> (It seems to be the Replication Monitor, but I'm unclear on what causes the
> Replication Monitor to suddenly run and delete blocks at the end of the
> benchmark).  I am using Hadoop 1.0.4.
>
> Thank you,
> Eitan Rosenfeld

--
Harsh J