The HDFS client that opens a file for writing is granted a lease for the file; no other client can write to the file. The writing client periodically renews the lease by sending a heartbeat to the NameNode. When the file is closed, the lease is revoked. The lease duration is bound by a soft limit and a hard limit. Until the soft limit expires, the writer is certain of exclusive access to the file. If the soft limit expires and the client fails to close the file or renew the lease, another client can preempt the lease. If the hard limit (one hour) expires and the client has still failed to renew the lease, HDFS assumes that the client has quit, automatically closes the file on behalf of the writer, and recovers the lease. The writer's lease does not prevent other clients from reading the file; a file may have many concurrent readers.
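The timing rules in the paragraph above can be sketched as a small state model. This is only an illustration, not Hadoop code: the class `LeaseSketch`, its methods, and the client names are hypothetical, and the one-minute soft limit is an assumption (the paragraph only gives the one-hour hard limit).

```java
import java.util.concurrent.TimeUnit;

/** Toy model of the lease timing rules described above; not part of Hadoop. */
final class LeaseSketch {
    // Assumption: one-minute soft limit (not stated in the paragraph above).
    static final long SOFT_LIMIT_MS = TimeUnit.MINUTES.toMillis(1);
    // From the paragraph above: one-hour hard limit.
    static final long HARD_LIMIT_MS = TimeUnit.HOURS.toMillis(1);

    enum State { EXCLUSIVE, PREEMPTIBLE, EXPIRED }

    private final String holder;   // client that opened the file for writing
    private long lastRenewalMs;    // time of the last heartbeat

    LeaseSketch(String holder, long nowMs) {
        this.holder = holder;
        this.lastRenewalMs = nowMs;
    }

    /** A heartbeat from the writer restarts both timers. */
    void renew(long nowMs) {
        lastRenewalMs = nowMs;
    }

    /** Classifies the lease by how long the writer has been silent. */
    State stateAt(long nowMs) {
        long idleMs = nowMs - lastRenewalMs;
        if (idleMs < SOFT_LIMIT_MS) {
            return State.EXCLUSIVE;    // writer is certain of exclusive access
        }
        if (idleMs < HARD_LIMIT_MS) {
            return State.PREEMPTIBLE;  // another client may preempt the lease
        }
        return State.EXPIRED;          // file is closed on the writer's behalf
    }

    /** Another client may take over only once the soft limit has passed. */
    boolean canBePreemptedBy(String client, long nowMs) {
        return !holder.equals(client) && stateAt(nowMs) != State.EXCLUSIVE;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch("writer-1", 0L);
        System.out.println(lease.stateAt(TimeUnit.SECONDS.toMillis(30))); // EXCLUSIVE
        System.out.println(lease.stateAt(TimeUnit.MINUTES.toMillis(5)));  // PREEMPTIBLE
        System.out.println(lease.stateAt(TimeUnit.HOURS.toMillis(2)));    // EXPIRED
    }
}
```

On a real cluster, a second client does not have to wait for the hard limit: `DistributedFileSystem.recoverLease(Path)` asks the NameNode to start lease recovery on a file whose writer has gone away.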
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: File size 0 bytes while open for write
Date: Thu, 12 Dec 2013 20:22:04 -0500
I am writing data to a file from a Java thread (fsDataOutputStream = fs.append(pt);).
While the thread is writing, the file shows a size of 0 bytes, even though it actually has content. I guess the reason is that it is still open.
But the question is: what if the thread gets killed without closing the file? What should be done in that case? The file will keep showing 'open for write, size 0'.
hadoop fs -ls /test/
-rw-r--r-- 3 storm supergroup 0 2013-12-12 16:44 /test/SinkToHDFS-ip-.us-west-2.compute.internal-6703-22-20131212-0.snappy
hadoop fs -cat /test/SinkToHDFS-i.us-west-2.compute.internal-6703-22-20131212-0.snappy | wc -l
243
hdfs fsck /test/ -openforwrite
Connecting to namenode via http://i.us-west-2.compute.internal:50070
FSCK started by xiao (auth:SIMPLE) from for path /test/ at Thu Dec 12 16:52:01 PST 2013
/test/SinkToHDFS-ip.us-west-2.compute.internal-6703-22-20131212-0.snappy 0 bytes, 1 block(s), OPENFORWRITE:
Status: HEALTHY
 Total size: 0 B
 Total dirs: 1
 Total files: 1
 Total blocks (validated): 1 (avg. block size 0 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 3
 Number of racks: 1
FSCK ended at Thu Dec 12 16:52:01 PST 2013 in 1 milliseconds