HDFS, mail # dev - Re: Datanodes optimizations


Re: Datanodes optimizations
Hairong Kuang 2012-03-05, 23:40
+ Peter and hdfs-dev

Awesome, Siying! I am so impressed by the amount of work that you've done to improve HDFS I/O.

Could you please revert r23326, r23292, and r23290? I do not think they will be useful for the warehouse use case.

Thanks!
Hairong

From: Siying Dong <[EMAIL PROTECTED]>
Date: Mon, 5 Mar 2012 14:48:23 -0800
To: Dmytro Molkov <[EMAIL PROTECTED]>, Hairong Kuang <[EMAIL PROTECTED]>, Guoqiang Jerry Chen <[EMAIL PROTECTED]>
Cc: Zheng Shao <[EMAIL PROTECTED]>
Subject: RE: Datanodes optimizations

I cut svn+ssh://tubbs/svnhive/hadoop/branches/hive99-r19817-11012011-03022012, ported some changes to it, and deployed it (without the changes I ported today) to the test cluster last Friday. Here is the list of patches I ported:
------------------------------------------------------------------------
r23326 | sdong | 2012-03-05 14:38:48 -0800 (Mon, 05 Mar 2012) | 20 lines
HDFS: Datanode adds received length for blocks being received

Summary:
To make the visible length the real, stable length for clients to use,
datanodes should use the received length when doing recovery.
Differential Revision: https://phabricator.fb.com/D417973

------------------------------------------------------------------------
r23325 | sdong | 2012-03-05 14:36:19 -0800 (Mon, 05 Mar 2012) | 24 lines
HDFS Datanode: replace FSDataset.setVisibleLength() with activeFile.setVisibleLength() when doing reads, to reduce acquisitions of FSDataset.lock

Summary:
setVisibleLength() in the write pipeline may not need to grab locks. Instead,
it can just hold the ActiveFile. To prevent the recovery thread from modifying
the object, the recovery thread code is changed to make a copy of it instead
of using the old one.

Differential Revision: https://phabricator.fb.com/D404130
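The locking change above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the patch itself: the `ActiveFile` class shown here, its `snapshotForRecovery()` helper, and the `activeFiles` map are hypothetical stand-ins for the real FSDataset code. The writer updates the visible length while holding only the per-file object's monitor, and the recovery thread works on a private copy so it never mutates the object the write pipeline is still using.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class VisibleLengthDemo {
    public static void main(String[] args) {
        // Hypothetical registry of blocks being written, keyed by block ID.
        Map<Long, ActiveFile> activeFiles = new ConcurrentHashMap<>();
        ActiveFile af = new ActiveFile(0);
        activeFiles.put(42L, af);

        // Write pipeline: update the visible length through the per-file object only.
        activeFiles.get(42L).setVisibleLength(4096);

        // Recovery thread: modify a private copy, never the live object.
        ActiveFile copy = af.snapshotForRecovery();
        copy.setVisibleLength(1024);

        System.out.println(af.getVisibleLength());   // 4096
        System.out.println(copy.getVisibleLength()); // 1024
    }
}

/** Hypothetical per-block state; not the actual FSDataset ActiveFile class. */
final class ActiveFile {
    private long visibleLength;

    ActiveFile(long visibleLength) {
        this.visibleLength = visibleLength;
    }

    // Holds only this object's monitor; no dataset-wide lock is taken.
    synchronized void setVisibleLength(long len) {
        visibleLength = len;
    }

    synchronized long getVisibleLength() {
        return visibleLength;
    }

    // Returns an independent copy for the recovery thread to work on.
    synchronized ActiveFile snapshotForRecovery() {
        return new ActiveFile(visibleLength);
    }
}
```

The copy is what keeps the two threads apart: recovery can truncate or adjust its own view without racing the pipeline's updates.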

------------------------------------------------------------------------
r23296 | sdong | 2012-03-02 18:18:26 -0800 (Fri, 02 Mar 2012) | 24 lines

HDFS Datanode: ignore the error when a block to delete is already gone from memory

Summary:
There is a possibility that, after a block report, the namenodes send a
datanode a duplicate request to invalidate a block that has just been sent by
the previous datanode. In that case, we should just ignore the exception and
not do a checkdir afterwards.

ant TestDatanodeRestart
ant TestSimulatedDataset
Differential Revision: https://phabricator.fb.com/D419779
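A minimal sketch of the "ignore, but don't checkdir" rule, assuming a simplified in-memory block map. `BlockStore`, its `invalidate()` method, and the simulated "deletable" flag are hypothetical and are not the real FSDataset API. Blocks that are already gone are skipped silently; only a genuine deletion failure tells the caller to follow up with a disk check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvalidateDemo {
    public static void main(String[] args) {
        BlockStore store = new BlockStore();
        store.add(1L, true);  // deletion will succeed
        store.add(2L, false); // deletion will fail (simulated disk problem)
        // Block 3 is not in memory: a duplicate invalidation request.
        System.out.println(store.invalidate(new long[] {1L, 3L})); // false
        System.out.println(store.invalidate(new long[] {2L}));     // true
    }
}

/** Hypothetical in-memory block map; not the real FSDataset API. */
class BlockStore {
    // Block ID -> whether deleting its file will succeed (simulated).
    private final Map<Long, Boolean> volumeMap = new HashMap<>();
    private final List<Long> ignored = new ArrayList<>();

    void add(long blockId, boolean deletable) {
        volumeMap.put(blockId, deletable);
    }

    /**
     * Deletes the given blocks. Returns true only if a real deletion failure
     * occurred, meaning the caller should run a disk check afterwards.
     */
    boolean invalidate(long[] blockIds) {
        boolean needDiskCheck = false;
        for (long id : blockIds) {
            Boolean deletable = volumeMap.get(id);
            if (deletable == null) {
                // Already removed (e.g. a duplicate request): not an error.
                ignored.add(id);
                continue;
            }
            if (deletable) {
                volumeMap.remove(id);
            } else {
                needDiskCheck = true;
            }
        }
        return needDiskCheck;
    }

    List<Long> ignoredBlocks() {
        return ignored;
    }
}
```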

------------------------------------------------------------------------
r23295 | sdong | 2012-03-02 18:15:39 -0800 (Fri, 02 Mar 2012) | 34 lines
HDFS: Datanode starts to wait for ack of a packet as soon as it is sent.

Summary:
Currently, HDFS datanodes do this sequence:
1. receive a packet
2. forward the packet
3. write the packet to file
4. start to wait for ack of the packet with a timeout
The timeout for step 4 is only slightly shorter than the client's timeout for
the ack.
If, for DN1, step 3 takes 3+ seconds and there is a real connectivity or load
issue with the next datanode, DN2 (which DN1 forwards the packet to), the
client will time out first and decide that DN1 is bad, leaving the really bad
DN2 in the pipeline. That causes more permanent errors than necessary.

This patch fixes the issue by making the datanode start waiting for the ack as
soon as the packet is forwarded.

Differential Revision: https://phabricator.fb.com/D406408
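The timing argument can be checked with a small calculation. The numbers are hypothetical: a client ack timeout of 60 s measured from when the packet is sent, a datanode timeout 2 s shorter, and forwarding and network time treated as zero. With the old ordering, DN1's clock starts only after its local write, so it gives up on DN2 at `writeMs + (clientTimeoutMs - marginMs)`, which is after the client's deadline whenever `writeMs > marginMs`. Starting the wait at forwarding time removes `writeMs` from that sum.

```java
class AckTimingDemo {
    /** Time (ms after the packet is sent) at which DN1 gives up on DN2. */
    static long dn1DetectsAt(long writeMs, long clientTimeoutMs, long marginMs,
                             boolean waitFromForward) {
        long dnTimeoutMs = clientTimeoutMs - marginMs;
        // Old behavior: DN1's timer starts after the local disk write finishes.
        // New behavior: the timer starts as soon as the packet is forwarded.
        return waitFromForward ? dnTimeoutMs : writeMs + dnTimeoutMs;
    }

    /** True if DN1 notices the downstream failure before the client times out. */
    static boolean dn1DetectsFirst(long writeMs, long clientTimeoutMs, long marginMs,
                                   boolean waitFromForward) {
        return dn1DetectsAt(writeMs, clientTimeoutMs, marginMs, waitFromForward)
            < clientTimeoutMs;
    }

    public static void main(String[] args) {
        long clientTimeout = 60_000, margin = 2_000, slowWrite = 3_000;
        // Old ordering: 3000 + 58000 = 61000 ms, after the client's 60000 ms deadline.
        System.out.println(dn1DetectsFirst(slowWrite, clientTimeout, margin, false)); // false
        // New ordering: 58000 ms, before the client's deadline.
        System.out.println(dn1DetectsFirst(slowWrite, clientTimeout, margin, true));  // true
    }
}
```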

------------------------------------------------------------------------
r23294 | sdong | 2012-03-02 18:14:36 -0800 (Fri, 02 Mar 2012) | 25 lines

HDFS: eliminate locking in VolumeSet

Summary:
This patch tries to eliminate requiring the global lock to access the volume
map and volumes.
It is a risky change, so I am making it first to see whether people are
comfortable with it before moving forward.

Differential Revision: https://phabricator.fb.com/D398241
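The summary does not say how the lock is eliminated. One common way to make a read-mostly collection such as a volume list safe without a global lock is copy-on-write: readers take an immutable snapshot through an atomic reference, and the rare writer (adding a volume, or dropping a failed one) publishes a modified copy with compare-and-set. The sketch below uses that technique; `VolumeSet` and its methods are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class VolumeSetDemo {
    public static void main(String[] args) {
        VolumeSet vs = new VolumeSet();
        vs.addVolume("/data1");
        vs.addVolume("/data2");
        System.out.println(vs.pickVolume()); // /data1
        System.out.println(vs.pickVolume()); // /data2
        vs.removeVolume("/data1");           // e.g. a failed disk
        System.out.println(vs.snapshot());   // [/data2]
    }
}

/** Hypothetical copy-on-write volume set: readers never block. */
class VolumeSet {
    private final AtomicReference<List<String>> volumes =
        new AtomicReference<>(Collections.<String>emptyList());
    private final AtomicInteger nextIndex = new AtomicInteger();

    /** Readers get an immutable snapshot; no lock is taken. */
    List<String> snapshot() {
        return volumes.get();
    }

    void addVolume(String dir) {
        update(dir, true);
    }

    void removeVolume(String dir) {
        update(dir, false);
    }

    /** Writers copy the current list, modify the copy, and publish it with CAS. */
    private void update(String dir, boolean add) {
        while (true) {
            List<String> current = volumes.get();
            List<String> next = new ArrayList<>(current);
            if (add) {
                next.add(dir);
            } else {
                next.remove(dir);
            }
            if (volumes.compareAndSet(current, Collections.unmodifiableList(next))) {
                return;
            }
            // Another writer won the race; retry against the newer list.
        }
    }

    /** Round-robin choice of a volume for a new block, using a single snapshot. */
    String pickVolume() {
        List<String> snap = volumes.get();
        if (snap.isEmpty()) {
            throw new IllegalStateException("no volumes available");
        }
        return snap.get(Math.floorMod(nextIndex.getAndIncrement(), snap.size()));
    }
}
```

The trade-off is that every membership change copies the list, which is cheap here because volumes change rarely compared with how often they are read.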

------------------------------------------------------------------------
r23293 | sdong | 2012-03-02 17:56:34 -0800 (Fri, 02 Mar 2012) | 21 lines
HDFS Datanode
Summary:
This removes a lock acquisition from the read pipeline.
Note that this function call was only added recently and is not yet in
production.

Differential Revision: https://phabricator.fb.com/D406447
------------------------------------------------------------------------
r23292 | sdong | 2012-03-02 17:53:03 -0800 (Fri, 02 Mar 2012) | 27 lines
HDFS: Client to update available() for files under construction: part 1 - Data
Protocol Change

Summary:
This is the first part of the feature that lets clients update available()
for files under construction: the data protocol changes. The changes are:
(1) Send the length of the block at the end of the block, using a negative
sign to show whether or not the block has been completed.
(2) Allow clients to ask for 0 bytes. If 0 bytes are requested and no bytes
are available, send -2 or -3.
(3) Datanodes call setVisible() when receiving acks, instead of after
receiving the packet.

Differential Revision: https://phabricator.fb.com/D386945
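Item (1) can be illustrated with a signed trailing length field. The summary says only that the sign shows whether the block is complete; this sketch assumes a negative value means the block is still under construction, and `BlockLengthField` with its methods is hypothetical. With plain negation, an empty under-construction block cannot be told apart from a complete empty one, so a real encoding would need to handle that case. The -2/-3 replies from item (2) are not modeled, because their meanings are not given here.

```java
class BlockLengthFieldDemo {
    public static void main(String[] args) {
        long partial = BlockLengthField.encode(65536, false);
        System.out.println(partial);                                  // -65536
        System.out.println(BlockLengthField.decodeLength(partial));   // 65536
        System.out.println(BlockLengthField.decodeComplete(partial)); // false

        long done = BlockLengthField.encode(1024, true);
        System.out.println(BlockLengthField.decodeComplete(done));    // true
    }
}

/** Hypothetical helper for the trailing block-length field in item (1). */
final class BlockLengthField {
    private BlockLengthField() {}

    // Assumption: non-negative = block complete, negative = under construction.
    static long encode(long length, boolean complete) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        return complete ? length : -length;
    }

    static long decodeLength(long field) {
        return Math.abs(field);
    }

    // Caveat: a field of 0 always decodes as complete, so an empty
    // under-construction block is indistinguishable with plain negation.
    static boolean decodeComplete(long field) {
        return field >= 0;
    }
}
```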

------------------------------------------------------------------------
r23290 | sdong | 2012-03-02 17:43:48 -0800 (Fri, 02 Mar 2012) | 22 lines
HDFS BlockReceiver to update visibility of a block when the ack is received
instead of when the packet is received

Summary:
This change could make the visibility variable more reliable for clients
to read. It will also help performance, because it moves a global lock out of
the critical data thread.

Differential Revision: https://phabricator.fb.com/D389226
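A sketch of ack-driven visibility, under assumptions: each packet has a sequence number and the block offset reached after it, the data thread records those offsets as it writes, and the visible length advances only when the downstream ack for that sequence number arrives. `AckVisibility`, `onPacketWritten()` and `onAck()` are hypothetical names, and the sketch assumes exactly one data thread and one ack thread. The data thread never touches the visible length, which is the property the summary credits with moving a global lock off the data path.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

class AckVisibilityDemo {
    public static void main(String[] args) {
        AckVisibility v = new AckVisibility();
        v.onPacketWritten(0, 512);
        v.onPacketWritten(1, 1024);
        System.out.println(v.getVisibleLength()); // 0: nothing acked yet
        v.onAck(0);
        System.out.println(v.getVisibleLength()); // 512
        v.onAck(1);
        System.out.println(v.getVisibleLength()); // 1024
    }
}

/** Hypothetical bookkeeping; assumes one data thread and one ack thread. */
class AckVisibility {
    /** A packet that has been written locally but not yet acknowledged. */
    private static final class Pending {
        final long seqno;
        final long endOffset;

        Pending(long seqno, long endOffset) {
            this.seqno = seqno;
            this.endOffset = endOffset;
        }
    }

    private final ConcurrentLinkedQueue<Pending> unacked = new ConcurrentLinkedQueue<>();
    private volatile long visibleLength = 0;

    /** Data thread: record the packet; the visible length is not updated here. */
    void onPacketWritten(long seqno, long endOffset) {
        unacked.add(new Pending(seqno, endOffset));
    }

    /** Ack thread: acks arrive in order, so retire every packet up to seqno. */
    void onAck(long seqno) {
        Pending p;
        while ((p = unacked.peek()) != null && p.seqno <= seqno) {
            unacked.poll(); // safe: this is the only consumer
            visibleLength = p.endOffset;
        }
    }

    long getVisibleLength() {
        return visibleLength;
    }
}
```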
------------------------------------------------------------------------
r23289 | sdong | 2012-03-02 17:32:45 -0800 (Fri, 02 Mar 2012) | 27 lines
HDFS Client: only start waiting for a packet ack when something has been sent.

Summary:
Heartbeats are only guaranteed to be sent once every client timeout/2, while
the client started waiting for an ack as soon as it received the previous one.
So if the second or third datanode has a problem, there is a good chance that
the client hits its timeout before the first datanode does. The client then
wrongly removes the first DN from the pipeline, which causes more permanent
failures than it should.

Differential Revision: https://phabricator.fb.com/D405345
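The client-side rule can be modeled as an ack timer that is armed only while packets are in flight. This is an illustrative sketch under assumptions: time is passed in explicitly in milliseconds, the 10 s timeout is arbitrary, and `AckTimer` is a hypothetical class rather than the DFSClient code. An idle period with nothing outstanding can never produce a timeout, so the client's clock does not start early and beat the first datanode's.

```java
class AckTimerDemo {
    public static void main(String[] args) {
        AckTimer t = new AckTimer(10_000);
        t.onPacketSent(0);
        t.onAckReceived(100);
        // Idle for a long time with nothing in flight: no timeout.
        System.out.println(t.timedOut(50_000)); // false
        t.onPacketSent(50_000);
        System.out.println(t.timedOut(55_000)); // false
        System.out.println(t.timedOut(60_000)); // true
    }
}

/** Hypothetical client-side ack timer; times are passed in as milliseconds. */
class AckTimer {
    private final long timeoutMs;
    private int outstanding = 0;
    private long waitStartMs = -1; // -1 means the timer is not armed

    AckTimer(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void onPacketSent(long nowMs) {
        if (outstanding == 0) {
            waitStartMs = nowMs; // arm the timer only once something is in flight
        }
        outstanding++;
    }

    void onAckReceived(long nowMs) {
        if (outstanding == 0) {
            throw new IllegalStateException("ack with no packet outstanding");
        }
        outstanding--;
        // Keep waiting from this ack if more packets are in flight; otherwise disarm.
        waitStartMs = outstanding > 0 ? nowMs : -1;
    }

    boolean timedOut(long nowMs) {
        return waitStartMs >= 0 && nowMs - waitStartMs >= timeoutMs;
    }
}
```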

------------------------------------------------------------------------
r23287 | sdong | 2012-03-02 17:30:09 -0800 (Fri, 02