-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-17150. EC: Fix the bug of failed lease recovery. #5937
Conversation
@zhangshuyan0 Thanks a lot for reporting this bug. This phenomenon also happens on our clusters. Leave some questions, hope to receive your reply~ Thanks. |
minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); | ||
} | ||
if (uc.getNumExpectedLocations() < minLocationsNum && | ||
lastBlock.getNumBytes() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshuyan0 Whether should we log this special case or not?
Line 3819 in 4e7cd88
NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. This log is not very suitable, I will make some changes.
if (lastBlock.isStriped()) { | ||
minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); | ||
} | ||
if (uc.getNumExpectedLocations() < minLocationsNum && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @zhangshuyan0 , Also have a question here, is uc.getNumExpectedLocations()
always >=
minLocationsNum? because when invoke getAdditionalBlock rpc, the field replicas in
classBlockUnderConstructionFeature has been set to targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, the writing process of EC files never call getAdditionalBlock
. Second, after a failover in NameNode, the content of BlockUnderConstructionFeature#replicas
is totally depends on block reports (IBR or FBR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshuyan0 Thanks a lot. I ignored the failover condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I think the writing process of EC files will call getAdditionalBlock and will set ReplicaUnderConstruction[] replicas , for standby name need run failover ha will set BlockUnderConstructionFeature#replicas is totally when block reports (IBR or FBR). ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, the writing process of EC files never call
getAdditionalBlock
. Second, after a failover in NameNode, the content ofBlockUnderConstructionFeature#replicas
is totally depends on block reports (IBR or FBR).
Sorry, I mistook getAdditionalBlock
for getAdditionalDatanode
just now, but the conclusion still holds because the failover occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshuyan0 Hi, shuyan. please also check below code snippet in method FSDirWriteFileOp#storeAllocatedBlock:
final BlockType blockType = pendingFile.getBlockType();
// allocate new block, record block locations in INode.
Block newBlock = fsn.createNewBlock(blockType);
INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
saveAllocatedBlock(fsn, src, inodesInPath, newBlock, targets, blockType);
persistNewBlock(fsn, src, pendingFile);
Does the BlockUnderConstructionFeature#replicas also write to editlog because it is a part of lastBlock.
Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hfutatzhanghb AddBlockOp
only stores blockId, numBytes and generationStamp for the last block.
// There is no datanode reported to this block. | ||
// may be client have crashed before writing data to pipeline. | ||
// This blocks doesn't need any recovery. | ||
// We can remove this block and close the file. | ||
pendingFile.removeLastBlock(lastBlock); | ||
finalizeINodeFileUnderConstruction(src, pendingFile, | ||
iip.getLatestSnapshotId(), false); | ||
NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: " | ||
+ "Removed empty last block and closed file " + src); | ||
if (uc.getNumExpectedLocations() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here how about update to this judgment logic?
if (lastBlock.isStriped()) {
NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
+ "Removed last unrecoverable block group and closed file " + src);
} else {
NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
+ "Removed empty last block and closed file " + src);
}```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If uc.getNumExpectedLocations()
is 0, regardless of whether it is a striped block or not, I think we should all consider it to be an empty block, not unrecoverable. So I think the code before is better. What's your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I understand your thoughts,
Perhaps it would be better to add some comment description.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! leave one nit comment inline.
NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: " | ||
+ "Removed empty last block and closed file " + src); | ||
} else { | ||
NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little weird that if we can determine it is EC file here when uc.getNumExpectedLocations() != 0
, if not it will be ambiguity of this log print.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uc.getNumExpectedLocations() != 0
means minLocationsNum != 1
, so it must be a EC file according to line 3806-3809.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally true, but poor readability. Any other way to improve it, such as lastBlock.isStriped()
or others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If uc.getNumExpectedLocations() is 0, regardless of whether it is a striped block or not, I think we should all consider it to be an empty block, not unrecoverable.
How about adding some comments here?
@Hexiaoqiao @haiyang1987 Some comments have been added. Please take a check when you have free time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1.
@haiyang1987 @hfutatzhanghb Please let me know if any furthermore comments.
LGTM. +1. |
💔 -1 overall
This message was automatically generated. |
LGTM. +1. |
Committed to trunk. Thanks @zhangshuyan0 for your works. And @haiyang1987 , @hfutatzhanghb for your reviews! |
…buted by Shuyan Zhang. Reviewed-by: hfutatzhanghb <1036798979@qq.com> Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org> (cherry picked from commit 655c3df)
…Contributed by Shuyan Zhang. Reviewed-by: hfutatzhanghb <1036798979@qq.com> Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…Contributed by Shuyan Zhang. Reviewed-by: hfutatzhanghb <1036798979@qq.com> Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
If the client crashes without writing the minimum number of internal blocks required by the EC policy, the lease recovery process for the corresponding unclosed file may continue to fail. Taking RS(6,3) policy as an example, the timeline is as follows:
uc.getNumExpectedLocations()
completely depends on block report, and there are 5 datanodes reporting internal blocks;hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
Lines 534 to 540 in b6edcb9
When the number of internal blocks written by the client is less than 6, the block group is actually unrecoverable. We should equate this situation to the case where the number of replicas is 0 when processing replica files, i.e., directly remove the last block group and close the file.