
HDFS-17150. EC: Fix the bug of failed lease recovery. #5937

Merged
merged 3 commits into apache:trunk
Aug 15, 2023

Conversation

@zhangshuyan0
Contributor

@zhangshuyan0 commented Aug 10, 2023

If the client crashes without writing the minimum number of internal blocks required by the EC policy, the lease recovery process for the corresponding unclosed file may fail repeatedly. Taking the RS(6,3) policy as an example, the timeline is as follows:

  1. The client writes some data to only 5 datanodes;
  2. The client crashes;
  3. The NN fails over;
  4. Now the result of uc.getNumExpectedLocations() depends entirely on block reports, and only 5 datanodes have reported internal blocks;
  5. When the lease hard limit expires, the NN issues a block recovery command;
  6. The datanode checks the command and finds that the number of internal blocks is insufficient, resulting in an exception and recovery failure:

        private void checkLocations(int locationCount)
            throws IOException {
          if (locationCount < ecPolicy.getNumDataUnits()) {
            throw new IOException(block + " has no enough internal blocks" +
                ", unable to start recovery. Locations=" + Arrays.asList(locs));
          }
        }

  7. The lease hard limit expires again, the NN issues another block recovery command, and the recovery fails again, and so on.

When the number of internal blocks written by the client is less than 6, the block group is actually unrecoverable. We should treat this situation the same as a replicated file whose last block has 0 replicas: directly remove the last block group and close the file.
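For context, here is a minimal sketch of the change in FSNamesystem#internalReleaseLease, assembled from the diff hunks quoted in the review threads below; it is not a verbatim copy of the merged patch, and the surrounding method body is omitted:

    int minLocationsNum = 1;
    if (lastBlock.isStriped()) {
      // For a striped (EC) block group, recovery needs at least as many
      // internal blocks as the policy's data units.
      minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
    }
    if (uc.getNumExpectedLocations() < minLocationsNum &&
        lastBlock.getNumBytes() == 0) {
      // Fewer reported locations than required and no bytes recorded for the
      // last block: it can never be recovered, so drop it and close the file
      // instead of issuing block recovery commands that will keep failing.
      pendingFile.removeLastBlock(lastBlock);
      finalizeINodeFileUnderConstruction(src, pendingFile,
          iip.getLatestSnapshotId(), false);
      if (uc.getNumExpectedLocations() == 0) {
        NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
            + "Removed empty last block and closed file " + src);
      } else {
        NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
            + "Removed last unrecoverable block group and closed file " + src);
      }
      return true;
    }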

@hfutatzhanghb
Contributor

@zhangshuyan0 Thanks a lot for reporting this bug. This phenomenon also happens on our clusters. I have left some questions and hope to receive your reply. Thanks.

      minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
    }
    if (uc.getNumExpectedLocations() < minLocationsNum &&
        lastBlock.getNumBytes() == 0) {
Contributor

@zhangshuyan0 Should we log this special case or not?

Contributor Author

Thanks for your review. This log message is not very suitable; I will make some changes.

    if (lastBlock.isStriped()) {
      minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
    }
    if (uc.getNumExpectedLocations() < minLocationsNum &&
Contributor

Hi @zhangshuyan0, I also have a question here: is uc.getNumExpectedLocations() always >= minLocationsNum? Because when the getAdditionalBlock RPC is invoked, the replicas field in BlockUnderConstructionFeature has already been set to the targets.

Contributor Author

First, the writing process of EC files never calls getAdditionalBlock. Second, after a NameNode failover, the content of BlockUnderConstructionFeature#replicas depends entirely on block reports (IBR or FBR).

Contributor

@zhangshuyan0 Thanks a lot. I overlooked the failover case.

Contributor

Here I think the writing process of EC files does call getAdditionalBlock and does set ReplicaUnderConstruction[] replicas. Is it only after the standby NameNode takes over in an HA failover that BlockUnderConstructionFeature#replicas depends entirely on block reports (IBR or FBR)?

Contributor Author

First, the writing process of EC files never calls getAdditionalBlock. Second, after a NameNode failover, the content of BlockUnderConstructionFeature#replicas depends entirely on block reports (IBR or FBR).

Sorry, I mixed up getAdditionalBlock and getAdditionalDatanode just now, but the conclusion still holds because the failover occurs.

Contributor

@zhangshuyan0 Hi, Shuyan. Please also check the code snippet below from the method FSDirWriteFileOp#storeAllocatedBlock:

    final BlockType blockType = pendingFile.getBlockType();
    // allocate new block, record block locations in INode.
    Block newBlock = fsn.createNewBlock(blockType);
    INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
    saveAllocatedBlock(fsn, src, inodesInPath, newBlock, targets, blockType);

    persistNewBlock(fsn, src, pendingFile);

Is BlockUnderConstructionFeature#replicas also written to the edit log, since it is part of lastBlock?
Thanks a lot.

Contributor Author

@hfutatzhanghb AddBlockOp only stores blockId, numBytes and generationStamp for the last block.
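To illustrate the point, here is a purely hypothetical sketch (these are not real HDFS classes or field names) of what an AddBlockOp edit-log record carries for the last block; the key observation is that expected locations are not persisted, so after a failover BlockUnderConstructionFeature#replicas can only be rebuilt from block reports:

    // Hypothetical illustration only; field names are made up for clarity.
    class AddBlockRecordSketch {
      String path;             // the file being written
      long lastBlockId;        // identity of the newly allocated last block
      long lastBlockNumBytes;  // block length known when the op was logged
      long lastBlockGenStamp;  // generation stamp of the last block
      // Note: no DataNode targets / expected locations are stored here.
    }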

    // There is no datanode reported to this block.
    // may be client have crashed before writing data to pipeline.
    // This blocks doesn't need any recovery.
    // We can remove this block and close the file.
    pendingFile.removeLastBlock(lastBlock);
    finalizeINodeFileUnderConstruction(src, pendingFile,
        iip.getLatestSnapshotId(), false);
    NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
        + "Removed empty last block and closed file " + src);
    if (uc.getNumExpectedLocations() == 0) {
Contributor

How about updating the logging here to this?

    if (lastBlock.isStriped()) {
      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
          + "Removed last unrecoverable block group and closed file " + src);
    } else {
      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
          + "Removed empty last block and closed file " + src);
    }

Contributor Author

If uc.getNumExpectedLocations() is 0, then regardless of whether it is a striped block, I think we should consider it an empty block rather than an unrecoverable one. So I think the previous code is better. What's your opinion?

Contributor

Yeah, I understand your thinking.

Perhaps it would be better to add a comment describing this.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 0s trunk passed
+1 💚 compile 0m 53s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 48s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 44s trunk passed
+1 💚 mvnsite 0m 55s trunk passed
+1 💚 javadoc 0m 52s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 10s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 58s trunk passed
+1 💚 shadedclient 23m 55s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 46s the patch passed
+1 💚 compile 0m 45s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 45s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 38s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 32s the patch passed
+1 💚 mvnsite 0m 42s the patch passed
+1 💚 javadoc 0m 38s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 6s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 52s the patch passed
+1 💚 shadedclient 24m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 190m 50s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
288m 11s
Reason Tests
Failed junit tests hadoop.hdfs.TestFileChecksum
hadoop.hdfs.server.namenode.ha.TestObserverNode
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/1/artifact/out/Dockerfile
GITHUB PR #5937
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 068005fabb65 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4e7cd88
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/1/testReport/
Max. process+thread count 4019 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 54s trunk passed
+1 💚 compile 0m 52s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 50s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 47s trunk passed
+1 💚 mvnsite 0m 52s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 8s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 57s trunk passed
+1 💚 shadedclient 27m 6s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 49s the patch passed
+1 💚 compile 0m 52s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 52s the patch passed
+1 💚 compile 0m 45s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 45s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 38s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 110 unchanged - 0 fixed = 111 total (was 110)
+1 💚 mvnsite 0m 47s the patch passed
+1 💚 javadoc 0m 41s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 16s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 2m 17s the patch passed
+1 💚 shadedclient 27m 59s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 194m 35s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
302m 11s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/2/artifact/out/Dockerfile
GITHUB PR #5937
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux e3965eccf0d3 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 35eb681
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/2/testReport/
Max. process+thread count 3238 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@Hexiaoqiao left a comment

Great catch! Left one nit comment inline.

      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
          + "Removed empty last block and closed file " + src);
    } else {
      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
Contributor

It is a little weird: can we be sure it is an EC file here when uc.getNumExpectedLocations() != 0? If not, this log message will be ambiguous.

Contributor Author

uc.getNumExpectedLocations() != 0 means minLocationsNum != 1, so it must be an EC file according to lines 3806-3809. (A replicated file has minLocationsNum == 1, and this branch is only reached when uc.getNumExpectedLocations() < minLocationsNum, so a non-zero location count here implies minLocationsNum > 1, i.e. a striped block.)

Contributor

Totally true, but the readability is poor. Is there another way to improve it, such as lastBlock.isStriped() or something else?

Contributor Author

If uc.getNumExpectedLocations() is 0, then regardless of whether it is a striped block, I think we should consider it an empty block rather than an unrecoverable one.

How about adding some comments here, as sketched below?
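A rough sketch of the kind of comments being suggested, assuming the branch stays keyed on uc.getNumExpectedLocations() (the wording actually added in the patch may differ):

    if (uc.getNumExpectedLocations() == 0) {
      // No DataNode has reported anything for the last block, so it is simply
      // an empty block: remove it and close the file.
      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
          + "Removed empty last block and closed file " + src);
    } else {
      // Some internal blocks were reported, but fewer than the EC policy's
      // data units, so this striped block group can never be recovered.
      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
          + "Removed last unrecoverable block group and closed file " + src);
    }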

@zhangshuyan0
Contributor Author

@Hexiaoqiao @haiyang1987 Some comments have been added. Please take a look when you have free time.

Contributor

@Hexiaoqiao left a comment

LGTM. +1.
@haiyang1987 @hfutatzhanghb Please let me know if there are any further comments.

@hfutatzhanghb
Contributor

LGTM. +1.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 56s trunk passed
+1 💚 compile 0m 54s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 50s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 44s trunk passed
+1 💚 mvnsite 0m 55s trunk passed
+1 💚 javadoc 0m 51s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 12s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 2m 0s trunk passed
+1 💚 shadedclient 22m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 47s the patch passed
+1 💚 compile 0m 48s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 48s the patch passed
+1 💚 compile 0m 43s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 43s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 34s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 110 unchanged - 0 fixed = 111 total (was 110)
+1 💚 mvnsite 0m 45s the patch passed
+1 💚 javadoc 0m 40s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 4s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 55s the patch passed
+1 💚 shadedclient 22m 18s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 183m 51s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
278m 3s
Reason Tests
Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockTokenWithShortCircuitRead
hadoop.hdfs.TestFileChecksum
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/3/artifact/out/Dockerfile
GITHUB PR #5937
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 41e15ee1ded0 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b0ce7e3
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/3/testReport/
Max. process+thread count 3372 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5937/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@haiyang1987
Contributor

LGTM. +1.

@Hexiaoqiao merged commit 655c3df into apache:trunk on Aug 15, 2023
@Hexiaoqiao
Contributor

Committed to trunk. Thanks @zhangshuyan0 for your work, and @haiyang1987, @hfutatzhanghb for your reviews!

tasanuma pushed a commit that referenced this pull request Dec 28, 2023
…buted by Shuyan Zhang.

Reviewed-by: hfutatzhanghb <1036798979@qq.com>
Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
(cherry picked from commit 655c3df)
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…Contributed by Shuyan Zhang.

Reviewed-by: hfutatzhanghb <1036798979@qq.com>
Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 26, 2024
…Contributed by Shuyan Zhang.

Reviewed-by: hfutatzhanghb <1036798979@qq.com>
Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
symious pushed a commit to symious/hadoop that referenced this pull request Nov 21, 2024