HADOOP-18073. Upgrade to AWS SDK V2. #5684

Closed
wants to merge 12 commits into from

ahmarsuhail
Contributor

@ahmarsuhail ahmarsuhail commented May 22, 2023

Description of PR

JIRA: https://issues.apache.org/jira/browse/HADOOP-18073

This PR upgrades S3A to use AWS Java SDK V2. Changes made are detailed in aws_sdk_v2_changelog.md.

Current known gaps/issues:

  • No client-side encryption. JIRA: https://issues.apache.org/jira/browse/HADOOP-18708. The S3 encryption client currently wraps some exceptions in java.util.concurrent.CompletionException, which impacts S3A's exception handling and translation. Ideally, the S3 encryption client should catch the CompletionException and throw the underlying exception instead. This has been raised with the S3 encryption client team in the issue "Exceptions wrapped in CompletionException" (aws/amazon-s3-encryption-client-java#160).
  • No SigV2 signing. SDK V2 does not support SigV2; however, S3A supports custom signers, so a SigV2 signer can still be configured. JIRA: https://issues.apache.org/jira/browse/HADOOP-18747
  • Regression in rename() performance. When used with the Java async client, the transfer manager does not currently implement multipart copy, which causes a regression in rename() performance for files larger than the multipart threshold. This has been raised with the SDK team, and a follow-up PR to fix the issue will be created once a patch becomes available.
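
To illustrate the first gap: the sketch below shows one way a caller could strip CompletionException wrappers to reach the underlying exception before translating it. The class and method names (CompletionExceptionUnwrap, unwrap) are hypothetical and are not taken from S3A or the S3 encryption client; this is a minimal sketch of the idea, not the approach either project uses.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical helper, for illustration only; not part of S3A.
public class CompletionExceptionUnwrap {

  /**
   * Walk past any CompletionException wrappers and return the first
   * throwable that is not one. A wrapper with no cause is returned as-is.
   */
  public static Throwable unwrap(Throwable t) {
    Throwable current = t;
    while (current instanceof CompletionException
        && current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    // Simulate an async operation that failed with an IOException.
    CompletableFuture<String> future = new CompletableFuture<>();
    future.completeExceptionally(new IOException("simulated failure"));
    try {
      // join() reports the failure wrapped in a CompletionException.
      future.join();
    } catch (CompletionException e) {
      Throwable cause = unwrap(e);
      // The caller can now translate the real cause rather than the wrapper.
      System.out.println(cause.getClass().getName() + ": " + cause.getMessage());
    }
  }
}
```

Unwrapping in a loop also covers the case where an exception has been wrapped more than once on its way through chained futures.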

How was this patch tested?

Tested in eu-west-1 by running mvn -Dparallel-tests -DtestsThreadCount=16 clean verify.

Known failures:

[ERROR]   ITestS3ABlockOutputArray.testDiskBlockCreate:114 » IO File name too long
[ERROR]   ITestS3ABlockOutputByteBuffer>ITestS3ABlockOutputArray.testDiskBlockCreate:114 » IO
[ERROR]   ITestS3ABlockOutputDisk>ITestS3ABlockOutputArray.testDiskBlockCreate:114 » IO ...

Jira created: https://issues.apache.org/jira/browse/HADOOP-18744

Other failures:

 ITestS3AEndpointRegion.testWithoutRegionConfig:80 [Region is not configured, region probe should have been made] expected:<[1]L> but was:<[0]L>
 ITestS3SelectLandsat.testSelectSeekFullLandsat:419->AbstractS3SelectTest.seek:711 » AWSClientIO

I am investigating the two failures above and will open a PR to fix them.

ahmarsuhail and others added 12 commits May 17, 2023 10:17
See aws_sdk_v2_changelog.md for details.

Co-authored-by: Ahmar Suhail <ahmarsu@amazon.co.uk>
Co-authored-by: Alessandro Passaro <alexpax@amazon.co.uk>
addresses review comments + yetus errors

Co-authored-by: Ahmar Suhail <ahmarsu@amazon.co.uk>
…5421)

Changes include
* use bundled transfer manager
* adds transfer listener to upload
* adds support for custom signers
* don't set default endpoint
* removes v1 sdk bundle, only use core package
* implements region caching
+ many more

Note: spotbugs is warning about inconsistent
synchronization in accessing a new s3a FS field.
This will be fixed in a follow-up patch.
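
For readers unfamiliar with this class of warning, here is a minimal sketch of the pattern that avoids it: every read and write of a lazily created field goes through the same lock. LazyClientHolder is a hypothetical class written for illustration; it is not the S3AFileSystem code and not necessarily how the follow-up patch addresses the warning.

```java
import java.util.function.Supplier;

// Hypothetical holder, for illustration only; not the S3AFileSystem code.
public class LazyClientHolder<T> {

  private final Supplier<T> factory;

  // Guarded by "this": every access below happens in a synchronized method.
  private T client;

  public LazyClientHolder(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Create the client on first use; later calls return the same instance. */
  public synchronized T get() {
    if (client == null) {
      client = factory.get();
    }
    return client;
  }

  /**
   * Also synchronized. Reading the field without the lock here is the kind
   * of mixed access that an inconsistent-synchronization check reports.
   */
  public synchronized boolean isInitialized() {
    return client != null;
  }
}
```

Declaring the field final and creating it eagerly would remove the need for locking altogether, at the cost of constructing the client even when it is never used.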

Contributed by Ahmar Suhail
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 37s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 5s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +0 🆗 | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 79 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 15m 59s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 19m 45s | | trunk passed |
| +1 💚 | compile | 15m 41s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 💚 | compile | 14m 33s | | trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 |
| +1 💚 | checkstyle | 3m 46s | | trunk passed |
| +1 💚 | mvnsite | 3m 8s | | trunk passed |
| +1 💚 | javadoc | 2m 34s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 💚 | javadoc | 2m 17s | | trunk passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 |
| +0 🆗 | spotbugs | 0m 54s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 💚 | shadedclient | 20m 56s | | branch has no errors when building and testing our client artifacts. |
| -0 ⚠️ | patch | 21m 20s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 1m 4s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 34s | | the patch passed |
| +1 💚 | compile | 15m 8s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 💚 | javac | 15m 8s | | the patch passed |
| +1 💚 | compile | 14m 17s | | the patch passed with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 14m 17s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 3m 44s | /results-checkstyle-root.txt | root: The patch generated 17 new + 73 unchanged - 7 fixed = 90 total (was 80) |
| +1 💚 | mvnsite | 3m 9s | | the patch passed |
| +1 💚 | javadoc | 2m 26s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| -1 ❌ | javadoc | 0m 46s | /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09.txt | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 with JDK Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +0 🆗 | spotbugs | 0m 35s | | hadoop-project has no data from spotbugs |
| -1 ❌ | spotbugs | 1m 27s | /new-spotbugs-hadoop-tools_hadoop-aws.html | hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | shadedclient | 21m 18s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 38s | | hadoop-project in the patch passed. |
| +1 💚 | unit | 18m 20s | | hadoop-common in the patch passed. |
| +1 💚 | unit | 3m 3s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | 202m 46s | | |

| Reason | Tests |
|---|---|
| SpotBugs | module:hadoop-tools/hadoop-aws |
| | Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3AFileSystem.s3AsyncClient; locked 60% of time. Unsynchronized access at S3AFileSystem.java:[line 1764] |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5684/1/artifact/out/Dockerfile |
| GITHUB PR | #5684 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint |
| uname | Linux e7f078b83d89 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 75220b7 |
| Default Java | Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5684/1/testReport/ |
| Max. process+thread count | 1287 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5684/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
