HADOOP-19518. ABFS: [FNSOverBlob] WASB to ABFS Migration Config Support Script #7564

Open: wants to merge 11 commits into base: trunk

manika137
Contributor

Description of PR

The legacy WASB driver has been deprecated and is no longer recommended for use. To help customers migrate from the WASB driver to the ABFS driver, we've introduced a script that makes the required configuration changes.

The script takes the XML configuration file used with WASB and generates the corresponding configuration file for the ABFS driver.

JIRA ticket: https://issues.apache.org/jira/browse/HADOOP-19518

How was this patch tested?

No production code change, no testing needed.

manika137 and others added 8 commits March 12, 2025 02:40
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 shelldocs 0m 1s Shelldocs was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 34s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 shadedclient 37m 43s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 shadedclient 38m 16s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 33s hadoop-azure in the patch passed.
-1 ❌ asflicense 0m 37s /results-asflicense.txt The patch generated 2 ASF License warnings.
122m 36s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/1/artifact/out/Dockerfile
GITHUB PR #7564
Optional Tests dupname asflicense mvnsite unit codespell detsecrets shellcheck shelldocs xmllint markdownlint
uname Linux 88dd9936b8a7 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3a1a2eb
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/1/testReport/
Max. process+thread count 527 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/1/console
versions git=2.25.1 maven=3.6.3 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@anujmodi2021 left a comment

Added some thoughts

## Introduction

ABFS driver has now built support for
FNS accounts (over BlobEndpoint that WASB Driver uses) using the ABFS scheme.

Contributor

Add a link here to the fns_blob.md file.

The legacy WASB driver has been **deprecated** and is no longer recommended for

Contributor

Add a link to deprecated_wasb.md here.
Maybe the documentation PR needs to be merged first.

Contributor Author

Right, will do

contactTeamMsg="For any queries or support, kindly reach out to us at '[email protected]'."
endpoint=".dfs."
printf "Select 'HNS' if you're migrating to ABFS driver with Hierarchical Namespace enabled account,
or 'Non-HNS' if you're migrating with Non-Hierarchical Namespace (FNS) account. \n"

Contributor

Nit: The statement should be constructed the same way for both options:
Select 'HNS' if you're migrating to ABFS driver for Hierarchical Namespace enabled account, or 'Non-HNS' if you're migrating to ABFS driver for Non-Hierarchical Namespace (FNS) account. \n

Contributor Author

Taken
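For illustration, the HNS/Non-HNS choice can be turned into the endpoint segment the script writes (endpoint=".dfs." above). This is a minimal sketch, not the script's actual code: it assumes HNS keeps ".dfs." and Non-HNS (FNS) uses the ".blob." endpoint that WASB uses, and the helper select_endpoint is hypothetical. The prompt reuses the consistently phrased message suggested above with a bash select menu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the chosen namespace type to the endpoint segment.
# Assumption: HNS keeps ".dfs." (the script's default); Non-HNS (FNS) uses ".blob.".
select_endpoint() {
  case "$1" in
    HNS)     echo ".dfs." ;;
    Non-HNS) echo ".blob." ;;
    *)       echo "Invalid option '$1'. Choose 'HNS' or 'Non-HNS'." >&2
             return 1 ;;
  esac
}

# Ask with one consistently phrased message and a numbered menu.
# The message and the select menu go to stderr, so stdout carries only the answer.
prompt_namespace_type() {
  local choice
  printf "Select 'HNS' if you're migrating to ABFS driver for Hierarchical Namespace enabled account, or 'Non-HNS' if you're migrating to ABFS driver for Non-Hierarchical Namespace (FNS) account.\n" >&2
  select choice in HNS Non-HNS; do
    if [[ -n "$choice" ]]; then
      echo "$choice"
      return 0
    fi
    echo "Invalid selection, please try again." >&2
  done
  return 1   # input ended without a valid choice
}
```

A caller could then run endpoint=$(select_endpoint "$(prompt_namespace_type)") || exit 1. Keeping the prompt on stderr lets command substitution capture only the answer.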

# Stop the script if any unsupported config is found
for key in "${unsupported_configs_list[@]}"; do
if grep -q "$key" "$OUTPUT_FILE"; then
echo "FAILURE: Remove the following configuration from file and rerun: '$key'"

Contributor

Should we tell the user why it needs to be removed? For example:
"Unsupported Config found"

Contributor Author

Right, added
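A check that reports the reason could look like the following minimal sketch. The keys in unsupportedConfigsList are placeholders, not the script's real list. Unlike the loop above, it reports every unsupported key before failing and returns non-zero instead of calling exit, so the caller decides whether to stop.

```shell
#!/usr/bin/env bash
# Placeholder keys for illustration only; the real list lives in the script.
unsupportedConfigsList=(
  "fs.azure.example.unsupported.one"
  "fs.azure.example.unsupported.two"
)

# Report every unsupported key found in the given file and say why it must go.
# Returns 1 if any were found, 0 otherwise.
check_unsupported_configs() {
  local file="$1" key found=0
  for key in "${unsupportedConfigsList[@]}"; do
    # -F matches the key literally, so the dots are not regex wildcards.
    if grep -qF "$key" "$file"; then
      echo "FAILURE: Unsupported config found: '$key'. The ABFS driver does not support it; remove it from the file and rerun." >&2
      found=1
    fi
  done
  return "$found"
}
```

Typical use would be check_unsupported_configs "$OUTPUT_FILE" || exit 1.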

)

# Configurations not required in ABFS Driver and can be removed
obsolete_configs_list=(

Contributor

Where is the real mapping for supported configs defined?

Contributor Author

Configs that are supported in both ABFS and WASB remain as they are (except for the endpoint change, where required).
The script only renames configs, deletes obsolete ones, and raises an error for unsupported ones.
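The description above amounts to a four-way decision per key: rename, remove as obsolete, reject as unsupported, or keep as is. A minimal sketch of that decision, assuming bash 4+ for associative arrays; every key in the lists is a hypothetical placeholder rather than one of the script's actual mappings.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
# All keys below are hypothetical placeholders, not the script's real lists.
declare -A renameConfigsMap=(
  ["fs.azure.example.wasb.key"]="fs.azure.example.abfs.key"
)
obsoleteConfigsList=("fs.azure.example.obsolete.key")
unsupportedConfigsList=("fs.azure.example.unsupported.key")

# Print the action for a key: "rename <new-key>", "remove", "error" or "keep".
classify_config() {
  local key="$1" item
  if [[ -n "${renameConfigsMap[$key]+set}" ]]; then
    echo "rename ${renameConfigsMap[$key]}"
    return
  fi
  for item in "${obsoleteConfigsList[@]}"; do
    [[ "$item" == "$key" ]] && { echo "remove"; return; }
  done
  for item in "${unsupportedConfigsList[@]}"; do
    [[ "$item" == "$key" ]] && { echo "error"; return; }
  done
  # Supported by both drivers: left as is (apart from any endpoint change).
  echo "keep"
}
```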

# Remove the property block if any name tag is empty
xmlstarlet ed -L -d "//property[not(name) or name='']" "$OUTPUT_FILE"

echo "Updated file: $OUTPUT_FILE"

Contributor

Has this script been tested?

Contributor Author

Yes, tested with a sample config file to verify that it works correctly and that the config values stay intact.
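A check like the one described can be scripted. This sketch assumes xmlstarlet is installed and that the files use Hadoop's configuration/property/name/value layout; get_config_value and values_intact are hypothetical helper names. It compares a property in the original WASB file with the (possibly renamed) property in the generated file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking that values survive the conversion.
# Requires xmlstarlet.

# Print the value of the named property (empty output if it is absent).
get_config_value() {
  xmlstarlet sel -t -v "//property[name='$2']/value" "$1" 2>/dev/null || true
}

# Succeed if <old-key> in <old-file> and <new-key> in <new-file> carry the
# same value. <new-key> defaults to <old-key> when the config was not renamed.
values_intact() {
  local oldFile="$1" newFile="$2" oldKey="$3" newKey="${4:-$3}"
  local oldValue newValue
  oldValue=$(get_config_value "$oldFile" "$oldKey")
  newValue=$(get_config_value "$newFile" "$newKey")
  if [[ "$oldValue" != "$newValue" ]]; then
    echo "MISMATCH: '$oldKey'='$oldValue' vs '$newKey'='$newValue'" >&2
    return 1
  fi
}
```

For example, values_intact core-site.xml core-site-abfs.xml old.key new.key (placeholder names) returns 0 only when both values match; a key missing from both files also counts as a match.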

@manika137 force-pushed the HADOOP-19518_wasbScript branch from 093e7fb to 82c7c02 on April 4, 2025 08:25
@@ -0,0 +1,157 @@
#!/usr/bin/env bash

Contributor

It would be better to name the file in camel case or all lowercase, as we do for the other test scripts.

Contributor Author

Taken

exit 1
fi

if [[ "$1" != *.xml ]]; then

Contributor

Since $1 is already assigned to FILE, it would be better to use $FILE: if [[ "$FILE" != *.xml ]]; then

Contributor Author

Taken
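With $FILE used consistently, the input checks can live in one function, along the lines of the validate_input_file() suggestion later in this review. A minimal sketch; it returns non-zero rather than exiting, so the caller can write validate_input_file "$1" || exit 1.

```shell
#!/usr/bin/env bash
# Validate the WASB config file passed to the script.
# Returns 1 (with a message on stderr) if no file is given, it does not
# exist, or it does not have an .xml extension.
validate_input_file() {
  local FILE="$1"
  if [[ -z "$FILE" ]]; then
    echo "Usage: $0 <config-file.xml>" >&2
    return 1
  fi
  if [[ ! -f "$FILE" ]]; then
    echo "Error: File '$FILE' does not exist." >&2
    return 1
  fi
  if [[ "$FILE" != *.xml ]]; then
    echo "Error: '$FILE' is not an XML file." >&2
    return 1
  fi
}
```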

done

# Mapping for renaming configurations
declare -A rename_configs_map=(

Contributor

Variable naming should follow Java conventions (camelCase instead of snake_case).

Contributor Author

Right. Corrected

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 28s trunk passed
+1 💚 mvnsite 0m 43s trunk passed
+1 💚 shadedclient 38m 34s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 55s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 shadedclient 38m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 31s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
123m 26s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/2/artifact/out/Dockerfile
GITHUB PR #7564
Optional Tests dupname asflicense mvnsite unit codespell detsecrets shellcheck shelldocs markdownlint
uname Linux 15b3540e9b8b 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 093e7fb
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/2/testReport/
Max. process+thread count 578 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/2/console
versions git=2.25.1 maven=3.6.3 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 30s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 shadedclient 38m 6s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 27s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 shadedclient 37m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 33s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
122m 5s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/3/artifact/out/Dockerfile
GITHUB PR #7564
Optional Tests dupname asflicense mvnsite unit codespell detsecrets shellcheck shelldocs markdownlint
uname Linux 4d293b198519 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 82c7c02
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/3/testReport/
Max. process+thread count 600 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/3/console
versions git=2.25.1 maven=3.6.3 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 22m 17s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 46s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 41s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 8s trunk passed
+1 💚 shadedclient 38m 49s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 39m 10s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 8s the patch passed
+1 💚 shadedclient 39m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 4m 6s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
156m 30s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/4/artifact/out/Dockerfile
GITHUB PR #7564
Optional Tests dupname asflicense mvnsite unit codespell detsecrets shellcheck shelldocs markdownlint compile javac javadoc mvninstall shadedclient spotbugs checkstyle
uname Linux 12b6ad4a1490 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8340b0a
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/4/testReport/
Max. process+thread count 581 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7564/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@anujmodi2021 left a comment

+1

do
case $namespaceType in
HNS)
xmlstarlet ed -L -i '//configuration/property[1]' -t elem -n property -v '' \

Contributor

nit: If xmlstarlet is not installed, we should explicitly throw an error:
if ! command -v xmlstarlet &> /dev/null; then
  echo "Error: 'xmlstarlet' is not installed. Please install it to run this script."
  exit 1
fi

)

# Configurations not required in ABFS Driver and can be removed
obseleteConfigsList=(

Contributor

typo: obsoleteConfigsList


FILE=$1

if [ ! -f "$FILE" ]; then

Contributor

One overall suggestion would be to break the different operations down into functions, as there is a lot going on in the script. Some suggestions:
check_dependencies() { ... }
validate_input_file() { ... }
prompt_namespace_type() { ... }
rename_configurations() { ... }
remove_obsolete_configs() { ... }
handle_defaultFS_endpoint() { ... }
