Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-19531. [ABFS][FnsOverBlob] Streaming List Path Result Should Happen Inside Retry Loop #7582

Merged
merged 11 commits into from
Apr 8, 2025

Conversation

anujmodi2021
Copy link
Contributor

Description of PR

Listing APIs on both DFS and Blob Endpoints return response as part of response body and has to be read from an Socket Input Stream.
Any network error occuring while reading the stream should be retried.

Today, this parsing happens in client and such errors are not retried.
This change fixes this behavior

How was this patch tested?

Added a few more tests and test suite ran for validation

@hadoop-yetus

This comment was marked as outdated.

LOG.error("Unable to deserialize list results for Uri {}", uri.toString(), ex);
throw new AbfsDriverException(ERR_DFS_LIST_PARSING, ex);
LOG.error("Unable to deserialize list results for Uri {}", uri != null ? uri.toString(): "NULL", ex);
throw new AbfsDriverException(ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw new AbfsDriverException(ERR_DFS_LIST_PARSING, ex); add the message here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

throw new AbfsDriverException(ERR_BLOB_LIST_PARSING, e);
} catch (AbfsDriverException ex) {
// Throw as it is to avoid multiple wrapping.
LOG.error("Unable to deserialize list results for Uri {}", uri != null ? uri.toString(): "NULL", ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uri should not be null if we are reaching till this part of the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uri can be null for the cases where this API is called by internal operations like in create flow.
If list call is coming from user, uri won't be null.

@@ -500,6 +505,20 @@ private void parseBlockListResponse(final InputStream stream) throws IOException
blockIdList = client.parseBlockListResponse(stream);
}

private void parseListPathResponse(final InputStream stream) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken

@hadoop-yetus

This comment was marked as outdated.

@@ -500,6 +505,20 @@ private void parseBlockListResponse(final InputStream stream) throws IOException
blockIdList = client.parseBlockListResponse(stream);
}

private void parseListPathResponse(final InputStream stream) throws IOException {
if (stream == null || blockIdList != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is blockIdList check relevant here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was buggy, fixed it

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

throw new AbfsDriverException(e);
return filterDuplicateEntriesAndRenamePendingFiles(listResultSchema, uri);
} catch (SAXException | IOException ex) {
throw new AbfsDriverException(ERR_BLOB_LIST_PARSING, ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: We can add URI in the error message for better visibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URI is already in log messages.

if (stream == null) {
return null;
}
try (InputStream stream = new ByteArrayInputStream(result.getListResultData())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be any issue here in case result.getListResultData() is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can lead to NPE. Have modified to catch all the exceptions and wrap it around AbfsDriverException and throw back to user

@@ -438,6 +438,11 @@ final void parseResponse(final byte[] buffer,
method, getMaskedUrl(), ex.getMessage());
log.debug("IO Error: ", ex);
throw ex;
} catch (Exception ex) {
log.warn("Unexpected error: {} {}: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both log.warn and log.debug here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were doing same for other type of exceptions.

@@ -67,6 +68,7 @@ public ITestApacheClientConnectionPool() throws Exception {
public void testKacIsClosed() throws Throwable {
Configuration configuration = new Configuration(getRawConfiguration());
configuration.set(FS_AZURE_NETWORKING_LIBRARY, APACHE_HTTP_CLIENT.name());
configuration.unset(FS_AZURE_METRIC_FORMAT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to unset this config here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was failing when this config was present in test xml files

@hadoop-yetus

This comment was marked as outdated.

Mockito.doReturn(spiedStore).when(spiedFs).getAbfsStore();
Mockito.doReturn(spiedClient).when(spiedStore).getClient();

Mockito.doThrow(new SocketException(CONNECTION_RESET_MESSAGE)).when(spiedClient).filterDuplicateEntriesAndRenamePendingFiles(any(), any());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce the retry count and verify that request was retried ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular test is to make sure retry does not happen as here exception is injected in parsing list result from local stream.

I have modified the test testListPathTracingContext to assert that exception thrown within httpOperation.processResponse() is retried.

Copy link
Contributor

@bhattmanish98 bhattmanish98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 37m 54s trunk passed
+1 💚 compile 0m 39s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 40s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 36m 23s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s the patch passed
+1 💚 mvnsite 0m 33s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
-1 ❌ spotbugs 1m 12s /new-spotbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 38m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 49s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
127m 9s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-azure
org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.getListResultData() may expose internal representation by returning AbfsHttpOperation.listResultData At AbfsHttpOperation.java:by returning AbfsHttpOperation.listResultData At AbfsHttpOperation.java:[line 226]
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7582/8/artifact/out/Dockerfile
GITHUB PR #7582
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1199960dd5fa 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5b9dfa5
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7582/8/testReport/
Max. process+thread count 737 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7582/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021
Copy link
Contributor Author

============================================================
HNS-OAuth-DFS

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 798, Failures: 0, Errors: 0, Skipped: 161
[WARNING] Tests run: 155, Failures: 0, Errors: 0, Skipped: 5
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 23

============================================================
HNS-SharedKey-DFS

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 801, Failures: 0, Errors: 0, Skipped: 114
[WARNING] Tests run: 155, Failures: 0, Errors: 0, Skipped: 5
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 10

============================================================
NonHNS-SharedKey-DFS

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 640, Failures: 0, Errors: 0, Skipped: 212
[WARNING] Tests run: 155, Failures: 0, Errors: 0, Skipped: 6
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 11

============================================================
AppendBlob-HNS-OAuth-DFS

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 798, Failures: 0, Errors: 0, Skipped: 168
[WARNING] Tests run: 123, Failures: 0, Errors: 0, Skipped: 6
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-SharedKey-Blob

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 643, Failures: 0, Errors: 0, Skipped: 141
[WARNING] Tests run: 146, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 11

============================================================
NonHNS-OAuth-DFS

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 637, Failures: 0, Errors: 0, Skipped: 214
[WARNING] Tests run: 155, Failures: 0, Errors: 0, Skipped: 6
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 24

============================================================
NonHNS-OAuth-Blob

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 640, Failures: 0, Errors: 0, Skipped: 143
[WARNING] Tests run: 155, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 24

============================================================
AppendBlob-NonHNS-OAuth-Blob

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 638, Failures: 0, Errors: 0, Skipped: 161
[WARNING] Tests run: 132, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 24

============================================================
HNS-Oauth-DFS-IngressBlob

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 672, Failures: 0, Errors: 0, Skipped: 164
[WARNING] Tests run: 155, Failures: 0, Errors: 0, Skipped: 5
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-OAuth-DFS-IngressBlob

[WARNING] Tests run: 174, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 637, Failures: 0, Errors: 0, Skipped: 212
[WARNING] Tests run: 146, Failures: 0, Errors: 0, Skipped: 6
[WARNING] Tests run: 272, Failures: 0, Errors: 0, Skipped: 24

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 48s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 41s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 38m 33s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 35s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s the patch passed
+1 💚 mvnsite 0m 32s the patch passed
+1 💚 javadoc 0m 28s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 8s the patch passed
+1 💚 shadedclient 38m 28s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 43s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
133m 31s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7582/9/artifact/out/Dockerfile
GITHUB PR #7582
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux cc84c03266e5 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / fde1783
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7582/9/testReport/
Max. process+thread count 587 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7582/9/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021 anujmodi2021 merged commit 0dac3d2 into apache:trunk Apr 8, 2025
4 checks passed
anujmodi2021 added a commit to ABFSDriver/AbfsHadoop that referenced this pull request Apr 8, 2025
…appen Inside Retry Loop (apache#7582)

Contributed by Anuj Modi
Reviewed by Anmol Asrani, Manish Bhatt, Manika Joshi

Signed off by: Anuj Modi<[email protected]>
anujmodi2021 added a commit that referenced this pull request Apr 10, 2025
…appen Inside Retry Loop (#7582) (#7585)

Contributed by Anuj Modi
Reviewed by Anmol Asrani, Manish Bhatt, Manika Joshi

Signed off by: Anuj Modi<[email protected]>
@anujmodi2021 anujmodi2021 deleted the listRetry branch April 10, 2025 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants