Skip to content

cloud_storage_clients: check Content-Type header before parsing REST error responses in s3_client #25738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 15, 2025

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Apr 7, 2025

According to the S3 docs, REST error responses should have the Content-Type header set to application/xml. However, response like 503 Service Unavailable may not be of this assumed Content-Type.

As seen in CI (abridged output):

WARN  s3 - s3_client.cc:841 - S3 PUT request failed for key "...": Service Unavailable [Content-Type: text/plain; charset=utf-8]; ...
ERROR s3 - util.cc:209 - !!parsing error boost::wrapexcept<boost::property_tree::xml_parser::xml_parser_error> (<unspecified file>(1): expected <)
ERROR s3 - s3_client.cc:441 - !!error parse error boost::wrapexcept<boost::property_tree::xml_parser::xml_parser_error> (<unspecified file>(1): expected <)

To fix this issue, read the Content-Type header in s3_client instead of assuming xml as the Content-Type and attempting to parse it. If the Content-Type is not xml, we will return a partially filled rest_error_response.

There are a number of JIRA tickets associated with this xml_parser_error- however, there may be an actual root cause for these errors that are not fixed by simply fixing the xml parsing.

Commits TL;DR

The first 6 commits deal with our unit and fixture testing expectations, since previously we would always parse S3 REST responses as XML without checking the Content-Type header.

Now that we don't, many changes had to be made through http, cloud_storage_clients and cloud_storage tests to ensure responses from imposter services which have content encoded with XML also now have a properly set Content-Type header.

The last 2 commits are the real implementation changes for this PR.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Improve parsing of non-XML Content-Type REST error responses in the s3_client.

@WillemKauf WillemKauf changed the title cloud_storage_clients: check Content-Type header before parsing REST error responses cloud_storage_clients: check Content-Type header before parsing REST error responses in s3_client Apr 7, 2025
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 7, 2025

CI test results

test results on build#64237
test_id test_kind job_url test_status passed
cloud_storage_clients_single_thread_rpunit.cloud_storage_clients_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/64237#019610c6-15dd-4128-9950-093c33134c46 FAIL 0/2
cloud_storage_clients_single_thread_rpunit.cloud_storage_clients_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/64237#019610c6-15df-4dd3-8e91-eb34a30d4bce FAIL 0/2
rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64237#0196111f-e05a-47ef-967a-ea487092189e FLAKY 1/2
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64237#0196111f-e05c-4414-9bf0-e5bf33d189f5 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/64237#01961123-1c8f-4f9f-bdcb-4af517d5197c FLAKY 1/2
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/64237#0196111f-e05b-4e61-83ee-fe4bb65310b5 FLAKY 1/4
test results on build#64258
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64258#0196124f-84a7-43de-acff-fa6a7e9405bd FLAKY 1/2
rptest.tests.datalake.throttling_test.DatalakeThrottlingTest.test_basic_throttling.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/64258#01961252-8738-4fda-9c09-cfddebba190f FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/64258#01961252-8738-4fda-9c09-cfddebba190f FLAKY 1/2
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/64258#0196124f-84a8-4a09-b42c-1a3e4ef09e9c FLAKY 1/2
test results on build#64526
test_id test_kind job_url test_status passed
datalake_cloud_rpfixture.datalake_cloud_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/64526#0196371b-e47c-4ff4-9c58-d6f4b764f2c8 FLAKY 1/2
rptest.tests.partition_force_reconfiguration_test.PartitionForceReconfigurationTest.test_basic_reconfiguration.acks=-1.restart=False.controller_snapshots=True ducktape https://buildkite.com/redpanda/redpanda/builds/64526#0196375e-23dc-484a-9e18-cd72321c5ff7 FLAKY 1/2

@WillemKauf WillemKauf force-pushed the s3_content_type_parse branch from 4ef35e1 to e060ecb Compare April 7, 2025 19:58
@WillemKauf WillemKauf force-pushed the s3_content_type_parse branch from e060ecb to 78455b6 Compare April 7, 2025 20:11
@WillemKauf WillemKauf requested review from nvartolomei and Lazin April 7, 2025 20:29
`ss::httpd::function_handler` calls `rep->done(_type)` on the reply before
returning it. Because certain code paths in imposter services may want to
return a different `Content-Type` in their response, this impl's `handle()`
function allows for modifying the `type` set in the function handler's body
before the final call to `done(type)` is performed.

This class also handles an oversight in `mime_types.cc`, in which `xml` is
not handled as a possible `mime` type.
All S3 error codes should be returned using `application/xml`
as the `Content-Type` header.
There are code paths here that possibly return a `xml` body for their
response. They need to ensure the response will be properly marked with
`application/xml` in the `Content-Type` header.
…ly_with()`

Sometimes we need to be able to set the headers and the body of a response.

Add a new overload that allows us to do so.
…tor_test`

The `not_found_response` is XML encoded.
…t_test`

More XML encoded responses that need their `Content-Type` headers properly set.
Future commits will add use of this function to the `s3_client`.
For that reason, move the `enum class response_content_type` to `types.h`,
and `get_response_content_type()` helper function to `util.cc/.h`.
According to the S3 docs (`https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html`),
REST error responses _should_ have the Content-Type header set to
`application/xml`. However, response like `503 Service Unavailable` may
not be of this assumed `Content-Type`.

As seen in CI (abridged output):

```
WARN  s3 - s3_client.cc:841 - S3 PUT request failed for key "...": Service Unavailable [Content-Type: text/plain; charset=utf-8]; ...
ERROR s3 - util.cc:209 - !!parsing error boost::wrapexcept<boost::property_tree::xml_parser::xml_parser_error> (<unspecified file>(1): expected <)
ERROR s3 - s3_client.cc:441 - !!error parse error boost::wrapexcept<boost::property_tree::xml_parser::xml_parser_error> (<unspecified file>(1): expected <)
```

To fix this issue, read the `Content-Type` header in `s3_client` instead of
assuming `xml` as the `Content-Type` and attempting to parse it. If the
`Content-Type` is not `xml`, we will return a partially filled
`rest_error_response`.
@WillemKauf WillemKauf force-pushed the s3_content_type_parse branch from 78455b6 to 1cc76c7 Compare April 7, 2025 20:56
}
}

auto result_prefix = buf.empty() ? "Empty error response, " : "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we dump few bytes of the response at trace log level to aid debugging in the future if we fall into this unexpected branch with non-xml/non-empty response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be a good idea, I can add it in a follow up

@WillemKauf
Copy link
Contributor Author

/cdt
rp_version=build
tests/rptest/tests/random_node_operations_test.py

@WillemKauf
Copy link
Contributor Author

Running a cdt test for a sanity check, will merge after.

@WillemKauf
Copy link
Contributor Author

/ci-repeat

@redpanda-data redpanda-data deleted a comment from vbotbuildovich Apr 15, 2025
@WillemKauf WillemKauf merged commit 3e1763f into redpanda-data:dev Apr 15, 2025
21 of 22 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v25.1.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.3.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-25738-v24.3.x-211 remotes/upstream/v24.3.x
git cherry-pick -x 7477d92458 0f6d14a8c5 8dd0922965 a803a63c60 115bf816c0 366027f14d cc5b49fac2 1cc76c7dd5

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants