-
Notifications
You must be signed in to change notification settings - Fork 591
feat: initial Operator::from_uri
implementation
#5482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
core/src/types/operator/registry.rs
Outdated
@@ -0,0 +1,226 @@ | |||
use std::cell::LazyCell; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this file inside the crate::types::operator::registry
module. Is it ok? I thought about adding a crate::types::operator_registry
, but it seemed better this way.
core/src/types/operator/registry.rs
Outdated
|
||
// TODO: thread local or use LazyLock instead? this way the access is lock-free | ||
// TODO: should we expose the `GLOBAL_OPERATOR_REGISTRY` as public API at `crate::types::operator::GLOBAL_OPERATOR_REGISTRY`? | ||
thread_local! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the preferred way of having a global static variable such as this?
I prefer to have it thread_local
so there is not need for a LazyLock
, we can use LazyCell
instead (LazyCell
is lock-free but LazyLock
is not, it synchronizes access through threads)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, why do we need static var here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to have a global initialized registry and to not have to initialize it every time whenever we want to parse a uri
opendal/core/src/types/operator/builder.rs
Line 118 in 088d7d0
GLOBAL_OPERATOR_REGISTRY.with(|registry| registry.parse(uri, options)) |
core/src/types/operator/registry.rs
Outdated
// TODO: thread local or use LazyLock instead? this way the access is lock-free | ||
// TODO: should we expose the `GLOBAL_OPERATOR_REGISTRY` as public API at `crate::types::operator::GLOBAL_OPERATOR_REGISTRY`? | ||
thread_local! { | ||
pub static GLOBAL_OPERATOR_REGISTRY: LazyCell<OperatorRegistry> = LazyCell::new(|| OperatorRegistry::with_enabled_services()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSRV check fails due to the usage of LazyCell
. Should we update the MSRV or use another thing instead?
I see this TODO
about the once_cell
crate usage:
Line 248 in 52c96bb
# TODO: remove once_cell when lazy_lock is stable: https://doc.rust-lang.org/std/cell/struct.LazyCell.html |
If the replacement is planned, I think it would be better to use LazyCell
than once_cell::Lazy
in new code like this one, to not increase technical debt.

core/src/raw/http_util/uri.rs
Outdated
/// query_pairs will parse a query string encoded as key-value pairs separated by `&` to a vector of key-value pairs. | ||
/// It also does percent decoding for both key and value. | ||
/// | ||
/// Note that `?` is not allowed in the query string, and it will be treated as a part of the first key if included. | ||
pub fn query_pairs(query: &str) -> Vec<(String, String)> { | ||
query | ||
.split('&') | ||
.filter_map(|pair| { | ||
let mut iter = pair.splitn(2, '='); | ||
// TODO: should we silently ignore invalid pairs and filter them out without the user noticing? | ||
// or should we return an error in the whole `query_pairs` function so the caller knows it failed? | ||
let key = iter.next()?; | ||
let value = iter.next().unwrap_or(""); | ||
Some((key, value)) | ||
}) | ||
.map(|(key, value)| (percent_decode_path(key), percent_decode_path(value))) | ||
.collect() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have
pub struct QueryPairsWriter {
now, use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get what you mean. QueryPairsWriter
is used to write query param pairs to an url, not for reading them. The function query_pairs
this PR adds is to *read query param pairs` from an url
so many "todo"... keep the code change small, clippy happy, test pass, and the code style unified (you can use llm for this kind of idea) then you can answer many todos by yourself. :) |
Thank you very much for your review @asukaminato0721! I usually leave the TODOs in code as comments to the reviewer, will address everything asap! |
fn from_uri(uri: &Uri, options: impl IntoIterator<Item = (String, String)>) -> Result<Self> { | ||
let query_pairs = uri.query().map(query_pairs).unwrap_or_default(); | ||
|
||
let merged_options = query_pairs.into_iter().chain(options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure about
let merged_options = query_pairs.into_iter().chain(options);
or
let merged_options = options.chain(query_pairs.into_iter());
Which one should take precedence? query_pairs should overwrite the options, or should the options overwrite the query_pairs? I'm thinking on the case that query_pairs and options contain the same key
Right now, I think that the behaviour with query_pairs.into_iter().chain(options)
is that options
takes precedence and them overwrite whatever goes in query_pairs
if a key is present in both
pub use registry::OperatorFactory; | ||
pub use registry::OperatorRegistry; | ||
pub use registry::GLOBAL_OPERATOR_REGISTRY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we expose these 3 as public api @Xuanwo? or should we pub(crate) them instead?
Or should we only expose OperatorFactory
and OperatorRegistry
but not GLOBAL_OPERATOR_REGISTRY
?
Hi @Xuanwo, I worked a bit more on this and I think it is ready for another review round. Once the implementation looks good to you, I will work on tests and documentation, is this okay? @asukaminato0721 I think I addressed most of your comments |
@@ -137,6 +148,15 @@ pub trait Configurator: Serialize + DeserializeOwned + Debug + 'static { | |||
}) | |||
} | |||
|
|||
/// TODO: document this. | |||
fn from_uri(uri: &Uri, options: impl IntoIterator<Item = (String, String)>) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to accept &Uri
here? This way, we don't have to parse the uri again (it is only done here right now)
Or should we change it to &str
and parse the uri twice?
Relates to #5445
Left some doubts as
// TODO
in the source. I have little experience contributing to this repo so I'm sorry if there are a lot of doubts about this. Just want to be sure all the changes of this PR aligns with the current codebase. Please take a look to all theTODOs
I left when reviewing.I would like to add more tests, but I don't know in which place those should be placed. The
core/tests
folder seems like a good place, but I don't find any place suitable, as placing those incore/tests/behaviour
seems weird to me. But as this implies various components, maybe we can have acore/tests/integration
? Although I would like to write some unit tests atcore/src/types/builder.rs
,core/src/types/operator/builder.rs
andcore/src/types/operator/registry.rs
, but didn't any existing unit tests there.In this PR I implemented a single
Configurator::from_uri
method, which will serve as default and takes only the query parameters as options. Services which need a more specific configuration such as s3 or azblob can be implemented in follow-up PRs.I also have pending documentating all the newly added public API, but will do that after an initial review round.
Thank you very much.