-
Notifications
You must be signed in to change notification settings - Fork 17
Add annotated YAML loading #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Make `YamlLoader` generic on the type of the `Node`. This is required because deeper node need to have annotations too. * Add a `LoadableYamlNode` trait, required for YAML node types to be loaded by `YamlLoader`. It contains methods required by `YamlLoader` during loading. * Implement `LoadableYamlNode` for `Yaml`. * Take `load_from_str` out of `YamlLoader` for parsing non-annotated nodes. This avoids every user to specify the generics in `YamlLoader::<Yaml>::load_from_str`.
A few changes have had to be made to `LoadableYamlNode`: * The `From<Yaml>` requirement has been removed as it can be error-prone. It was not a direct conversion as it is unable to handle `Yaml::Hash` or `Yaml::Array` with a non-empty array/map. * Instead, `from_bare_yaml` was added, which does essentially the same as `From` but does not leak for users of the library. * `with_marker` has been added to populate the marker for the `Node`. The function is empty for `Yaml`. `load_from_*` methods have been added to `MarkedYaml` for convenience. They load YAML using the markers. The markers returned from `saphyr-parser` are not all correct, meaning that tests are kind of useless for now as they will fail due to bugs outside of the scope of this library.
This would make more sense in user code: ```rs Yaml::load_from_str("foo"); // Explicit that we're parsing YAML load_from_str("foo"); // Too implicit, too generic, may be from another lib ``` Plus, this mirrors `MarkedYaml`'s behavior.
Can we also have the end markers? 😅 |
I would have liked to have end markers, but as you may have noticed when implementing it on your side, it is rather complicated. The parser can hardly provide that information. We don't build a full AST and functions that scan tokens sometimes need to look some characters ahead and can't provide accurate end markers. We could change scanning functions to store potential markers and update them when context is known, but I know not if it's worth the price. Alternatively (and as you implemented), we can "guess" the end of marker from the node's contents and the starting position. But it doesn't help that YAML allows If we were to add a Now the good news is that |
self.data.is_hash() | ||
} | ||
|
||
fn is_badvalue(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this as bad_value
(even the counterpart is BadValue
or something along the line).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I'm gonna rename this to bad_value
. Null
can sometimes be a "bad value" and that underscore would suggest that it means "is the object a [general concept of] bad value". Without underscore, badvalue
more directly refers to the BadValue
variant, imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with the parsing bit, so I can't say much here, but in general LGTM and I would totally make use of this. Thanks for doing it!
/// Notable differences with [`Yaml`]: | ||
/// * Indexing cannot return `BadValue` and will panic instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doesn't have to be in this PR specifically - I'm happy to get this version for now, and leave this for future improvement. But I wonder if you couldn't just add a trait for that? trait BadValue { fn bad_value() -> Self }
and add this constraint to Node
. If it's important, that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out this is more difficult to implement than I thought since I'd need to initialize the Marker
in MarkedYaml
, but the fields are private.
I'll leave it as-is, but other than that, the function can be added to LoadableYamlNode
.
} | ||
|
||
// I don't know if it's okay to implement that, but we need it for the hashmap. | ||
impl Eq for MarkedYaml {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the time, the offending types for Eq
are types that can contain floats transitively (because this is the typical primitive value that isn't Eq
). Which isn't the case here, so I believe it's entirely fine.
src/loader.rs
Outdated
if self.doc_stack.is_empty() { | ||
self.doc_stack.push(node); | ||
} else { | ||
let parent = self.doc_stack.last_mut().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: you can avoid the unwrap by reorganizing the if-then-else:
if let Some(parent) = self.doc_stack.last_mut() {
//...code of the current else...
}
else {
self.doc_stack.push(node);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like nitpicks like these. I rely too much on unwrap
sometimes. Good catch!
/// | ||
/// [`Array`]: `Yaml::Array` | ||
/// [`Hash`]: `Yaml::Hash` | ||
fn from_bare_yaml(yaml: Yaml) -> Self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's worth it, but interestingly with the parametrized Yaml representation, you can define a type that is exactly "a shallow YAML tree of 1 level that doesn't contain any more content", which is YamlData<()>
, so that from_bare_yaml
could return a YamlData<()>
, which would better reflect what it actually represents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think that is worth it, as it would require more constraints on the Node
generic at the time of loading. We would need to add Node: From<YamlData<()>>
, which I don't need that we want.
@Ethiraric very excited about it! I don't know any solid and active serde-compatible library for YAML/JSON that handle position. Do you plan to make a release with this PR? |
I can make a release. Beware however, it seems like markers are out-of-sync. The markers we receive from the parser do not match the input. It needs a release in the parser, but I'm hoping to get my refactor prior to the next release. I've changed the way the input is handled so that I can optimize loading for in-memory strings, where we know in advance the bounds of the buffer. |
Sorry, I'm not sure to understand this part. |
The parser fails to properly keep track of the markers. I tried writing tests for this feature, including matching the markers, and it failed. For instance foo: bar May return a marker We receive that data from |
Add support for adding a
Marker
to everyYaml
node throughMarkedYaml
. The process was as follows:MarkedYaml
structure that owns aMarker
and data (aYaml
-like).Yaml
-like field must have anArray(Vec<MarkedYaml>)
variant and not anArray(Vec<Yaml>)
one as this would mean that child nodes are not annotated. AYamlData<Node>
enum was added, which mimicsYaml
but usesNode
instead ofYaml
for theArray
andHash
variants.YamlLoader
so that it is generic over the type of node that we parse.LoadableNode
trait that gathers methods that aNode
must support (conversion from a basicYaml
object, access to the hash and array fields, ...).LoadableNode
for bothYaml
andMarkedYaml
.Marker
-specific methods toLoadableNode
to populate the marker field ofMarkedYaml
and use a dummy function forYaml
.YamlLoader<Node>
so that it calls that functionload_from_*
functions toMarkedYaml
.I also moved
load_from_*
methods forYaml
fromYamlLoader
toYaml
. This makes it clearer to call (Yaml::load_from_str
as opposed toYamlLoader::load_from_str
) and also mimics the API ofMarkedYaml
.This PR makes changes in almost all of the codebase. I apologize to those who we working and on whose code I have wreaked havoc.
I have released
saphyr-parser
v0.0.2
. This was necessary to exposeMarker
and have it be#[derive(Default)]
.However, writing tests for this feature is kind of useless at this time. I wrote a basic one and it turns out the
Marker
s returned by the parser are not always correct. When loadingfoo: bar
, it reports aDocumentStart
at offset 3 instead of 0.foo
is correctly marked at offset 0, though. CallingMarkedYaml::load_from_*
correctly populatesMarkedYaml
with the incorrectMarker
s the parser returns. I'll have to fix that and make a new release ofsaphyr-parser
prior to writing tests forMarkedYaml
.This fixes Ethiraric/yaml-rust2#27 that, to this day, I still cannot transfer to this repository.
cc @felipesere @yannham