sync metamodel format #751

AlexanderLanin opened this issue Mar 21, 2025 · 5 comments
Labels
community:infrastructure General Score infrastructure topics

Comments

@AlexanderLanin
Member

AlexanderLanin commented Mar 21, 2025

Our current metamodel.yml follows a different approach than the one by ubcode. We should align.

Participants:


While we were just talking about contributing, @ubmarco has already prepared a PR at useblocks/sphinx-needs#1441. Now we really urgently need to compare the approaches.


First draft (please edit this message!)

| | useblocks/sphinx-needs#1441 | S-CORE |
|---|---|---|
| Extension | Directly embedded into sphinx-needs | Extension on top of sphinx-needs |
| Approach | Rather generic | Focused purely on S-CORE process requirements. More limited? |
| Focus | Rather generic | Readability and writeability of config |
| Stage | Theoretical? | In use |
| Number of links | min and max are user defined | optional and mandatory links. No need for min and max. |
| Severities | Info, Warning, Violation etc | All violations are fatal errors |
| Type validations | str, int, bool etc | planned, but so far only regex |
| Pretty errors | included | planned |
| Tests | a few, difficult to write | yes |
| Schema format | json | yaml (for human readability/writeability) |
| Types / complex checks | triggers for local checks, local checks | local checks, graph checks |
| Example config | https://github.com/useblocks/sphinx-needs/blob/mh-schema-validation/tests/doc_test/doc_schema/schemas.json | https://github.com/eclipse-score/docs-as-code/blob/main/src/extensions/score_metamodel/metamodel.yaml |

Open points:

  • We'll probably need walkthroughs to understand our respective approaches?
@AlexanderLanin
Member Author

AlexanderLanin commented May 15, 2025

Let's have a look at the 19 requirements laid out in useblocks/sphinx-needs#1451

❓ = not quite sure what that requirement means. Needs clarification with @ubmarco

| req | short | useblocks/sphinx-needs#1441 | S-CORE |
|---|---|---|---|
| 1 | Schema definition format shall be declarative and agnostic to specific programming languages | json ✅ | yaml ✅ |
| 2 | Tool and library support shall be available for Python and Rust. | | |
| 3 | Tool and library support shall be available for Linux (x64, arm64), MacOS/Darwin (arm64) and Windows (x64). | | |
| 4 | Mapping of need items to schema types shall be part of the declarative description. | | |
| 5 | The solution shall support the definition of default values for extra options. | | |
| 6 | The solution shall work for both core options/links and extra options/links. | | |
| 7 | The solution shall support the required semantics of option and link fields. | | |
| 8 | The solution shall support the conditional required semantics of option and link fields, i.e. the existence of a field depends on other field values. | | |
| 9 | The solution shall offer the following need option data types | | only via regex |
| 10 | The solution shall support at least the following string formats | | only via regex |
| 11 | The solution shall support string regex patterns. | | |
| 12 | The solution shall support disallowing additional properties (closing the model). | | |
| 13 | The solution shall support need graph validation, i.e. outgoing need links shall be target to constraints. | | |
| 14 | The solution shall be fast to execute for local needs | | not measured. should be as fast as python allows |
| 15 | The solution shall be fast to execute for need graphs | | not optimized |
| 16 | The solution should follow an established standard. | | ❌ (intentional) |
| 17 | The solution should feature an extension or composition mechanism to re-use base definitions | | ❌ (intentional) |
| 18 | The solution may support rendering a visualisation of the schema types and links between them. | | ❌ (Reading as "should": it's not implemented) |
| 19 | The solution may calculate or aggregate data during validation. | | ❌ (Reading as "should": not needed) |

@ubmarco

ubmarco commented May 15, 2025

Hi @AlexanderLanin thanks for the efforts you and the team put in to make Sphinx-Needs even more useful.
Schema validation and ontology have been on my mind for months, and after the last Sphinx-Needs user group meeting I put quite some thought into finding a suitable solution and moving the ecosystem forward.

Thanks also for this write-up and comparison. Ultimately I think that Sphinx-Needs requires an internal way to describe the metamodel. I say that because we are building tools around Sphinx-Needs, such as ubCode and its companion CLI app ubc, that require a reliable schema interface to support the user in real time as they type. Some of the above requirements (e.g. Python+Rust, fast execution, split between local and graph validation, typing, default values) stem from this.

I consider typing crucial for the solution as it allows typed imports and exports. You would not put a bool or integer into a string in a (graph) database. It also allows connecting Sphinx-Needs with stricter Engineering-as-Code solutions, or even changing the internals of Sphinx-Needs to be typed for user-provided fields.
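
To illustrate the typing point, here is a minimal Python sketch of a typed export. The helper names `parse_bool` and `typed_export`, the accepted boolean spellings, and the example field names are assumptions made for illustration only; none of them are part of Sphinx-Needs or of the PR.

```python
# Hypothetical helpers: not part of Sphinx-Needs or useblocks/sphinx-needs#1441.
_TRUE = {"yes", "true", "1"}
_FALSE = {"no", "false", "0"}


def parse_bool(raw: str) -> bool:
    """Map common yes/no spellings (case-insensitive) to a real bool."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def typed_export(need: dict, schema: dict) -> dict:
    """Convert the string fields of a need into typed values.

    `schema` maps a field name to bool, int or str; fields that are not
    listed in the schema stay strings.
    """
    converters = {bool: parse_bool, int: int, str: str}
    return {
        key: converters[schema.get(key, str)](value)
        for key, value in need.items()
    }


need = {"id": "REQ_1", "reviewed": "YES", "priority": "3"}
schema = {"reviewed": bool, "priority": int}
exported = typed_export(need, schema)
```

With a schema like this, an exporter could hand `True` and `3` to a database instead of the strings `"YES"` and `"3"`.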

Other requirements (graph validation over multiple nodes, user provided messages and severity, composition mechanism) stem from requirements of other Sphinx-Needs users that build their solutions around RDF and SHACL. These descriptions are much more powerful but also less well known in the community, so my goal is to build something that is at least compatible and transpilable.

My PR exists to showcase what a solution that ticks all "shall" boxes could look like.
I wrote around 50 test cases to find bugs in my own code and also do some performance testing.
I consider the test structure not a deciding factor as this can be improved without bothering the end user.
I want to build something that considers use cases of the bigger ecosystem, while still being familiar for developers.

I think we should organize a meeting (next week?) to align on this. In the meantime I will look a bit into your metamodel format and write some docs for my solution.

(Btw, I cannot edit your messages)

@AlexanderLanin
Member Author

speed: agreed, I did not mark 14-15 as done, since it's not quite clear what "fast" is. And we did not measure. And it's probably the same anyway for python.

typing: agreed. So far we simply don't have a use case for it. But we want to add it anyway. Most notably enum support, instead of writing regexes.
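
As a rough illustration of what enum support would replace, the following Python sketch checks the same three safety values once via a regex and once via an enum. The names `Safety`, `valid_by_regex` and `valid_by_enum` are hypothetical; only the values QM, ASIL_B and ASIL_D come from the metamodel discussed in this thread.

```python
import re
from enum import Enum


class Safety(Enum):
    """Hypothetical enum for the safety levels used in the metamodel."""
    QM = "QM"
    ASIL_B = "ASIL_B"
    ASIL_D = "ASIL_D"


# Regex form, as it would be written in a regex-only config today.
SAFETY_REGEX = re.compile(r"^(QM|ASIL_B|ASIL_D)$")

# Precomputed set of allowed values for the enum form.
_SAFETY_VALUES = {member.value for member in Safety}


def valid_by_regex(value: str) -> bool:
    return SAFETY_REGEX.fullmatch(value) is not None


def valid_by_enum(value: str) -> bool:
    return value in _SAFETY_VALUES
```

Both functions accept and reject the same inputs; the enum form just states the allowed values once, by name.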

testing: we are rather fond of, and happy with, our rst-based tests. Combined with the (hopefully readable) yaml config, this allows non-developers to specify exactly how they want the metamodel to behave.

the others: let's see whether both solutions satisfy all requirements in detail

meeting: invitation is out to @danwos, please make sure he forwards the meeting. Public announcement at #236

So far my feeling is that the solutions are similar, although completely different. Ours is more specific, focuses on our use cases and on readable config. So from a very high level it might be possible for us to use our yaml frontend with your backend, which does sound quite reasonable in general for any user-facing software architecture. From your point of view, ours might have something that you don't have so far (at least a simpler architecture 😉), and a real-life use case.

@ubmarco

ubmarco commented May 21, 2025

I looked through the referenced metamodel example. It looks understandable from a user perspective, and I think it's built for the use cases you have.

Let me lay out some difficulties I see and that I want to discuss tomorrow:

    1. A lot of duplication (e.g. safety: "^(QM|ASIL_B|ASIL_D)$" appears 18 times). Source of this is a missing composition mechanism.
    2. Regexes are not particularly fast (think of the performance on big models)
    3. Regexes are used as a replacement for a proper typing system
        • Booleans should be upper/lower case versions of yes/no/true/false/1/0, but in the end I just want to say "shall be a bool" instead of "^(YES|NO|yes|no|true|false|TRUE|FALSE...)$"
        • Integers would have to be given as a regex (how would you build a minimum constraint?)
        • Floats/Decimals are quite hard to express with regex (not speaking about range constraints yet)
        • Some string formats are particularly hard to write as regex (emails/urls/ISO 8601 datetimes)
        • Sphinx-Needs has tags, which is actually an array.
    4. Optional and mandatory options have to be given as "^.*$" or "^.+$" to mark them as required.
    5. graph_check
        • Are the graph_check conditions Python expressions (e.g. "safety != QM")? If yes, it's a problem for other languages consuming the same schema
          arch_safety_linkage:
            needs:
              include: "comp_req, feat_req"
              condition:
                and:
                  - "safety != QM"
                  - "status == valid"

        • How are conditions evaluated? Can I have and, or and not in a single check? If yes, how is that evaluated?
        • Is it possible to build check conditions that hop over multiple graph nodes? My observation: for link fields, needs_types only works on string IDs while graph_checks select certain needs and then look at their direct neighbors.
        • How would I express that I need max 2 links of a certain structure for an extra link field?
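
Two of the questions above, an integer with a minimum and a maximum number of links, are easy to state as typed checks. The following is a minimal Python sketch; the function names `check_int_min` and `check_link_count` and their parameters are hypothetical and belong to neither of the two solutions.

```python
from typing import Optional, Sequence


def check_int_min(raw: str, minimum: int) -> bool:
    """Return True if `raw` parses as an integer that is >= `minimum`."""
    try:
        return int(raw) >= minimum
    except ValueError:
        # Not an integer at all, so the constraint cannot hold.
        return False


def check_link_count(
    links: Sequence[str],
    min_count: int = 0,
    max_count: Optional[int] = None,
) -> bool:
    """Return True if the number of links lies within [min_count, max_count].

    `max_count=None` means there is no upper bound.
    """
    if len(links) < min_count:
        return False
    return max_count is None or len(links) <= max_count
```

For example, `check_link_count(need_links, max_count=2)` would express "at most 2 links" for a single link field, something a per-value regex cannot say.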

@AlexanderLanin
Member Author

    1. We did actively decide against composition, as it increases complexity. Not typical for programming, but we decided that lowering the barrier to "how do I use that" is more important than "I have a single place to configure complex abstractions that I could not write on my own". Main motivation is that we want the metamodel.yml to be writeable by non programmers. They can manage a search-replace operation... hopefully.
    2. and 3. We want to introduce a typing system. Most notably enums.
    4. We have mandatory_options and optional_options (horrible names). Same with mandatory_links and optional_links.
    5. The expressions are a very limited set of allowed expressions with manual parsing. The current notation is hard to read, and we want to rework it to be more user friendly.
        • And and or in the same condition are currently not supported (no demand).
        • Checks over multiple graph nodes are currently not supported (no demand).
        • No min/max restrictions on links supported (no demand).
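
A minimal Python sketch of what such a limited, manually parsed expression set could look like, assuming only `==` and `!=` comparisons combined by a single `and` list, as in the `arch_safety_linkage` example quoted above. The function names are hypothetical, and this is not the actual S-CORE implementation.

```python
from typing import Mapping, Sequence, Tuple


def parse_condition(expr: str) -> Tuple[str, str, str]:
    """Split 'field != value' or 'field == value' into its three parts."""
    for op in ("!=", "=="):
        if op in expr:
            field, value = expr.split(op, 1)
            return field.strip(), op, value.strip()
    raise ValueError(f"unsupported condition: {expr!r}")


def evaluate(expr: str, need: Mapping[str, str]) -> bool:
    """Evaluate one condition against a need given as a field/value mapping."""
    field, op, expected = parse_condition(expr)
    # Assumption: a missing field is treated like an empty string.
    actual = need.get(field, "")
    return actual == expected if op == "==" else actual != expected


def evaluate_and(conditions: Sequence[str], need: Mapping[str, str]) -> bool:
    """All conditions must hold; mixing `and` with `or` is deliberately not supported."""
    return all(evaluate(condition, need) for condition in conditions)


conditions = ["safety != QM", "status == valid"]
selected = evaluate_and(conditions, {"safety": "ASIL_B", "status": "valid"})
skipped = evaluate_and(conditions, {"safety": "QM", "status": "valid"})
```

Because nothing is passed to Python's `eval`, a grammar this small could in principle be reimplemented in another language that consumes the same yaml.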

3 participants