Hi @ubmarco, thanks for starting this discussion! I appreciate that you keep pushing ontology-based validation for Sphinx-Needs. Regarding the Rust support for SHACL: I agree that the rudof library is not yet mature enough and that the performance of Python-based validation does not scale for large projects. What about Java-based alternatives such as RDF4J or Apache Jena? Both are mature and provide good validation performance.
Hi, I've tried out PR #1441, and I was able to represent our metamodel and its constraints with this JSON schema approach. For large graph checks, the
Hi @ubmarco, thanks for this collection of requirements (I'm missing the Sphinx-Needs version of it, but I understand GitHub doesn't render them [yet]). In my opinion it is quite complete. In addition to requirement 5, the ontology may be used not only to generate snippets, but also to offer e.g. generation of compliant derived needs, generation of compliant links, etc. This may be tricky to realize, though (where to generate the need? with which ID? etc.), so it is at best a "may" requirement for the backlog. (Hi @AlexanderLanin, funny to meet good old colleagues in a GitHub issue that isn't C++-related ;-).)
Introduction
The last Sphinx-Needs user group meetings showed increased interest in the topic of
schema (ontology) validation.
Two previous discussions also address this topic:
- one started Jun 9, 2022
- one started Dec 20, 2023
This discussion picks up the topic again and proposes a way forward.
Goals of the ontology concept
- This requires resolution of the whole project incl. dependencies.
- Start a discussion around this.
Note
The Sphinx-Needs internal representation of data does not change with this proposal.
This may happen in the future, as any upstream and downstream tool is interested in the type
and structure of incoming data. Sphinx-Needs filters can also benefit from typed fields.
Condensed list of known requirements
The following list is a condensed version of the requirements for the technical solution
that were discussed.
The list adheres to RFC 2119 for requirement-level terms.
Rationale:
It is commonly connected to upstream and downstream data sources that also feature schema definitions.
Rationale:
Rationale: Avoid custom code.
Rationale: Required for snippet generation and auto-completion.
The existence of a field depends on other field values.
Supported field types:
- string (the default)
- int (whole numbers)
- float (floating point numbers)
- bool (boolean values; understands yes, no, true, false, 1, 0)
- array (of the above types; currently only for tags, but can be extended to other options in future)
- date (ISO 8601)
- datetime (ISO 8601)
- time (ISO 8601)
- duration (ISO 8601)
- email (RFC 5322)
- url (RFC 3986)
- uuid (RFC 4122)
- enum (string with a list of allowed values)
This shall also include chains of links.
Example: There are 3 need types SPEC, FEAT, USECASE and they link like SPEC > FEAT > USECASE.
If SPEC has an enum string field asil that can be any of QM|A|B|C|D and, if it is any of A|B|C|D, then FEAT must have the field set (required), it cannot be QM, and it must link to a USECASE of the same quality.
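The conditional rule above can be sketched in plain Python. The dicts, field names, and link handling here are hypothetical stand-ins for illustration, not the actual Sphinx-Needs data model:

```python
# asil values that trigger the stricter rule (QM does not).
SAFETY_LEVELS = {"A", "B", "C", "D"}

def check_asil_chain(spec, feat, usecase):
    """Check the SPEC > FEAT > USECASE example constraint.

    All three arguments are plain dicts standing in for need items.
    Returns True if the constraint holds.
    """
    if spec.get("asil") not in SAFETY_LEVELS:
        return True  # QM or unset: no further obligation
    # FEAT must have the field set and it cannot be QM.
    if feat.get("asil") not in SAFETY_LEVELS:
        return False
    # The linked USECASE must have the same quality level.
    return usecase.get("asil") == feat["asil"]
```

Note that evaluating this rule needs three resolved need items at once, which is exactly why such checks go beyond per-item validation.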
Rationale: Execution as part of a language server.
Rationale: Reduce waiting time in Sphinx builds and for IDE / CLI apps.
Rationale: Avoid reinventing the wheel.
Examples: JSON Schema, SHACL, Cypher, Protobuf, SysMLv2.
(think of the types SYS_REQ and SW_REQ that both extend from REQ)
Rationale: Derive new fields from existing ones.
Example: Sum all effort float fields of linked needs and validate the result against a threshold.
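A derived-field check of this kind could look as follows in plain Python. The effort field, the links list, and the threshold are illustrative assumptions, not actual Sphinx-Needs APIs:

```python
def effort_within_budget(need, needs_by_id, threshold):
    """Sum the 'effort' float field of all linked needs and
    validate the total against a threshold (hypothetical check)."""
    total = sum(
        needs_by_id[link_id].get("effort", 0.0)
        for link_id in need.get("links", [])
    )
    return total, total <= threshold
```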
Note
Many requirements can already be fulfilled by needs_warnings and needs_constraints.
However, complex checks have to be custom coded and also lack performance.
The existing solutions quickly become confusing for bigger projects.
Options
Unfortunately, there is no single available solution that ticks all boxes.
The following options have been evaluated:
JSON Schema
The idea is to validate the need items directly with JSON schema definitions.
Validating the full need graph is not possible with JSON schema out of the box,
but it can be done with a custom implementation that builds on JSON schema validation.
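The fast per-item part can be sketched with the stdlib only. The rule set below mimics a small subset of JSON Schema; the type name, field names, and pattern are invented for illustration:

```python
import re

# Hypothetical per-type rules, mimicking a small subset of JSON Schema.
RULES = {
    "spec": {
        "required": ["id", "status"],
        "pattern": {"id": r"^SPEC_\d+$"},
    },
}

def validate_item(need):
    """Validate a single need dict; links are not resolved here,
    which is what makes per-item validation fast."""
    rules = RULES.get(need.get("type"), {})
    errors = []
    for field in rules.get("required", []):
        if field not in need:
            errors.append(f"missing required field: {field}")
    for field, pattern in rules.get("pattern", {}).items():
        value = need.get(field)
        if value is not None and not re.fullmatch(pattern, value):
            errors.append(f"field {field} does not match {pattern}")
    return errors
```

Cross-need constraints (links, chains) are precisely what such per-item validation cannot express, hence the custom graph layer on top.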
SHACL
SHACL is a W3C standard for validating RDF graphs and can be used to validate the need items.
The rudof library lacks important features such as sh:pattern or sh:closed.
Performance seems good, however.
Graph Query Languages
The idea is to write graph queries in languages such as Cypher or SPARQL as user input and run them against the need graph in a local graph DB (e.g. Kuzu). All returned items violate the query constraint.
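The mechanics can be sketched as an in-memory stand-in for such a query. The need types and link semantics below are invented for illustration; a real setup would run Cypher or SPARQL inside the DB:

```python
def query_specs_without_usecase(needs):
    """Stand-in for a graph query: return the IDs of all spec needs
    that do not link directly to any usecase.
    Every returned item violates the constraint."""
    by_id = {n["id"]: n for n in needs}
    return [
        n["id"]
        for n in needs
        if n["type"] == "spec"
        and not any(
            by_id[t]["type"] == "usecase" for t in n.get("links", [])
        )
    ]
```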
Solutions
PR #1441 proposes a solution based on JSON schema.
It is a good compromise between the requirements and the available options.
The PR is work-in-progress, but the test cases already lay out what the schema definition would look like:
https://github.com/useblocks/sphinx-needs/tree/mh-schema-validation/tests/doc_test/doc_schema
The approach uses a JSON file that holds a list of schemas.
Each schema may have these keys:
- id: a unique identifier for the schema, for logging and referencing
- message: a user-provided error message that is shown when the schema validation fails
- severity: the severity of the error, aligned with SHACL to violation, warning and info
- types: list of need types the schema applies to; if missing, the schema applies to all need types
- trigger_schema: a JSON schema that, if it validates against the need item, activates local_schema and link_schema; this makes it possible to write complex conditional schemas
- local_schema: a JSON schema that validates against the unresolved need item, so links will just be a list of string need IDs. That is helpful for fast validation because the need graph does not need to be built.
- link_schema: a dictionary that maps link types to a dictionary that constrains resolved need links:
  - <link_type>: link type for the constraint
  - schema_id: the ID of the schema that is used to validate each linked need
  - minItems: the minimum number of linked needs that validate against the schema
  - maxItems: the maximum number of linked needs that validate against the schema
  - unevaluatedItems: whether to complain about additionally linked needs that fail the schema, if the min/maxItems constraint is already met
The link_schema concept also allows modeling chains of links.
Note
Existing SHACL input could be transpiled to this JSON schema solution.
Note
The new schema definition can already be part of exchanged data between projects,
so compatibility checks become possible.
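Purely for illustration, one entry in such a schema file might look like this. The key names are taken from the list above; the need type, field names, and values are invented here, and the authoritative format is the one in the linked test cases:

```json
[
  {
    "id": "safe-spec-needs-usecase",
    "message": "Safety-relevant SPEC items must link at least one USECASE",
    "severity": "violation",
    "types": ["spec"],
    "trigger_schema": {
      "properties": { "asil": { "enum": ["A", "B", "C", "D"] } },
      "required": ["asil"]
    },
    "local_schema": {
      "required": ["status"]
    },
    "link_schema": {
      "links": {
        "schema_id": "is-usecase",
        "minItems": 1
      }
    }
  }
]
```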
Going forward
The PR is work-in-progress and needs to be finished (test cases, docs).
I invite others to comment on the proposal and the PR, and to outline validations that are not possible
with the current approach.
I'm also interested in counter arguments and completely different proposals that also tick the most important boxes.