
Commit 4854086

Authored by hf-kklein (with Konstantin and Copilot)
feat: Add materialized helper table in SQLite which unfolds the recursive Segment Group structure + Pydantic/SQLModel and can easily be queried (#111)
* wip ahb view
* wip
* wip
* feat: add `is_outdated` property to Anwendungsfall (interpret `##alt##`) wtf
* wip
* spellcheck
* remove WIP import
* fix snapshot test imports
* extend readme
* extend readme 2
* fix test
* fix test 2
* Update src/fundamend/sqlmodels/ahbview.py
  Co-authored-by: Copilot <[email protected]>
* add snapshot test
* Renames SQL path variable for clarity
  Renames the variable storing the path to the SQL command file to `sql_command_path` for improved clarity and readability. #111 (comment)
* add efoli edifact format version to ahb + view and format sql file... sorry for this
* update snapshots
* fix import orrder
* black
* rework duplicate-check
* add kommunikationvon and beschreibung to helper view
* snapshots,
* pylint
* add id-path
* use code value, not name

---------

Co-authored-by: Konstantin <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 0011b20 commit 4854086

File tree

9 files changed: +14595 -7 lines changed

README.md

Lines changed: 68 additions & 0 deletions
@@ -133,6 +133,74 @@ my_sql_model = SqlAnwendungshandbuch.from_model(pydantic_ahb)
pydantic_ahb = my_sql_model.to_model()
```

#### Populating a database with AHB information
In the raw XML data, the information from the AHBs can theoretically be nested arbitrarily deep, because every segment group may in turn contain further segment groups.
This recursion is mirrored as-is in the SQL model classes and in the database.
This package ships a helper function that flattens the AHBs again, so that the data structure becomes comparable to the flat AHBs known from e.g. the PDF files, without losing the structural information.
For this, a recursive Common Table Expression (CTE) is used to populate an additional helper table `ahb_hierarchy_materialized`.

```python
# pip install fundamend[sqlmodel]
from pathlib import Path
from fundamend.sqlmodels.ahbview import create_db_and_populate_with_ahb_view
from fundamend.sqlmodels.anwendungshandbuch import AhbHierarchyMaterialized
from sqlmodel import Session, create_engine, select

ahb_paths = [
    Path("UTILTS_AHB_1.1c_Lesefassung_2023_12_12_ZPbXedn.xml"),
    # add more AHB XML files here
]
sqlite_file = create_db_and_populate_with_ahb_view(ahb_paths)  # copy the file to somewhere else if necessary
engine = create_engine(f"sqlite:///{sqlite_file}")
with Session(bind=engine) as session:
    stmt = select(AhbHierarchyMaterialized).where(AhbHierarchyMaterialized.pruefidentifikator == "25001").order_by(
        AhbHierarchyMaterialized.sort_path
    )
    results = session.exec(stmt).all()
```
or in plain SQL:
```sql
-- sqlite dialect
SELECT path,
       type,
       segmentgroup_name,
       segmentgroup_ahb_status,
       segment_id,
       segment_name,
       segment_ahb_status,
       dataelementgroup_id,
       dataelementgroup_name,
       dataelement_id,
       dataelement_name,
       dataelement_ahb_status,
       code_value,
       code_name,
       code_ahb_status
FROM ahb_hierarchy_materialized
WHERE pruefidentifikator = '25001'
ORDER BY sort_path;
```
<details>
<summary>Results of the `SELECT`</summary>
<br>

... 125 more rows ...

| path | type | segmentgroup\_name | segmentgroup\_ahb\_status | segment\_id | segment\_name | segment\_ahb\_status | dataelementgroup\_id | dataelementgroup\_name | dataelement\_id | dataelement\_name | dataelement\_ahb\_status | code\_value |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Vorgang > Bestandteil des Rechenschritts | segment\_group | Bestandteil des Rechenschritts | Muss \[2006\] | null | null | null | null | null | null | null | null | null |
| Vorgang > Bestandteil des Rechenschritts > Bestandteil des Rechenschritts | segment | Bestandteil des Rechenschritts | Muss \[2006\] | SEQ | Bestandteil des Rechenschritts | Muss | null | null | null | null | null | null |
| Vorgang > Bestandteil des Rechenschritts > Bestandteil des Rechenschritts > Handlung, Code | dataelement | Bestandteil des Rechenschritts | Muss \[2006\] | SEQ | Bestandteil des Rechenschritts | Muss | null | null | D\_1229 | Handlung, Code | null | null |
| Vorgang > Bestandteil des Rechenschritts > Bestandteil des Rechenschritts > Handlung, Code > Bestandteil des Rechenschritts | code | Bestandteil des Rechenschritts | Muss \[2006\] | SEQ | Bestandteil des Rechenschritts | Muss | null | null | D\_1229 | Handlung, Code | null | Z37 |
| Vorgang > Bestandteil des Rechenschritts > Bestandteil des Rechenschritts > Information über eine Folge | dataelementgroup | Bestandteil des Rechenschritts | Muss \[2006\] | SEQ | Bestandteil des Rechenschritts | Muss | C\_C286 | Information über eine Folge | null | null | null | null |
| Vorgang > Bestandteil des Rechenschritts > Bestandteil des Rechenschritts > Information über eine Folge > Rechenschrittidentifikator | dataelement | Bestandteil des Rechenschritts | Muss \[2006\] | SEQ | Bestandteil des Rechenschritts | Muss | C\_C286 | Information über eine Folge | D\_1050 | Rechenschrittidentifikator | X \[913\] | null |
| Vorgang > Bestandteil des Rechenschritts > Referenz auf eine Zeitraum-ID | segment | Bestandteil des Rechenschritts | Muss \[2006\] | RFF | Referenz auf eine Zeitraum-ID | Muss | null | null | null | null | null | null |
| Vorgang > Bestandteil des Rechenschritts > Referenz auf eine Zeitraum-ID > Referenz | dataelementgroup | Bestandteil des Rechenschritts | Muss \[2006\] | RFF | Referenz auf eine Zeitraum-ID | Muss | C\_C506 | Referenz | null | null | null | null |
| Vorgang > Bestandteil des Rechenschritts > Referenz auf eine Zeitraum-ID > Referenz > Referenz, Qualifier | dataelement | Bestandteil des Rechenschritts | Muss \[2006\] | RFF | Referenz auf eine Zeitraum-ID | Muss | C\_C506 | Referenz | D\_1153 | Referenz, Qualifier | null | null |
| Vorgang > Bestandteil des Rechenschritts > Referenz auf eine Zeitraum-ID > Referenz > Referenz, Qualifier > Referenz auf Zeitraum-ID | code | Bestandteil des Rechenschritts | Muss \[2006\] | RFF | Referenz auf eine Zeitraum-ID | Muss | C\_C506 | Referenz | D\_1153 | Referenz, Qualifier | null | Z46 |
| Vorgang > Bestandteil des Rechenschritts > Referenz auf eine Zeitraum-ID > Referenz > Referenz auf Zeitraum-ID | dataelement | Bestandteil des Rechenschritts | Muss \[2006\] | RFF | Referenz auf eine Zeitraum-ID | Muss | C\_C506 | Referenz | D\_1154 | Referenz auf Zeitraum-ID | X \[914\] ∧ \[937\] \[59\] | null |

...
</details>
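The helper function also accepts per-file validity dates and can drop the raw, recursive tables afterwards. The following is a minimal sketch based only on the signature of `create_db_and_populate_with_ahb_view` in `ahbview.py`; the validity date below is a made-up example:

```python
from datetime import date
from pathlib import Path

from fundamend.sqlmodels.ahbview import create_db_and_populate_with_ahb_view

ahb_files = [
    # (path, inclusive start of validity, exclusive end of validity); None means "open end"
    (Path("UTILTS_AHB_1.1c_Lesefassung_2023_12_12_ZPbXedn.xml"), date(2024, 4, 3), None),
]
# drop_raw_tables=True removes the original recursive tables after materialization,
# leaving only the flat ahb_hierarchy_materialized table in a smaller database file.
sqlite_file = create_db_and_populate_with_ahb_view(ahb_files, drop_raw_tables=True)
```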
### CLI tool for XML➡️JSON conversion
domain-specific-terms.txt

Lines changed: 2 additions & 0 deletions
@@ -13,3 +13,5 @@ alle
ende
tages
sie
rekursion
rekursive

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -113,3 +113,6 @@ exclude = ["/unittests"]
[tool.hatch.build.targets.wheel]
only-include = ["src"]
sources = ["src"]
include = [
    "src/fundamend/sqlmodels/*.sql",
]

src/fundamend/sqlmodels/ahbview.py

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
"""
helper module to create a "materialized view" (in sqlite this means: create and populate a plain table)
"""

import logging
import tempfile
from datetime import date
from itertools import groupby
from pathlib import Path
from typing import Iterable, Literal, Optional

import sqlalchemy
from efoli import get_edifact_format_version
from pydantic import BaseModel
from sqlalchemy.sql.functions import func
from sqlmodel import Session, SQLModel, create_engine, select

from fundamend import AhbReader
from fundamend import Anwendungshandbuch as PydanticAnwendungshandbuch
from fundamend.sqlmodels.anwendungshandbuch import (
    AhbHierarchyMaterialized,
    Anwendungsfall,
)
from fundamend.sqlmodels.anwendungshandbuch import Anwendungshandbuch as SqlAnwendungshandbuch
from fundamend.sqlmodels.anwendungshandbuch import (
    Code,
    DataElement,
    DataElementGroup,
    Segment,
    SegmentGroup,
    SegmentGroupLink,
)

_logger = logging.getLogger(__name__)


def create_ahb_view(session: Session) -> None:
    """
    Create a materialized view for the Anwendungshandbücher using a SQLAlchemy session.
    Warning: This is only tested for SQLite!
    """
    path_to_sql_command = Path(__file__).parent / "materialize_ahb_view.sql"

    with open(path_to_sql_command, "r", encoding="utf-8") as sql_file:
        bare_sql = sql_file.read()

    bare_statements = bare_sql.split(";")

    for bare_statement in bare_statements:
        statement = bare_statement.strip()
        if statement:
            session.execute(sqlalchemy.text(statement))
    session.commit()
    number_of_inserted_rows = session.scalar(
        select(func.count(AhbHierarchyMaterialized.id))  # type:ignore[arg-type] # pylint:disable=not-callable
    )
    _logger.info(
        "Inserted %d rows into the materialized view %s",
        number_of_inserted_rows,
        AhbHierarchyMaterialized.__tablename__,
    )


class _PruefiValidity(BaseModel):
    """
    models how long a model associated with a pruefidentifikator is valid
    """

    gueltig_von: Optional[date]  # inclusive start
    gueltig_bis: Optional[date]  # exclusive end
    pruefidentifikator: str

    def overlaps(self, other: "_PruefiValidity") -> bool:
        """
        returns true if the two validity periods overlap
        """
        return (
            (self.gueltig_bis is None or other.gueltig_von is None or self.gueltig_bis > other.gueltig_von)
            and (self.gueltig_von is None or other.gueltig_bis is None or self.gueltig_von < other.gueltig_bis)
            or (self.gueltig_bis is None and other.gueltig_bis is None)
            or (self.gueltig_von is None and other.gueltig_von is None)
        )


def _check_for_no_overlaps(pruefi_validities: list[_PruefiValidity]) -> None:
    """raises a value error if there are duplicates/redundancies"""
    duplicate_pruefis_for_same_gueltigkeitszeitraum = []

    for duplicate_pruefi, group in groupby(
        sorted(pruefi_validities, key=lambda x: x.pruefidentifikator), key=lambda x: x.pruefidentifikator
    ):
        group_list = list(group)
        if any(a.overlaps(b) for a, b in zip(group_list, group_list[1:])):
            duplicate_pruefis_for_same_gueltigkeitszeitraum.append(duplicate_pruefi)
    if any(duplicate_pruefis_for_same_gueltigkeitszeitraum):
        raise ValueError(
            # pylint:disable=line-too-long
            f"There are duplicate pruefidentifikators in the AHBs: {', '.join(duplicate_pruefis_for_same_gueltigkeitszeitraum)}. Dropping the source tables is not a good idea."
        )


def create_db_and_populate_with_ahb_view(
    ahb_files: Iterable[Path | tuple[Path, date, Optional[date]] | tuple[Path, Literal[None], Literal[None]]],
    drop_raw_tables: bool = False,
) -> Path:
    """
    Creates a SQLite database as a temporary file, populates it with the AHBs provided and then materializes the
    AHB view.
    You may provide either paths to the AHB.xml files or tuples where each Path comes with a gueltig_von and
    gueltig_bis date.
    Optionally deletes the original tables to have a smaller db file (only if the prüfis are unique across all AHBs).
    Returns the path to the temporary database file.
    The calling code should move the file to a permanent location if needed.
    """
    with tempfile.NamedTemporaryFile(suffix=".sqlite", delete=False) as sqlite_file:
        sqlite_path = Path(sqlite_file.name)
    engine = create_engine(f"sqlite:///{sqlite_path}")
    SQLModel.metadata.drop_all(engine)
    SQLModel.metadata.create_all(engine)
    pruefis_added: list[_PruefiValidity] = []
    with Session(bind=engine) as session:
        for item in ahb_files:
            ahb: PydanticAnwendungshandbuch
            gueltig_von: Optional[date]
            gueltig_bis: Optional[date]
            if isinstance(item, Path):
                ahb = AhbReader(item).read()
                gueltig_von = None
                gueltig_bis = None
            elif isinstance(item, tuple):
                ahb = AhbReader(item[0]).read()
                gueltig_von = item[1]
                gueltig_bis = item[2]
            else:
                raise ValueError(f"Invalid item type in ahb_files: {type(item)}")
            sql_ahb = SqlAnwendungshandbuch.from_model(ahb)
            sql_ahb.gueltig_von = gueltig_von
            sql_ahb.gueltig_bis = gueltig_bis
            if sql_ahb.gueltig_von is not None:
                sql_ahb.edifact_format_version = get_edifact_format_version(sql_ahb.gueltig_von)
            session.add(sql_ahb)
            pruefis_added += [
                _PruefiValidity(
                    pruefidentifikator=af.pruefidentifikator, gueltig_bis=gueltig_bis, gueltig_von=gueltig_von
                )
                for af in sql_ahb.anwendungsfaelle
            ]
        session.commit()
        session.flush()
        create_ahb_view(session)
        if drop_raw_tables:
            _check_for_no_overlaps(pruefis_added)
            for model_class in [
                SqlAnwendungshandbuch,
                Anwendungsfall,
                Code,
                DataElement,
                DataElementGroup,
                Segment,
                SegmentGroup,
                SegmentGroupLink,
            ]:
                session.execute(sqlalchemy.text(f"DROP TABLE IF EXISTS {model_class.__tablename__};"))
                _logger.debug("Dropped %s", model_class.__tablename__)
            session.commit()
            session.flush()
    return sqlite_path


__all__ = ["create_db_and_populate_with_ahb_view", "create_ahb_view"]

src/fundamend/sqlmodels/anwendungshandbuch.py

Lines changed: 89 additions & 6 deletions
@@ -1,6 +1,11 @@
"""Anwendungshandbuch SQL models"""

+import uuid
+from datetime import date
from typing import Optional, Union
+from uuid import UUID
+
+from efoli import EdifactFormatVersion

# pylint: disable=too-few-public-methods, duplicate-code, missing-function-docstring

@@ -15,9 +20,6 @@
    # sqlmodel is only an optional dependency when fundamend is used to fill a database
    raise

-import uuid
-from datetime import date
-from uuid import UUID

from fundamend.models.anwendungshandbuch import Anwendungsfall as PydanticAnwendungsfall
from fundamend.models.anwendungshandbuch import Anwendungshandbuch as PydanticAnwendungshandbuch
@@ -482,8 +484,21 @@ class Anwendungshandbuch(SQLModel, table=True):
    # the publication date. This information has to be scraped from the rather poorly maintained API
    # of bdew-mako.de. It is useful, though, for storing multiple versions of the AHB in one DB.
    # Hence these are SQLModel attributes without a counterpart in the XML / raw original data model.
-    gueltig_von: Optional[date] = Field(default=None, index=True)  #: inclusive start date (German time zone)
-    gueltig_bis: Optional[date] = Field(default=None, index=True)  #: possibly exclusive end date (German time zone)
+    gueltig_von: Optional[date] = Field(default=None, index=True)
+    """
+    inclusive start date of the validity of this AHB (German time zone)
+    """
+    gueltig_bis: Optional[date] = Field(default=None, index=True)
+    """
+    Possibly exclusive end date of the validity of this AHB (German time zone).
+    We use None for an open end, not 9999-12-31.
+    """
+    edifact_format_version: Optional[EdifactFormatVersion] = Field(default=None, index=True)
+    """
+    efoli format version (note that this is not derived from the gueltig von/bis dates but has to be set explicitly).
+    It's also not a computed column although technically this might have been possible.
+    For details about the type check the documentation of the EdifactFormatVersion enum from the efoli package.
+    """

    @classmethod
    def from_model(cls, model: PydanticAnwendungshandbuch) -> "Anwendungshandbuch":
@@ -494,7 +509,7 @@ def from_model(cls, model: PydanticAnwendungshandbuch) -> "Anwendungshandbuch":
            bedingungen=[Bedingung.from_model(x) for x in model.bedingungen],
            ub_bedingungen=[UbBedingung.from_model(x) for x in model.ub_bedingungen],
            pakete=[Paket.from_model(x) for x in model.pakete],
-            anwendungsfaelle=[Anwendungsfall.from_model(x) for x in model.anwendungsfaelle],
+            anwendungsfaelle=[Anwendungsfall.from_model(x) for x in model.anwendungsfaelle if not x.is_outdated],
        )

    def to_model(self) -> PydanticAnwendungshandbuch:
@@ -507,3 +522,71 @@ def to_model(self) -> PydanticAnwendungshandbuch:
            pakete=tuple(x.to_model() for x in sorted(self.pakete, key=lambda y: y.position or 0)),
            anwendungsfaelle=tuple(x.to_model() for x in sorted(self.anwendungsfaelle, key=lambda y: y.position or 0)),
        )
+
+
+class AhbHierarchyMaterialized(SQLModel, table=True):
+    """
+    A materialized flattened AHB hierarchy containing segment groups, segments, data elements, codes,
+    and enriched with metadata like format, versionsnummer, and prüfidentifikator.
+    This table is not meant to be written to, but only read from.
+    It is created once after all other tables have been filled by the create_ahb_view function in ahbview.py.
+    """
+
+    __tablename__ = "ahb_hierarchy_materialized"
+    id: UUID = Field(default_factory=uuid.uuid4, primary_key=True)
+    anwendungsfall_pk: UUID = Field(index=True)
+    current_id: UUID
+    root_id: UUID
+    parent_id: Optional[UUID] = None
+    depth: int
+    position: Optional[int] = Field(default=None)
+    path: str
+    id_path: str
+    parent_path: str
+    root_order: int
+    type: str = Field(index=True)
+    source_id: UUID
+    sort_path: str = Field(index=True)
+
+    # Metadata
+    pruefidentifikator: str = Field(index=True)
+    format: str = Field(index=True)
+    versionsnummer: str = Field(index=True)
+    gueltig_von: Optional[date] = Field(default=None, index=True)
+    gueltig_bis: Optional[date] = Field(default=None, index=True)
+    kommunikation_von: Optional[str] = Field(default=None, index=True)
+    beschreibung: Optional[str] = Field(default=None, index=True)
+    edifact_format_version: Optional[EdifactFormatVersion] = Field(default=None, index=True)
+
+    # Segment Group
+    segmentgroup_id: Optional[str] = Field(default=None, index=True)
+    segmentgroup_name: Optional[str] = Field(default=None, index=True)
+    segmentgroup_ahb_status: Optional[str] = Field(default=None)
+    segmentgroup_position: Optional[int] = Field(default=None, index=True)
+    segmentgroup_anwendungsfall_primary_key: Optional[UUID] = Field(default=None)
+
+    # Segment
+    segment_id: Optional[str] = Field(default=None, index=True)
+    segment_name: Optional[str] = Field(default=None, index=True)
+    segment_number: Optional[str] = Field(default=None, index=True)
+    segment_ahb_status: Optional[str] = Field(default=None)
+    segment_position: Optional[int] = Field(default=None, index=True)
+
+    # Data Element Group
+    dataelementgroup_id: Optional[str] = Field(default=None, index=True)
+    dataelementgroup_name: Optional[str] = Field(default=None, index=True)
+    dataelementgroup_position: Optional[int] = Field(default=None, index=True)
+
+    # Data Element
+    dataelement_id: Optional[str] = Field(default=None, index=True)
+    dataelement_name: Optional[str] = Field(default=None, index=True)
+    dataelement_position: Optional[int] = Field(default=None, index=True)
+    dataelement_ahb_status: Optional[str] = Field(default=None, index=True)
+
+    # Code
+    code_id: Optional[UUID] = Field(default=None, index=True)
+    code_name: Optional[str] = Field(default=None, index=True)
+    code_description: Optional[str] = Field(default=None, index=True)
+    code_value: Optional[str] = Field(default=None, index=True)
+    code_ahb_status: Optional[str] = Field(default=None, index=True)
+    code_position: Optional[int] = Field(default=None, index=True)