-
Notifications
You must be signed in to change notification settings - Fork 0
dft street manager spike 0004 data model
Data for Street Manager can sometimes be called NSWRA (after New Roads and Street Works act, 1991, see here).
The best data around volumes DfT have supplied suggest ~2.5 million works in England, for the purpose of NFRs we are assuming larger than that. Additionally this does not take variations (Promoter/HA changes) and phases (additional sites related to works) into account, which will increase the amount of data needed around a work.
Due to the transactional nature of the data being captured and the need to audit/report/notify about events that take place for works, we will be using an Event Model approach (see here). This means our data will be split into two main types; modelled application data which is opinionated and mutable, atomic event data which is unopinionated and immutable.
All database schema should follow the same naming convention based on common PostGres standards. See here for more detail and here for official documentation on identifiers.
Overview:
- Table and column names should lowercase and
_
word separation, e.g.person
,person_id
- Table names should be singular
- Id columns should include table name (e.g.
work_id
notid
), so when included as foreign keys or queries are unique and identifiable - In queries SQL key words should be capitalised for easy reading, e.g.
SELECT * from person
- Reference tables should follow a consistent pattern of columns, e.g.
party_type_id
,name
,enabled
This is the draft data model which will capture changes to the moving application data, which will feed the event data.
This is a high level proposed data model, subject to change as our knowledge improves.
Main entities:
- Work - main data entity storing data about street works
- Inspection - details about inspections carried out against works
- Penalty - details about penalties and fines against works
- Party - links person/organisation details to works/other entities using the Universal Person and Organization Data Model
- User - user details linked to permissions, entities and audit events
- Configuration - slow moving configuration data (date ranges for fines/inspections etc.)
Link to known work data fields here.
NOTE: It is important that the Work Reference Number is not updated as it is the common identifier that Promoters are legally required to display on site about the work.
See here.
To prevent users from accidentally overwriting each others changes when viewing the same work record, all updates to moving data tables should be performed in a transaction which first validates that the row hasn't been updated since it was retrieved by the user.
This requires that all moving data tables include columns:
- date_created - timestamp row was created
- date_modified - timestamp used to validate update
- modified_by_user_id - user id of last user to update/create
Works are subject to updates and changes by various parties (Planners/HAs/contractors etc.) through their lifetime. It is important that these changes are recorded so that all parties know who changed what and when. This data will be used in history/audit trail, reporting and notifications.
Every significant change to a work, major entity or auditable event should cause an immutable event row to be inserted. For example, creating a new work or modifying an existing work. These events should be inserted at the same time as the update in a single transaction. The event should record the event type, timestamp, id of the entity updated, reference, user name/id and any extra important relevant data. The schema of the event tables should be designed for ease of understanding and querying by external parties for reporting, with the full data included in JSON columns (jsonb for indexing) to allow more complex queries and avoid issues with modelled data schema changes.
As works are the primary item of interest, we will store event data for works separately from other events. This will allow us to include more work specific columns in the event schema with indexes to make it easier to query, such as geospacial data. Works marked as commercially sensitive or secure may be redacted so the full event data is not exposed.
- work_event - all work changes and updates, flattened with full current works data stored in a JSON column and also important works columns as indexed columns (TBD)
- party_event - all party changes and updates
- audit_event - all other audit events, such as audited page views or changes
Street Manager C4 reporting archive system containers
By replicating the main street manager system database into a separate database we can allow reporting users direct read only access to the event data (restricted to appropriate tables via permissions) via SQL.
Using a separate database means that reporting load cannot affect the live system.
To allow LHAs and Utilities to monitor specific works or locations, we will expose a simple atom feed API publicly, so they can poll for updates against specific criteria, such as works reference.
Currently assuming this can be carried out against the Reporting database, but if this causes too much load we may need a separate subscriptions database.
To reduce the amount of data exposed in the system, event data should be archived to a separate database after a set period of time (set by DfT, e.g. 6 years). Modelled data should be deleted as event data should capture all relevant details. This should applied to both the main Street Manager database and reporting database. A file archiving process should also apply to file attachments.
To apply schema changes to databases during production and live support we will use a change management framework. This will allow databases to be maintained without destroying live data.
Main options for change management frameworks are:
Database access will be controlled by creating a series of users specific to each application or external party, with permissions limited to the minimum necessary for them to function, following the Principle of least privilege. This includes change management, so only the CI performing the release has permissions to modify database schema.
The system will require access to external data such as:
- AddressBase - for postcode lookup
- OS Mastermap data - tiling service data for maps
- National Street Gadgeteer (NSG) - Official records on streets and assets
- Local Highway Authorities (LHAs) - Local authority records on noticing officers, road restrictions etc.
- Legislative Reference data - a lot of reference data in the system is controlled by the NSWRA act 1991
This information will be supplied by third parties as dumps (e.g. csv, or SQL) and will need to be included in our database for use in mapping and backend logic. Scripts and transformations will need to be created to manage the regular updates of this data.
- Should we put commercially sensitive works data in event data?
- [SA] - assuming event data will be stored as normal with commercially sensitive flag but served selectively from APIs to ensure no one other than LHA/Promoter can see private details
- Do we need to create and maintain a semantic layer for reporting users or is the flattened event data enough?
- [SA] - for Beta, no