Skip to content

dft street manager spike 0004 data model

Steven Alexander edited this page May 15, 2018 · 19 revisions

Data model and standards

Data for Street Manager can sometimes be called NSWRA (after New Roads and Street Works act, 1991, see here).

The best data around volumes DfT have supplied suggest ~2.5 million works in England, for the purpose of NFRs we are assuming larger than that. Additionally this does not take variations (Promoter/HA changes) and phases (additional sites related to works) into account, which will increase the amount of data needed around a work.

Due to the transactional nature of the data being captured and the need to audit/report/notify about events that take place for works, we will be using an Event Model approach (see here). This means our data will be split into two main types; modelled application data which is opinionated and mutable, atomic event data which is unopinionated and immutable.

Database naming standards and conventions

All database schema should follow the same naming convention based on common PostGres standards. See here for more detail and here for official documentation on identifiers.

Overview:

  • Table and column names should lowercase and _ word separation, e.g. person, person_id
  • Table names should be singular
  • Id columns should include table name (e.g. work_id not id), so when included as foreign keys or queries are unique and identifiable
  • In queries SQL key words should be capitalised for easy reading, e.g. SELECT * from person
  • Reference tables should follow a consistent pattern of columns, e.g. party_type_id, name, enabled

Modelled data model

This is the draft data model which will capture changes to the moving application data, which will feed the event data.

Street Manager high level data model

draw.io diagram

This is a high level proposed data model, subject to change as our knowledge improves.

Main entities:

  • Work - main data entity storing data about street works
  • Inspection - details about inspections carried out against works
  • Penalty - details about penalties and fines against works
  • Party - links person/organisation details to works/other entities using the Universal Person and Organization Data Model
  • User - user details linked to permissions, entities and audit events
  • Configuration - slow moving configuration data (date ranges for fines/inspections etc.)

Link to known work data fields here.

NOTE: It is important that the Work Reference Number is not updated as it is the common identifier that Promoters are legally required to display on site about the work.

Optimistic concurrency control

See here.

To prevent users from accidentally overwriting each others changes when viewing the same work record, all updates to moving data tables should be performed in a transaction which first validates that the row hasn't been updated since it was retrieved by the user.

This requires that all moving data tables include columns:

  • date_created - timestamp row was created
  • date_modified - timestamp used to validate update
  • modified_by_user_id - user id of last user to update/create

Event data model

Works are subject to updates and changes by various parties (Planners/HAs/contractors etc.) through their lifetime. It is important that these changes are recorded so that all parties know who changed what and when. This data will be used in history/audit trail, reporting and notifications.

Every significant change to a work, major entity or auditable event should cause an immutable event row to be inserted. For example, creating a new work or modifying an existing work. These events should be inserted at the same time as the update in a single transaction. The event should record the event type, timestamp, id of the entity updated, reference, user name/id and any extra important relevant data. The schema of the event tables should be designed for ease of understanding and querying by external parties for reporting, with the full data included in JSON columns (jsonb for indexing) to allow more complex queries and avoid issues with modelled data schema changes.

As works are the primary item of interest, we will store event data for works separately from other events. This will allow us to include more work specific columns in the event schema with indexes to make it easier to query, such as geospacial data. Works marked as commercially sensitive or secure may be redacted so the full event data is not exposed.

  • work_event - all work changes and updates, flattened with full current works data stored in a JSON column and also important works columns as indexed columns (TBD)
  • party_event - all party changes and updates
  • audit_event - all other audit events, such as audited page views or changes

Architecture

Street Manager C4 reporting archive system containers

Street Manager C4 reporting archive system containers

Reporting

By replicating the main street manager system database into a separate database we can allow reporting users direct read only access to the event data (restricted to appropriate tables via permissions) via SQL.

Using a separate database means that reporting load cannot affect the live system.

Subscription

To allow LHAs and Utilities to monitor specific works or locations, we will expose a simple atom feed API publicly, so they can poll for updates against specific criteria, such as works reference.

Currently assuming this can be carried out against the Reporting database, but if this causes too much load we may need a separate subscriptions database.

Archiving

To reduce the amount of data exposed in the system, event data should be archived to a separate database after a set period of time (set by DfT, e.g. 6 years). Modelled data should be deleted as event data should capture all relevant details. This should applied to both the main Street Manager database and reporting database. A file archiving process should also apply to file attachments.

Database change management

To apply schema changes to databases during production and live support we will use a change management framework. This will allow databases to be maintained without destroying live data.

Main options for change management frameworks are:

Permissions and restrictions

Database access will be controlled by creating a series of users specific to each application or external party, with permissions limited to the minimum necessary for them to function, following the Principle of least privilege. This includes change management, so only the CI performing the release has permissions to modify database schema.

External data

The system will require access to external data, supplied by third parties as dumps (e.g. csv, or SQL) and will need to be included in our database for use in mapping and backend logic. Scripts and transformations will need to be created to manage the regular updates of this data.

Each functional area needing external data should be have a number of stories created to manage them e.g.:

  • Dev spike on how to consume the raw data from the external source
  • OPs spike on how to get/receive the updated data from the external source (e.g. FTP) and release regular updates
  • Story to provide the functionality needing the external data (e.g. address search on a map), should include an automated way to test a release of the data is valid
  • OPs story to setup the infrastructure necessary for the using the external data (linked to story above)

AddressBase

Address data covering the UK, provided as CSV and GML. Updated every six weeks.

https://www.ordnancesurvey.co.uk/business-and-government/products/addressbase-products.html

Product offered by OS, available for free to all public sector orgs as part of PSMA.

There are different tiers available:

  • AddressBase - basic version, has UPRN, PAF Royal Mail address data, Northings/Eastings and metadata (TODO what does Classification mean?)
  • AddressBase Plus - all from AddressBase, with Local Authority Current Address and available to cross ref with third party (TODO what does this mean?), includes Islands data
  • AddressBase Premium - all from Plus, with Alternative Addresses and Lifecyle data
  • AddressBase Islands - separate dataset for NI and Isle of Man

Most likely we would need the AddressBase Plus dataset, provided to DfT for free under PSMA.

Samples are available per tier here.

CodePoint

Postcode location data, provided as CSV and map shapefiles (in polygons version). Gives location and bounds for Postcodes (not full address), but also extra data like NHS® health authority codes, administrative codes, PO box indicator. Marketed as a cut down AddressBase for more simple analytics.

https://www.ordnancesurvey.co.uk/business-and-government/products/code-point.html

Product offered by OS, available for free to all public sector orgs as part of PSMA.

There are different tiers available:

  • Code-Point Open - free tier, just postcodes with Eastings and Northings
  • Code-Point - Includes Northern Ireland, Local Gov codes, NHS codes, PO box
  • Code-Point with Polygons - all plus map shapes of postcode areas

Unless all we need to search on is Postcode, this is of limited use to us. If we need to search for addresses AddressBase already has the Postcode data.

Samples are available per tier and Code-Point Open can be downloaded for free, here.

OS Mastermap Highways

OS product giving data on the entire UK road network, most complete data set of data on UK roads available. OS MasterMap Highways Network is published once a month, on the first working day of every month. It is in GML 3.2.1 format, PSMA users can download the data.

https://www.ordnancesurvey.co.uk/business-and-government/products/os-mastermap-highways-network-products.html

Product offered by OS, available for free to all public sector orgs as part of PSMA. NOTE While it is shared, OS do not allow the data to be shared out publicly, so we have to be careful how we expose the data and be sure no one can crawl the service to get the MasterMap details. Most of the LHAs and Utilities already have agreements with OS to access this for their own GIS solutions.

Versions:

  • Roads
  • Routing and Asset Management
  • Paths

We will need the MasterMap data, both Roads/Path, to get the road geometry, USRNs and Highway Authority data for the selected location of the work (TODO how do we relate this?). Some of the data from the MasterMap should be included in the base map provided by the tiling service (i.e. ViaEuropa includes OS MM data in layers).

Samples are available here, there is a GitHub repo here detailing the schema and the technical spec is here.

NSG - National Street Gazetteer

GeoPlace maintained dataset on English/Welsh streets (street names/locations), LHAs feed data into the NSG from their own LSG (Local Street Gazetteer). This data is given to OS and used to make AddressBase and update the MasterMap. Comes in DTF (see spec) and map SHP format. It is published each month on the second day of the month.

The NSG has three parts:

  • Street information including street geometry
  • Additional Street Data (ASD)
  • Operational District File compliant with the most current EToN Technical Specification

There is also a related dataset SWA codes, Street Works Authorities.

Local Highway Authorities (LHAs) - Local authority records on noticing officers, road restrictions etc.

Data to be used in Street Manager for admin, user registration and configuration of the service for LHAs. It's likely this will need to be manually entered by LHA admins for any custom configuration required.

Legislative Reference data

Much of the reference data in the application needs to be taken from legislation, such as the NRSWA 1991. This data will need to be identified in the Work fields and used to validate data entry.

Outstanding questions

  • Should we put commercially sensitive works data in event data?
    • [SA] - assuming event data will be stored as normal with commercially sensitive flag but served selectively from APIs to ensure no one other than LHA/Promoter can see private details
  • Do we need to create and maintain a semantic layer for reporting users or is the flattened event data enough?
    • [SA] - for Beta, no
Clone this wiki locally