dft street manager doc 0002 solution architecture

Solution Architecture

Author(s) - Alistair Cowan, Phil Allen, Steven Alexander

Introduction

Street Manager is a centralised system for collecting and processing street work information, used by Promoters (utility companies) and Local Authorities.

The vision of the project is:

To transform the planning, management and communication of street works through open data and intelligent services to minimise disruption and improve journeys for the public.

Audience

This document is aimed at Product Delivery, Software Architects, Delivery and Operations teams.

Architectural design approach

In this document, we describe the system from different viewpoints so that each member of the delivery/operations team has a shared understanding of the system.

We are using a cross-functional agile approach to delivery, so all functions of the team are represented (dev/test/operations).

We are taking a DevOps approach to implementation, including fully automated testing, deployment and security testing. Teams are responsible for pushing their code from development to live, taking all non-functional requirements into consideration when starting and completing a story.

Functional view

We use the C4 approach to illustrate the overall system architecture. See draw.io for the source of the diagrams below.

System context

draw.io diagram

NOTES:

Assuming GOV.UK Notify for email/SMS/post notifications.
NSG data will be loaded from regular data dumps from GeoPlace which will be imported into the application database. The data will be used in mapping and business logic, initially directly as data but may be separated into another service later if necessary.

TODO DRAFT: Phil to review

System Containers

Street Manager system

Street Manager system containers

draw.io diagram

Services and responsibilities

Undertaker web

Separate front end for serving undertaker HTML requests so that it can handle load and scale independently.

Requires common elements: GDS styles, mapping JavaScript, security filters.

Local Highways Authority web

Separate front end for serving Local Highways Authority HTML requests so that it can handle load and scale independently.

Requires common elements: GDS styles, mapping JavaScript, security filters.

Mapping Server

TODO

Work API

API for handling all updates to works data, persisting into database. Works data will use an event data model approach, so all updates to the works will be recorded as events.

Requires common elements: database change management, security filters, API documentation.

Party API

API for handling all updates to Person and Organisation data, persisting into database. Data will be modelled based on the Universal Person and Organization Data Model approach. It is kept separate so that it can be scaled and managed independently, as other systems may require party details, such as authentication and registration.

Requires common elements: database change management, security filters, API documentation.

Task API

API for adding/checking tasks. Tasks are regularly scheduled jobs, fragile external calls to integration points, or long-running jobs required by the system. They should record their status (created, in progress, completed, failed) and be capable of being re-run in case of failure. Only internal components should call the Task API.

Requires common elements: database change management, API documentation.
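The status tracking described above could be sketched as a small state machine. This is a minimal illustration only: the status names come from the text, while the function names (`createTask`, `transition`), the `attempts` counter and the rule that a failed task re-runs by returning to "in progress" are assumptions.

```javascript
// Hypothetical sketch: task status transitions for the Task API.
// Status names come from the text above; the transition table is an assumption.
const TRANSITIONS = {
  'created': ['in progress'],
  'in progress': ['completed', 'failed'],
  'failed': ['in progress'], // re-run after failure
  'completed': []
};

function createTask(name) {
  return { name, status: 'created', attempts: 0 };
}

function transition(task, nextStatus) {
  const allowed = TRANSITIONS[task.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Cannot move task from "${task.status}" to "${nextStatus}"`);
  }
  // Return a new object rather than mutating, and count each run attempt.
  const attempts = nextStatus === 'in progress' ? task.attempts + 1 : task.attempts;
  return Object.assign({}, task, { status: nextStatus, attempts });
}
```

A task that fails once and then succeeds passes through created, in progress, failed, in progress, completed, and ends with `attempts` equal to 2.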

Questions:

Mapping Server questions:
- Do we need one? SA - "Assuming yes"
- Do we need to expose WMS layers publicly? Phil - "Public net - yes. Call to WMS/WFS could be from client-side tool. Unlike for RESTful APIs, which will always be server with a cert (and TBD white-listed IP address). We need a scenario about securing calls to WMS/WFS."
- How do we authenticate requests? SA - "GeoServer/MapServer supports using authentication and if necessary we can put a whitelisting load balancer in front of it"
- GeoServer vs MapServer SA - "Assuming GeoServer"
Do we need to expose the Work API? SA - "Assuming all UI components need API"
Should we design for handling EToN messages now? SA - "Assuming no"
Do we need a Task queue or can we just use DB with worker approach? SA - "Assuming simple solution for Solution Architecture, queue for estimation"

TODO DRAFT: Phil to review

Scenarios

Scenarios show how the components of the solution collaborate on key behaviours of the solution.

A planner enters forward planning details of a work via works planning tool TODO Steven

A planner enters forward planning details of a work in the UI TODO Steven

External mapping system

Street Manager provides two ways of accessing mapping data. Users may use either of the Street Manager websites, or they may use their local mapping tools, accessing Street Manager via the standard WFS and WMS protocols. Street Manager uses separate resources to address these needs.

Street Manager GIS containers

draw.io diagram

The left hand side of the figure shows the stack for users' own mapping tools. Note that the mapping tool's WFS and WMS requests include a valid basic authorisation header. TBD: management of basic auth credentials.

The Mapping Gateway intercepts WFS / WMS calls from the user mapping tool. It provides TLS termination and simple IP load balancing. TBD: it may also check authorisation headers. TBD: the gateway will either be implemented by nginx deployed as part of the application, or by a cloud service.
GeoServer is a standard component that is configured for particular data sources. It is a Java application and will be containerised. TBD: GeoServer may additionally check for a valid basic auth header. TBD: monitoring; at the very least, we have JMX support.
We use a read only replica to keep the GeoServer workload away from our master DB. For the same reason, replication will be asynchronous.
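If the gateway (or GeoServer) does end up checking the basic authorisation header, decoding it might look roughly like the sketch below. The function name `parseBasicAuth` is hypothetical, and how credentials are stored and verified remains TBD as noted above; this only shows the header decoding step.

```javascript
// Hypothetical sketch: decode an HTTP Basic Authorization header value.
// Verifying the credentials against a store is out of scope (still TBD).
function parseBasicAuth(headerValue) {
  if (typeof headerValue !== 'string') return null;
  const match = /^Basic ([A-Za-z0-9+/=]+)$/i.exec(headerValue.trim());
  if (!match) return null;
  const decoded = Buffer.from(match[1], 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) return null;
  // The user id cannot contain ':', so split on the first one only.
  return {
    username: decoded.slice(0, separator),
    password: decoded.slice(separator + 1)
  };
}
```

A missing or malformed header yields `null`, which the caller would presumably translate into a 401 response.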

The right hand side of the figure shows the stack for the SM web interface. Note that the browser already has a valid login session and has already loaded a Street Manager mapping page.

The browser triggers JavaScript according to user actions, such as panning the map and selecting / deselecting layers. The bespoke JavaScript makes calls to the RESTful Works API, including the session token in the header.
These calls are intercepted by the Gateway, which terminates TLS, validates the session token (redirecting if invalid) and passes the call through to the Works API.
Works API returns resources that include GeoJSON to represent works geometry.
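To illustrate the last step, a works resource carrying GeoJSON geometry might be shaped as in the sketch below. All field names other than the standard GeoJSON members (`type`, `coordinates`) are assumptions, as is the use of a simple Point in longitude/latitude order; real works geometry may be lines or polygons.

```javascript
// Hypothetical sketch: a works resource that embeds GeoJSON geometry.
// "id" and "description" are illustrative field names, not an agreed schema.
function toWorkResource(work) {
  return {
    id: work.id,
    description: work.description,
    geometry: {
      type: 'Point',
      // GeoJSON positions are ordered [longitude, latitude] (RFC 7946).
      coordinates: [work.longitude, work.latitude]
    }
  };
}
```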

Reporting and archive system

Street Manager reporting and archive system containers

draw.io diagram

TODO DRAFT: Phil to review

Authentication and Registration system

TODO Steven - need detail on managed authentication solutions available

Information view

Information principles

Full details on the Data model and standards approach are documented here.

SM will enforce organisation-level access policies on a single DB instance.
SM will support messages (e.g. permit requests) in draft state, potentially not passing all validation rules, for UI access only. The API made available to promoters will support only fully valid messages in their final state.
SM will preserve all successfully created messages between promoters and authorities immutably: requests, refusals, variations, and so on. Shared entities with mutable state will be summaries of these primitive messages. To that extent, the SM information model will be event-orientated.
SM will support occurrence times that precede insertion (capture) times, allowing SM to "catch up" during DR without losing time information.
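The event-orientated principles above could be sketched as follows: immutable messages are folded into a mutable summary, ordered by occurrence time rather than capture time. The message types, field names (`occurredAt`, `capturedAt`, `changes`) and the `summarise` function are assumptions made for illustration, not the agreed data model.

```javascript
// Hypothetical sketch: derive a shared summary from immutable messages.
function summarise(messages) {
  // Order by when things happened, not when they were captured, so that
  // messages inserted late (for example after DR) still apply in sequence.
  const ordered = messages.slice().sort(
    (a, b) => Date.parse(a.occurredAt) - Date.parse(b.occurredAt)
  );
  return ordered.reduce((summary, message) => {
    switch (message.type) {
      case 'request': return Object.assign({}, summary, { status: 'requested' });
      case 'refusal': return Object.assign({}, summary, { status: 'refused' });
      case 'variation': return Object.assign({}, summary, message.changes);
      default: return summary;
    }
  }, { status: 'none' });
}
```

The input array is copied before sorting, so the stored messages themselves are never modified.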

Reporting and Analysis

See the Reporting and archive system containers diagram for details and here for implementation details.

In Street Works industry parlance, reporting includes any operational queries where the user needs to export results for wider distribution.

Analysis includes aggregate queries, whether or not they are for wider distribution.

Initially, SM will provide a SQL endpoint to a read replica for DfT analytical use. DfT statisticians will access this via VPN from their office network.

Deployment View

An overview of the build pipeline, quality gates etc. will be outlined here. These will be standard across the project.

CI/CD Principles

TODO Ali

Branching Strategy

The project will use a feature branch workflow:

Developers work on feature branches until their work is thoroughly tested and is signed off by the product owner.
They produce a squashed commit that combines all the changes on that branch.
Merging to master entails commitment to release order. The team tests merge commits and tags them as release candidates.
Merging a feature to master means that other features that are in-flight will have to rebase their code.
Fixes are treated as urgent features. That is, other in-flight features should wait on branch for the fix to be merged to master so that the rebasing overhead falls on them rather than on the urgent fix.

See here for a tutorial.

This model is based on the "DVSA MOT Code Workflow - 27/02/2017" whitepaper.

Workflow

TODO diagram Ali

TODO description of gates Ali

Build pipeline

TODO diagram Ali

Operations view

Web analytics

SM will integrate with a web analytics service, and will not deploy a self-hosted solution. On grounds of cost-effectiveness, the most likely candidate is Google Analytics Pro. In that case, SM would be what is termed a property. It may sit within a DfT account, or another government account.

The selected tool will conform to the W3C CEDDL standard and we will integrate with our web tier on that basis. This will mitigate lock-in and will keep open options for future tag management by DfT staff who do not have development skills.

TODO - Steven review

Integrating with GOV.UK performance platform

TODO - Steven

Monitoring and Logging

TODO overview of how we will monitor the components: Steven

System monitoring

TODO - Alistair

Application logging

TODO - Alistair/Steven

Application monitoring

TODO - Alistair/Steven

Development principles

Development architecture overview

System context

draw.io diagram

Technology Stack

Node (v8.x.x, latest LTS) - JavaScript runtime for server-side web and API logic
Express - Node web framework
OpenLayers - JavaScript mapping framework
PostgreSQL with PostGIS extensions - Relational database with GIS functions

Rationale:

The application will require significant client-side JavaScript, so using Node.js for web/API logic gives a single consistent language across the application, with good support for including GDS styles
Express is the most common and flexible web framework for Node
OpenLayers is a mature JavaScript mapping library and existing GOV.UK solutions have passed Alpha assessment using it (Land Registry LLC)
PostgreSQL scales extremely well, has good managed RDB support in hosting providers and mature GIS extensions

Useful links:

API design and documentation

General guidelines for RESTful URLs

A URL identifies a resource. (A resource is a representation of some part of a business or problem domain)
URLs should include nouns, not verbs.
Use plural nouns only for consistency (no singular nouns).
Use HTTP verbs (GET, POST, PUT, DELETE) to operate on the collections and elements.
You shouldn’t need to go deeper than resource/identifier/resource.
Put the version number at the base of your URL, for example http://example.com/1.0/path/to/resource.
URL v. header:
- If it changes the logic you write to handle the response, put it in the URL.
- If it doesn’t change the logic for each response, like OAuth info, put it in the header.
Specify optional fields in a comma separated list.
Formats should be in the form of api/v2/resource/{id}.json
For resources consisting of multiple words use this form api/accounts/mortgage-invoices/1234 with hyphens and all lowercase rather than camelCase
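As a rough sketch of these guidelines, the helper below builds a versioned URL with lowercase, hyphenated path segments and puts filters in the query string. The function names (`toSegment`, `resourceUrl`) are hypothetical, and the base URL is assumed to end with a slash.

```javascript
// Hypothetical sketch: build a resource URL following the guidelines above.
// On Node 8, URL must first be imported: const { URL } = require('url');
function toSegment(name) {
  // "mortgageInvoices" or "Mortgage Invoices" -> "mortgage-invoices"
  return name
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2')
    .replace(/[\s_]+/g, '-')
    .toLowerCase();
}

function resourceUrl(base, version, segments, query) {
  // The version number sits at the base of the path, before any resource.
  const path = [version].concat(segments.map(toSegment)).join('/');
  const url = new URL(path, base);
  Object.keys(query || {}).forEach((key) => {
    url.searchParams.set(key, query[key]);
  });
  return url.toString();
}
```

For example, `resourceUrl('http://www.example.org/api/', '1.0', ['magazines'], { year: 2011, sort: 'desc' })` produces the filtered magazines URL in the form shown in the good examples. Note that `searchParams` percent-encodes commas, so a comma-separated `fields` list would appear encoded in the output.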

Good URL examples

List of magazines: GET http://www.example.org/api/1.0/magazines.json
Filtering is a query:
- GET http://www.example.org/api/1.0/magazines.json?year=2011&sort=desc
- GET http://www.example.org/api/1.0/magazines.json?topic=economy&year=2011
A single magazine: GET http://www.example.org/api/1.0/magazines/1234
All articles in (or belonging to) this magazine: GET http://www.example.org/api/1.0/magazines/1234/articles
All articles in this magazine in XML format: GET http://example.org/api/1.0/magazines/1234/articles.xml
Specify optional fields in a comma separated list: GET http://www.example.org/api/1.0/magazines/1234?fields=title,subtitle,date
Add a new article to a particular magazine: POST http://example.org/api/1.0/magazines/1234/articles

Bad URL examples

Non-plural noun:
- http://www.example.org/api/magazine
- http://www.example.org/api/magazine/1234
- http://www.example.org/api/publisher/magazine/1234
Verb in URL: http://www.example.org/api/magazine/1234/create
Filter outside of query string http://www.example.org/api/magazines/2011/desc

Content types

Where supported, content which can be returned in multiple types should be negotiated using the 'Accept' HTTP header, e.g.

GET http://example.org/api/magazines/1234
Accept: application/json

Should return a JSON representation of the resource.

GET http://example.org/api/magazines/1234
Accept: application/xml

Should return an XML representation of the resource.
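A simplified sketch of this negotiation is shown below. The `negotiate` function and the `SUPPORTED` list are hypothetical; the sketch ignores q-values for brevity and assumes JSON is the default when no Accept header is sent.

```javascript
// Hypothetical sketch: choose a representation from the Accept header.
// Ignores q-values for brevity; a production service would honour them.
const SUPPORTED = ['application/json', 'application/xml'];

function negotiate(acceptHeader) {
  if (!acceptHeader) return SUPPORTED[0];
  const requested = acceptHeader
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase());
  for (const type of requested) {
    if (type === '*/*' || type === 'application/*') return SUPPORTED[0];
    if (SUPPORTED.includes(type)) return type;
  }
  return null; // caller would respond with 406 Not Acceptable
}
```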

HTTP Verbs

HTTP verbs, or methods, should be used in compliance with their definitions under the HTTP/1.1 standard. The action taken on the representation will be contextual to the media type being worked on and its current state. Here's an example of how HTTP verbs map to create, read, update, delete operations in a particular context:

GET: Obtains either a list of items or a single item and never changes data.
- This should always be a safe operation, that is, a GET does not affect the data it acts on.
POST: Creates an object: inputs should be in the request body, not the URL. The response body should then include the created object.
PUT: Updates data
DELETE: Deletes data
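The mapping above can be sketched against an in-memory collection. This is a framework-free illustration only: `createCollection` and `handle` are hypothetical names, and the specific status codes returned (201 on create, 204 on delete, 404 for a missing item, 405 for other verbs) are assumptions rather than agreed behaviour.

```javascript
// Hypothetical sketch: map HTTP verbs to operations on an in-memory collection.
function createCollection() {
  return { items: new Map(), nextId: 1 };
}

function handle(collection, method, id, body) {
  const { items } = collection;
  switch (method) {
    case 'GET': // safe: never changes data
      if (id === undefined) return { status: 200, body: Array.from(items.values()) };
      return items.has(id) ? { status: 200, body: items.get(id) } : { status: 404 };
    case 'POST': { // create: input comes from the body, response includes the object
      const created = Object.assign({}, body, { id: String(collection.nextId++) });
      items.set(created.id, created);
      return { status: 201, body: created };
    }
    case 'PUT': { // update an existing item
      if (!items.has(id)) return { status: 404 };
      const updated = Object.assign({}, body, { id });
      items.set(id, updated);
      return { status: 200, body: updated };
    }
    case 'DELETE':
      return items.delete(id) ? { status: 204 } : { status: 404 };
    default:
      return { status: 405 };
  }
}
```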

Response

No values in keys
No internal-specific names (e.g. "node" and "UserAuthBaseImpl") - values returned should meaningfully describe the objects involved
Metadata should only contain direct properties of the response set, not properties of the members of the response set

Good examples

No values in keys:

  "Organisations": [
    {
      "id": "125",
      "name": "Farms Ltd"
    }, {
      "id": "834",
      "name": "Pigs and Chickens"
    }
  ]

Bad examples

Values in keys:

  "tags": [
    {"125": "Farms Ltd"}, 
    {"834": "Pigs and Chickens"}
  ]
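Where internal data is held in the values-in-keys shape, it can be converted before it is returned. The sketch below is illustrative only; `toRecords` is a hypothetical helper and the `id`/`name` property names are taken from the good example above.

```javascript
// Hypothetical sketch: convert values-in-keys entries into explicit records.
function toRecords(entries) {
  return entries.map((entry) => {
    const id = Object.keys(entry)[0];
    return { id, name: entry[id] };
  });
}
```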

HTTP Status Codes

Generally, HTTP status codes can be used to infer the result of the operation using this guide: 2XX = OK, 3XX = moved, 4XX = client error, 5XX = server error.

When defining an API endpoint, be clear in the documentation supplied as to which status codes your resource can return and in which situations. Bear in mind that many clients' default behaviour is to treat any non-200 status code as an error, so while 201 and 204 may be relevant for object creation and deletion, ensure this is clearly annotated for third party clients.

Look at http://httpstatus.es for succinct descriptions of the various HTTP status codes.

It is important that you use the correct classification of response. Remember, if a dependency fails, let your consumer know that this is what happened; don't just return blank 500 responses for all faults.

Error Handling

Error responses should include the relevant HTTP status code, and a meaningful message in the case of an exception. Avoid exposing implementation details or other data in the error message.

For example, if a request is sent to a POST endpoint that does not have the correct JSON format, an appropriate response would be:

Status: 400 (BAD_REQUEST)
Response Body:
{
  "error": "unable to parse request body as JSON"
}

Try to use the most relevant status code rather than just returning 400 for all requests that failed. For example, a request that fails syntactic validation (can't parse as JSON) should return a 400 BAD_REQUEST but a request that fails semantic validation (future date of birth supplied) should return a 422 UNPROCESSABLE_ENTITY.
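The distinction between syntactic and semantic failures might be implemented along these lines. This is a sketch under assumptions: the function name `validatePersonRequest`, the `dateOfBirth` field and the error message wording (other than the JSON parse message quoted earlier) are all hypothetical.

```javascript
// Hypothetical sketch: 400 for syntactic failures, 422 for semantic ones.
function validatePersonRequest(rawBody, now) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    // Do not echo err.message: it may expose implementation details.
    return { status: 400, body: { error: 'unable to parse request body as JSON' } };
  }
  if (payload === null || typeof payload !== 'object') {
    return { status: 422, body: { error: 'request body must be a JSON object' } };
  }
  const dateOfBirth = Date.parse(payload.dateOfBirth);
  if (Number.isNaN(dateOfBirth) || dateOfBirth > now.getTime()) {
    return { status: 422, body: { error: 'dateOfBirth must be a valid date in the past' } };
  }
  return { status: 200, body: payload };
}
```

Passing the current time in as `now` keeps the function deterministic and easy to test.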

Versions

Major changes and product releases should coincide with a major version revision, and minor changes and updates should coincide with a minor version revision.

API calls made without the version number included will default to accessing the latest version of the API. For example, if version 2.0 is the latest API version:

http://example.org/api/resource/123 will access resource/123 on version 2.0 of the API
http://example.org/api/1.1/resource/123 will access resource/123 on version 1.1 of the API

Activity monitoring should be used to determine when to decommission an API. When an old version of the API has not been used at all for a significant period of time it should be removed.

Note: for now all APIs will be versioned at 1.0 only.
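The default-to-latest rule could be sketched as below. `resolveVersion` and `LATEST_VERSION` are hypothetical names, the `/api/` prefix is taken from the example URLs, and the latest version is set to 2.0 purely to mirror the example rather than the current 1.0-only state.

```javascript
// Hypothetical sketch: resolve the API version from a request path.
const LATEST_VERSION = '2.0'; // mirrors the example above, not the live API

function resolveVersion(path) {
  // Matches "/api/1.1/resource/123" but not "/api/resource/123".
  const match = /^\/api\/(\d+\.\d+)\/(.+)$/.exec(path);
  if (match) return { version: match[1], resource: match[2] };
  // No version in the path: fall back to the latest version.
  return { version: LATEST_VERSION, resource: path.replace(/^\/api\//, '') };
}
```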

Record limits

If no limit is specified, return results with a default limit (this should be set to 20 and defined in the API documentation).

To get records 51 through 75 do this: http://example.org/api/magazines?limit=25&offset=50
- offset=50 means ‘skip the first 50 records’
- limit=25 means ‘return a maximum of 25 records’

Ensure that paging/limiting parameters are consistently named across the API, e.g. "limit" vs "pageSize", "offset" vs "startAt".

Information about record limits and total available count should also be included in the response. Example:

{
    "metadata": {
        "resultset": {
            "count": 227,
            "offset": 25,
            "limit": 25
        }
    },
    "results": [ ]
}
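A minimal sketch of limit/offset handling that produces a response in the shape above. The `paginate` and `parseNonNegative` helpers are hypothetical; the default limit of 20 comes from the text, and falling back to defaults on invalid input is an assumption.

```javascript
// Hypothetical sketch: apply limit/offset and report them in the metadata.
const DEFAULT_LIMIT = 20;

function parseNonNegative(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

function paginate(records, query) {
  const limit = parseNonNegative(query.limit, DEFAULT_LIMIT);
  const offset = parseNonNegative(query.offset, 0);
  return {
    metadata: { resultset: { count: records.length, offset, limit } },
    // offset skips records; limit caps how many are returned.
    results: records.slice(offset, offset + limit)
  };
}
```

With 227 records, `paginate(records, { limit: '25', offset: '50' })` returns records 51 through 75, and omitting both parameters returns the first 20.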

Health checks

All web and API components should support /healthcheck and /status endpoints. The /healthcheck endpoint should check dependencies (e.g. try SELECT 1 from the database or call /status on API dependencies) and return 200 if the component is in a healthy state. The /status endpoint should just return 200 if the component is up. This gives a consistent way to check that instances are available and healthy, which can be used as part of monitoring and CI.
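The difference between the two endpoints can be sketched as two response builders. The names `statusResponse` and `healthcheckResponse`, the response body shapes and the use of 503 for an unhealthy component are assumptions; the text above only defines the 200 case. Running the dependency checks themselves (database query, downstream /status calls) is left to the caller.

```javascript
// Hypothetical sketch: response builders for /status and /healthcheck.
function statusResponse() {
  // /status only reports that the process is up; no dependencies are checked.
  return { status: 200, body: { status: 'up' } };
}

function healthcheckResponse(dependencyResults) {
  // dependencyResults maps a dependency name to true (healthy) or false.
  const failed = Object.keys(dependencyResults)
    .filter((name) => !dependencyResults[name]);
  return {
    // 503 for an unhealthy component is an assumption, not from the text.
    status: failed.length === 0 ? 200 : 503,
    body: { healthy: failed.length === 0, failed }
  };
}
```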

See here for details.

Useful links

Definition of done

The definition of done is defined here.

Security overview

Data flow

TODO data flow diagram Steven/Ali/Phil

Authentication and authorisation

TODO diagram Steven/Ali/Phil

TODO Registration Phil

TODO critical national infrastructure Phil

Connection between components

TODO Steven/Phil

Database user access

Each component which accesses a database should do so with specific user credentials, with permissions set following the principle of least privilege. Database changes/migrations should be performed using separate credentials as part of the deployment process, so that those credentials are not exposed in the running application.

Common types of attacks

The system should have prevention/countermeasures for attacks described in the OWASP top 10.

TODO list specific countermeasures: Steven

A1:2017-Injection -
A2:2017-Broken Authentication -
A3:2017-Sensitive Data Exposure -
A4:2017-XML External Entities (XXE) -
A5:2017-Broken Access Control -
A6:2017-Security Misconfiguration -
A7:2017-Cross-Site Scripting (XSS) -
A8:2017-Insecure Deserialization -
A9:2017-Using Components with Known Vulnerabilities -
A10:2017-Insufficient Logging & Monitoring - See Monitoring and Logging details.

Testing strategy

The test strategy is documented here.

Technical overview of Alpha

The technical overview of Alpha is documented here.

