Commit 697b188

Make the data cubes definition more clear
1 parent 50b5bad commit 697b188

1 file changed

+24
-7
lines changed

documentation/1.0/datacubes.md

@@ -27,6 +27,14 @@ A vector datacube on the other hand could look like this:
 A raster datacube has at least two spatial dimensions (usually named `x` and `y`) and a vector datacube has at least one geometry dimension (usually named `geometry`).
 The purpose of these distinctions is simply to make it easier to describe "special" cases of datacubes, but you can also define other types such as a temporal datacube that has at least one temporal dimension (usually named `t`).
 
+The following additional information is usually available for datacubes:
+
+- the dimensions (see [below](#dimensions))
+- a sampling method (see [below](#values-in-a-datacube))
+- a unit for the values
+
+All of this information is usually provided through the datacube metadata.
+
 ## Dimensions
 
 A dimension refers to a certain axis of a datacube. This includes all variables (e.g. bands), which are represented as dimensions. Our exemplary raster datacube has the spatial dimensions `x` and `y`, and the temporal dimension `t`. Furthermore, it has a `bands` dimension, extending into the realm of _what kind of information_ is contained in the cube.
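Because the raster/vector/temporal distinction depends only on which kinds of dimensions a datacube has, it can be expressed as a simple check over the datacube metadata. The following is a minimal Python sketch under stated assumptions: the `Dimension` and `DatacubeMetadata` classes, their field names, and the type strings `"spatial"`, `"temporal"`, `"bands"` and `"geometry"` are hypothetical and are not the API of any openEO client or back-end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dimension:
    name: str
    # Hypothetical type strings: "spatial", "temporal", "bands", "geometry", "other"
    type: str


@dataclass
class DatacubeMetadata:
    dimensions: List[Dimension]
    sampling: Optional[str] = None  # hypothetical values: "area" or "point"
    unit: Optional[str] = None  # unit of the values (not of the dimension labels)


def count_dimensions(cube: DatacubeMetadata, dim_type: str) -> int:
    """Count the dimensions of the given type."""
    return sum(1 for dim in cube.dimensions if dim.type == dim_type)


def is_raster(cube: DatacubeMetadata) -> bool:
    # At least two spatial dimensions, usually named x and y
    return count_dimensions(cube, "spatial") >= 2


def is_vector(cube: DatacubeMetadata) -> bool:
    # At least one geometry dimension, usually named geometry
    return count_dimensions(cube, "geometry") >= 1


def is_temporal(cube: DatacubeMetadata) -> bool:
    # At least one temporal dimension, usually named t
    return count_dimensions(cube, "temporal") >= 1


# The example raster datacube: x, y, t and bands
raster = DatacubeMetadata(
    dimensions=[
        Dimension("x", "spatial"),
        Dimension("y", "spatial"),
        Dimension("t", "temporal"),
        Dimension("bands", "bands"),
    ],
    sampling="area",
)
vector = DatacubeMetadata(
    dimensions=[Dimension("geometry", "geometry"), Dimension("t", "temporal")],
)

print(is_raster(raster), is_vector(raster), is_temporal(raster))  # True False True
print(is_raster(vector), is_vector(vector), is_temporal(vector))  # False True True
```

Note that the checks are not mutually exclusive: the example raster datacube is also a temporal datacube, which matches the idea that these types only describe "special" cases.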
@@ -39,9 +47,11 @@ The following properties are usually available for dimensions:
 * labels (usually exposed through textual or numerical representations, in the metadata as nominal values and/or extents)
 * reference system / projection
 * resolution / step size
-* unit (either explicitly specified or implicitly given by the reference system)
+* unit for the labels (either explicitly specified or implicitly provided by the reference system)
 * additional information specific to the dimension type (e.g. the geometry types for a dimension containing geometries)
 
+All of this information is usually provided through the datacube metadata.
+
 Here is an overview of the dimensions contained in our example raster datacube above:
 
 | # | name | type | labels | resolution | reference system |
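For a regular dimension, the labels do not have to be listed one by one: an extent plus a resolution (step size) is enough to derive them. A minimal Python sketch, assuming that each label marks the lower bound of a step and that the upper bound of the extent is exclusive; both conventions and the function name `labels_from_extent` are hypothetical, and back-ends may label cells differently (for example by their centers).

```python
from typing import List


def labels_from_extent(start: float, end: float, step: float) -> List[float]:
    """Derive regular labels for a dimension from its extent [start, end) and step size.

    Assumes each label marks the lower bound of a step and that the extent
    is a whole multiple of the step size.
    """
    if step <= 0:
        raise ValueError("step size must be positive")
    steps = (end - start) / step
    count = round(steps)
    # Reject extents that would leave a partial step at the end
    if abs(steps - count) > 1e-9:
        raise ValueError("extent is not a whole multiple of the step size")
    return [start + i * step for i in range(count)]


# Hypothetical x dimension: 60 m wide extent with a 20 m resolution
print(labels_from_extent(500000, 500060, 20))  # [500000, 500020, 500040]
```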
@@ -66,12 +76,6 @@ A dimension with geometries can consist of points, linestrings, polygons, multi
 It is not possible to mix geometry types, but a single geometry type can be combined with its corresponding multi type in a dimension (e.g. points and multi points).
 Empty geometries (such as GeoJSON features with a `null` geometry or GeoJSON geometries with an empty coordinates array) are allowed and can sometimes also be the result of certain vector operations such as a negative buffer.
 
-openEO datacubes contain scalar values (e.g. strings, numbers or boolean values), with all other associated attributes stored in dimensions (e.g. coordinates or timestamps). Attributes such as the CRS or the sensor can also be turned into dimensions. Be advised that in such a case, the uniqueness of pixel coordinates may be affected. When usually, `(x, y)` refers to a unique location, that changes to `(x, y, CRS)` when `(x, y)` values are reused in other coordinate reference systems (e.g. two neighboring UTM zones).
-
-::: tip Be Careful with Data Types
-As stated above, datacubes only contain scalar values. However, implementations may differ in their ability to handle or convert them. Implementations may also not allow mixing data types in a datacube. For example, returning a boolean value for a reducer on a numerical datacube may result in an error on some back-ends. The recommendation is to not change the data type of values in a datacube unless the back-end supports it explicitly.
-:::
-
 ### Applying Processes on Dimensions
 
 Some processes are typically applied "along a dimension". You can imagine said dimension as an arrow and whatever is happening as a parallel process to that arrow. It simply means: "we focus on _this_ dimension right now".
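The geometry rule in the context lines above (no mixing of geometry types, except a single type with its multi counterpart, and empty geometries allowed) can be checked mechanically. This is a minimal Python sketch, assuming GeoJSON type names and representing an empty (`null`) geometry as `None`; the function `can_share_dimension` is hypothetical, and `GeometryCollection` is deliberately left out because the rule above does not cover it.

```python
from typing import Iterable, Optional

# Each multi type maps to its single counterpart (GeoJSON type names)
_BASE_TYPE = {
    "Point": "Point",
    "MultiPoint": "Point",
    "LineString": "LineString",
    "MultiLineString": "LineString",
    "Polygon": "Polygon",
    "MultiPolygon": "Polygon",
}


def can_share_dimension(geometry_types: Iterable[Optional[str]]) -> bool:
    """Return True if the given geometry types may coexist in one geometry dimension.

    None stands for an empty geometry, which is allowed and therefore skipped.
    """
    base_types = set()
    for geometry_type in geometry_types:
        if geometry_type is None:
            continue
        if geometry_type not in _BASE_TYPE:
            raise ValueError(f"unsupported geometry type: {geometry_type}")
        base_types.add(_BASE_TYPE[geometry_type])
    # At most one base type (single + multi variants) may be present
    return len(base_types) <= 1


print(can_share_dimension(["Polygon", "MultiPolygon", None]))  # True
print(can_share_dimension(["Point", "LineString"]))  # False
```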
@@ -88,6 +92,19 @@ Resampling is however costly, involves (some) data loss, and is in general not r
 
 On such a _crs-dimensioned data cube_, several operations make perfect sense, such as `apply` or `reduce_dimension` on spectral and/or temporal dimensions. A simple reduction over the `crs` dimension, using _sum_ or _mean_ would typically not make sense. The "reduction" (removal) of the `crs` dimension that is meaningful involves the resampling/warping of all sub-cubes for the `crs` dimension to a single, common target coordinate reference system.
 
+## Values in a Datacube
+
+openEO datacubes contain scalar values (e.g. strings, numbers or boolean values), with all other associated attributes stored in dimensions (e.g. coordinates or timestamps). Attributes such as the CRS or the sensor can also be turned into dimensions. Be advised that in such a case, the uniqueness of pixel coordinates may be affected. While `(x, y)` usually refers to a unique location, the unique key becomes `(x, y, CRS)` when the same `(x, y)` values are reused in different coordinate reference systems (e.g. two neighboring UTM zones).
+
+::: tip Be Careful with Data Types
+As stated above, datacubes only contain scalar values. However, implementations may differ in their ability to handle or convert them. Implementations may also not allow mixing data types in a datacube. For example, returning a boolean value from a reducer on a numerical datacube may result in an error on some back-ends. We therefore recommend not changing the data type of values in a datacube unless the back-end explicitly supports it.
+:::
+
+Datacube values can be sampled in two different ways: they are either area samples or point samples.
+
+- Area sampling aggregates measurements over defined regions, i.e. the grid cells for raster data or polygons/lines for vector data.
+- Point sampling captures values at specific locations instead of aggregating them over a region.
+
 ## Processes on Datacubes
 
 In the following part, the basic processes for manipulating datacubes are introduced.
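The difference between the two sampling methods added in this hunk can be illustrated on a small, finer-resolution grid. In this Python sketch, the area sample of a coarse cell is the mean of all fine values inside it, while the point sample is the single value observed at one location. The mean is only one possible aggregation and is chosen here as an assumption; the function names, the `factor` parameter and the grid values are hypothetical.

```python
from statistics import mean
from typing import List

Grid = List[List[float]]


def area_sample(fine: Grid, row: int, col: int, factor: int) -> float:
    """Aggregate (here: average) all fine values inside one coarse cell.

    The coarse cell (row, col) covers a factor x factor block of the fine grid.
    """
    block = [
        fine[r][c]
        for r in range(row * factor, (row + 1) * factor)
        for c in range(col * factor, (col + 1) * factor)
    ]
    return mean(block)


def point_sample(fine: Grid, row: int, col: int) -> float:
    """Take the single value observed at one location."""
    return fine[row][col]


fine = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]

print(area_sample(fine, 0, 0, 2))  # 3.5 (mean of 1, 2, 5 and 6)
print(point_sample(fine, 0, 0))  # 1.0
```

The same location can thus carry different values depending on the sampling method, which is why the sampling method is worth recording in the datacube metadata.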
