Rendering Hints extension (WIP) #879
Ok, the first draft is up. A detailed review is appreciated, since I am not sure that I explained everything right. Any more details on the 'why' would also be good to add. I still need to make an example and provide a schema, but feedback on the core ideas would be great. I'm also thinking about making the schema check that zoom levels are between 0 and 25. I welcome ideas on what to put for the max, but it would be good to check that users aren't putting in something like 500 for the zoom levels.
Should we include raster statistics (min/mean/max) as optional fields in this extension?
extensions/rendering-hints/README.md (outdated)

| Type Name | Description |
|-----------|-------------|
| `unknown` | Not known |
| `byte` | An unsigned 8-bit integer (common for 8-bit RGB PNGs) |
As far as type names go, `int8` is numpy's name for the type:

```python
In [1]: import numpy as np
In [2]: np.int8
Out[2]: numpy.int8
In [3]: np.byte
Out[3]: numpy.int8
```

And rasterio has a similar idea:

```python
In [1]: import rasterio
In [2]: src = rasterio.open("<a tiff i have>")
In [3]: src.dtypes
Out[3]: ('uint8', 'uint8', 'uint8')
```

I don't think it's super important, but it's not an uncommon name for the type.
Oh no, I scrolled up in Gitter and now see the numpy / rasterio discussion 🙃
I really know very little about these things, so I'm happy for whatever you all think is best. I just want to have a clear list for people to pick from.
Oh sorry; I thought `byte` was an alias for `uint8`, not `int8`. I see NumPy has `ubyte` as the alias for `uint8`. `uint8` is the more common one for imagery, as you saw.
So I should change this to just be `uint8`?
It doesn't hurt to include both, I guess:

| Type Name | Description |
|-----------|-------------|
| `uint8` | 8-bit unsigned integer |
| `int8` | 8-bit signed integer |

## Item Properties fields

| Field Name | Type | Description |
(throwing this here because it seems like the most appropriate place to ask about included fields)
I don't know what the appropriate scope is for this extension, but I think the main hint I would want, if it's available, is a colormap from the tif. I'm a little bit confused about what I'd do with the min and max zoom hints.
I've been interested in having that type of stuff in here, though I'm far from an expert on it. I think it can be in scope, and I was anticipating it'd grow to include similar things. But I'm more than happy to include it now. Just give me more details of what that actually looks like: what is the field, what's in it, how do you explain it. Or (preferred) feel free to add a 'committable suggestion' (or whatever it is called) on the PR and I'll add it in.
> I'm a little bit confused about what I'd do with the min and max zoom hints

Min and max zoom hints are helpful for on-demand rendering. The `max_zoom` is generally discoverable separately from the `gsd`, but the `min_zoom` depends on the number of overview levels in the file (usually a Cloud-Optimized GeoTIFF).

That gets me thinking... what if a different way to describe min/max zoom is the `gsd` of the full-resolution data vs. the `gsd` of the smallest overview? The main `gsd` is of course already stored per band. Storing the `min_overview_gsd` [of each band?] would allow the dynamic tiler to derive the min/max zoom levels, and would additionally support non-Mercator rendering by deriving those zoom levels from the `gsd` values.
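As a rough sketch of how a dynamic tiler could turn the full-resolution `gsd` and the coarsest overview's GSD into a zoom range: the snippet below assumes the Web Mercator tiling scheme with 256-pixel tiles and GSD values in meters. `zoom_for_gsd` and `zoom_range` are hypothetical helpers, not part of this extension or any library. For each GSD it picks the lowest zoom whose tile pixel size is no larger than that GSD.

```python
import math

# Equatorial circumference of the Web Mercator sphere (radius 6378137 m), in meters.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0


def zoom_for_gsd(gsd_m: float, tile_size: int = 256, latitude_deg: float = 0.0) -> float:
    """Fractional Web Mercator zoom whose pixel size equals gsd_m at the given latitude."""
    # Ground size of one zoom-0 pixel, shrunk by cos(latitude) away from the equator.
    res_zoom0 = EARTH_CIRCUMFERENCE_M * math.cos(math.radians(latitude_deg)) / tile_size
    # Each zoom level halves the pixel size, hence the base-2 logarithm.
    return math.log2(res_zoom0 / gsd_m)


def zoom_range(gsd_m: float, overview_gsd_m: float, tile_size: int = 256,
               latitude_deg: float = 0.0) -> tuple[int, int]:
    """(min_zoom, max_zoom) for data with the given native and coarsest-overview GSD."""
    # Round up so the tile pixel size at each bound is no larger than the data's GSD.
    min_zoom = math.ceil(zoom_for_gsd(overview_gsd_m, tile_size, latitude_deg))
    max_zoom = math.ceil(zoom_for_gsd(gsd_m, tile_size, latitude_deg))
    return min_zoom, max_zoom


print(zoom_range(10.0, 640.0))  # -> (8, 14)
```

With a 10 m native GSD and a 640 m coarsest overview, this gives zooms 8 to 14 at the equator. Rounding up rather than down or to the nearest level is a design choice here: it avoids ever showing tiles coarser than the source pixels at either bound.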
Additionally, for visualization it's sometimes important to know the min/max values (not zoom, actual values) of the data. Sometimes that can't be well determined from the `data_type`. For example, normalized differences are usually between -1 and 1, but none of the data types really caters for that.
Additionally, I would like to use some of the fields from this extension, but I don't always care about zoom levels, so having them required seems to make this less useful. But I'm not exactly sure what to require. Maybe just one of the properties (Schema: `minProperties: 1`)? Otherwise I'd probably just fill in the min/max values for zoom. They should be mentioned in the docs, I think.
`min_overview_gsd` does seem a bit cleaner. Is there an easy way to calculate that? Would want to give people some reference for how to get it.
min/max values sounds good to add. I'm for it. What should the field name be?
And `minProperties: 1` sounds good. I only knew the one use case, but if you have a use case that involves not using them, then I'm happy to not make that one required.
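On how to calculate it: a minimal sketch, assuming overviews are decimated per axis (a factor of 4 means a quarter of the width and of the height), so the coarsest overview's GSD is the native GSD times the largest decimation factor. rasterio's `DatasetReader.overviews(bidx)` returns those factors and `DatasetReader.res` returns the native pixel size; note that `res` is in CRS units, so it is only meters for a metric projected CRS. Both helper names are hypothetical.

```python
from typing import Sequence


def overview_max_gsd(native_gsd: float, decimations: Sequence[int]) -> float:
    """GSD of the coarsest overview, given per-axis overview decimation factors."""
    # With no internal overviews, the full-resolution data is the coarsest level.
    if not decimations:
        return native_gsd
    return native_gsd * max(decimations)


def overview_max_gsd_from_file(path: str, band: int = 1) -> float:
    """Read the native pixel size and overview factors of one band with rasterio."""
    import rasterio  # optional dependency, only needed for this helper

    with rasterio.open(path) as src:
        xres, yres = src.res  # pixel size in CRS units
        # Use the coarser axis if pixels are not square.
        return overview_max_gsd(max(abs(xres), abs(yres)), src.overviews(band))


print(overview_max_gsd(10.0, [2, 4, 8, 16, 32, 64]))  # -> 640.0
```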
Responded in main thread.
@vincentsarago uses 30 as the max zoom in his mosaic code.
@DanielJDufour I think this one could be interesting for you, and I think you have some expertise to share from your work on geotiff-stats, geotiff.io, georaster, etc. Could you have a look, please?
@cholmes Let me know whenever you are finished and need a JSON Schema ;-)
Hello. I apologize if my suggestions are already covered by other extensions (I'm still learning about what is out there). I submit the following items for your consideration:

**Minimum and maximum pixel values for each band of each asset**

This is useful for rendering single-band images without a color palette. It could also help render a single band from a Landsat scene. I should note that GDAL can embed this information as XML in a GeoTIFF's metadata. However, I'm not sure whether you can do this for other types of gridded data (e.g., JPEG 2000). Maybe someone else can provide information on that. And JSON is easier to work with if you are doing anything client-side in the browser.

**More projection information**

It's probably because of my lack of knowledge about STAC that I suggest this, but is there currently a way to grab the projection information for an asset? Ideally we could have the WKT and maybe a proj4js string easily available. Rendering a GeoTIFF asset in a projection other than its native one requires this information. My understanding is that this has already been solved in Python via GDAL/rasterio, but the JS ecosystem doesn't yet have a solution for converting a GeoTIFF's keys to a proj4js or WKT string. I started a little of this work here, but there's a long way to go. This is also currently (IMHO) the biggest blocker to creating an OpenLayers plugin (ol-geotiff) that can handle and display any GeoTIFF (and doesn't rely on external services). It's also the most common issue that comes up with GeoRasterLayer.

**Color palette**

(I think someone mentioned this already.) The color palette is basically the mapping of a pixel value to an RGBA color. GeoTIFFs often store color palette imagery in a somewhat compressed format where you don't store all the mappings, but instead the scaled step between pixel values. geotiff-palette basically expands this info into a simple array of RGBA values where the index in the array is the pixel value. For 8-bit imagery, there would be 256 entries.

**Histograms**

Similar to palettes, the size of the histogram depends on the number of unique values found in the raster. GDAL will bin/group this data for you, which shrinks the space required to store it. However, it's often useful to have direct access to the raw, un-binned counts for each pixel value, especially if the renderer would like to bin them with different bin sizes. The raw data could get very large for rasters with more than 8 bits per pixel.

**Tile-level information**

It would be awesome if there was a standard for storing not just file-level statistics, but the stats for each tile. These could help with customizing/stretching the rendering relative to the tiles that are actually being displayed. Sometimes people want to apply a threshold (e.g., only show pixel values over 300), so having access to tile-level minimums and maximums could help us avoid extra calls for tiles that contain no values above the threshold. Here's an example of displaying a GeoTIFF that shows the areas where tropical fruit is grown in Puerto Rico: https://geotiff.github.io/georaster-layer-for-leaflet-example/examples/thresholding.html

**Mask**

This might have been mentioned earlier, and I've seen it used in some of the awesome tilers out there. It might be useful in some situations to have access to a polygon representing how the data should be masked (i.e., where the no-data values are). It can help avoid extra calls for pixels in an area that contains only no-data values.

**Range of pixel values for each band**

This could be optional or left out of the extension, because it's simply the maximum pixel value minus the minimum pixel value. However, it could be useful and remove one (albeit simple) step. Here's an example of the range (along with min and max) being used: https://github.com/GeoTIFF/georaster-layer-for-leaflet/blob/master/georaster-layer-for-leaflet.js#L377

**Official rendering functions**

I'm not sure about this one, but thought I should share it in case it sparks conversation. I see its usefulness but also its drawbacks. It could be interesting if there was a way for data holders to provide validated band arithmetic functions. I've personally found it difficult to come up with these equations because of all the caveats. For example, although NDVI is a rather simple equation on the face of it, in practice one often has to add caveats like making sure water displays correctly. We could also consider using a generic language like the one proposed by stac-expr, but I've heard it's more useful for people to have code in the language they would actually run the arithmetic in, like Python or JavaScript.

```
"expressions": {
  "ndvi": {
    "js": "const result = (nir - red) / (nir + red); return result <= 0.1 ? 'blue' : result >= 0.8 ? 'black' : result",
    "python": ...
  }
}
```

Although these expressions can get long and complicated, that's precisely why I think they could be useful for people who simply want to display NDVI without having to do the math themselves. However, this would increase the maintenance cost, and what happens when an equation gets updated? Would that change any applications depending on the previous rendering equation? Is that a good or bad thing? What do you think, @m-mohr? Would love your thoughts on this.

**Supervised classification and AI model output**

This is probably out of scope, but if we really wanted to go overboard, we could include the results of supervised classification. For example, we could store the range for water values:

I'd also like to invite @rowanwins to offer some feedback. He's done a lot of good work with GeoTIFFs and might also have some good suggestions to add :-) He's mentioned standard deviation and quantiles being useful before (source).

Apologies again for the lengthy post. I think there's a clear use case for min/max, projection info, color palette, histogram, tile-level info, and mask. However, some of this might already be covered by other extensions. I'm also unsure whether it makes sense to include range, band arithmetic, and classification output in the STAC extension, because it would increase its scope and it might make sense to separate concerns more. Looking forward to your thoughts and feedback.
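To make the color-palette expansion concrete, here is a sketch that assumes the raw palette is a TIFF 6.0 `ColorMap` tag: 3 × 2^BitsPerSample 16-bit values, all reds first, then all greens, then all blues, with 65535 as full intensity. This is not geotiff-palette's actual code, and `expand_palette` is a hypothetical name. It produces the flat list described above, where the list index is the pixel value.

```python
def expand_palette(color_map: list[int]) -> list[tuple[int, int, int, int]]:
    """Turn a TIFF ColorMap (all R, then all G, then all B) into one RGBA tuple per pixel value."""
    if len(color_map) % 3:
        raise ValueError("ColorMap length must be a multiple of 3")
    n = len(color_map) // 3  # 2**BitsPerSample entries, e.g. 256 for 8-bit data
    reds, greens, blues = color_map[:n], color_map[n:2 * n], color_map[2 * n:]
    # Scale 16-bit intensities (0..65535) down to 8-bit (0..255); alpha is fully opaque.
    return [(r * 255 // 65535, g * 255 // 65535, b * 255 // 65535, 255)
            for r, g, b in zip(reds, greens, blues)]


# A 1-bit palette: pixel value 0 -> black, pixel value 1 -> pure red.
print(expand_palette([0, 65535, 0, 0, 0, 0]))  # -> [(0, 0, 0, 255), (255, 0, 0, 255)]
```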
Thanks for all the extensive comments @DanielJDufour! Really appreciate it. A few responses:
@m-mohr suggested this too. Let's get it in.
https://github.com/radiantearth/stac-spec/tree/master/extensions/projection is pretty extensive. Let me know if there's anything that's needed for you, but I think it should be sufficient. We also recently added transform & shape, to enable things like VRTs without having to open the files.
This sounds like more of an asset than a property? Like it's a small file you reference (or even embed in the GeoTIFF?), not something you'd embed in JSON? Or do I have it wrong? I think that'd just be an asset. I don't know that other extensions specify assets yet, but it seems like it would make sense to me.
This also sounds interesting to me, and also sounds like more of an asset to reference? I'd love an example with a color palette and a histogram, so we can show people directly what it would look like, and provide a best practice for this stuff.
This one feels outside the scope of STAC to me? Unless maybe you're using the tiled asset extension.
Feels like another asset? Again, if someone can get me an example mask I can probably make an example with it, and recommend it.
I'm inclined to just leave this for people to get by subtracting min/max. So that there's not a situation where the values don't agree with one another, like someone changes one but doesn't update the other.
I'm into this idea in general - like getting to standard rendering functions. At Planet we're experimenting with this, and I would have loved a standard to point at. In the interest of 'small pieces loosely coupled' I'm inclined to aim for a standard on this that stands alone / isn't tied to STAC. But that could be referenced by STAC. Since I think there are people who would want to use this without having to understand STAC.
Similar feelings to the official rendering functions: cool to have, let's do it in its own spec.
A few thoughts: Min/Max

I think these are generally embedded within the GeoTIFF, but I could be wrong.
@geospatial-jeff - I'm into this in theory, but I struggle with how to implement it. I'm guessing you mean something like the results of GDAL with `-stats`:
I could see it working if all data were distributed one band per asset, since then you could just have those fields on each asset. But many have one asset with several bands, so we'd need some nested JSON object to give the values for each band, and that just seems to get pretty messy. Seems like we should just get this written up as an 'extension' on the COG spec; it feels much cleaner at the TIFF tag level. I've been meaning to do that since @mojodna came up with the idea.
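For a sense of what that nested per-band object might look like, here is a sketch that computes simple statistics for each band of a multi-band asset, skipping a nodata value, and returns one dict per band in band order so it could line up index-for-index with a bands array. The field names (`minimum`, `maximum`, `mean`) and the helper `band_statistics` are assumptions for illustration, not fields defined by this extension.

```python
import json
from statistics import fmean
from typing import Optional, Sequence


def band_statistics(bands: Sequence[Sequence[float]],
                    nodata: Optional[float] = None) -> list[dict]:
    """One statistics dict per band, ignoring pixels equal to nodata."""
    stats = []
    for pixels in bands:
        valid = [p for p in pixels if p != nodata]
        if not valid:  # a band made entirely of nodata has no meaningful stats
            stats.append({})
            continue
        stats.append({"minimum": min(valid), "maximum": max(valid),
                      "mean": fmean(valid)})
    return stats


# Two tiny "bands" that use 0 as nodata.
print(json.dumps(band_statistics([[0, 10, 20], [0, 5, 7]], nodata=0)))
# -> [{"minimum": 10, "maximum": 20, "mean": 15.0}, {"minimum": 5, "maximum": 7, "mean": 6.0}]
```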
I'm on board. Seems like we just need `min_gsd` for this extension then? And perhaps a recommendation on how to use `gsd` to calculate the max? I'll take a crack at it.
I had thought someone mentioned this in the thread, but now I can't seem to find it. But I think the real-world example is NDVI output, which is often just -1 to 1. Unless I'm off on that one. That'd obviously be a float, so I don't know about uint16.
Ah, true. We should probably call this out specifically. I think we do in the label extension, since it often just uses a subset of an overall valid image. But perhaps we call it out for EO as well, or even make it a general recommendation: the polygon you use should be the valid portion, not including black fill, etc. I think we'd want extensions to be able to change that behavior if there's some reason to, but it could be good to make clear that it should be the default. Definitely seems more useful.
Overviews have associated decimations, which I believe is defined as the number of times fewer pixels that overview has. Running
Then given that the
This seems strange to me. I understand that NDVI output ranges from -1 to 1, but why would you store the literal values in a GeoTIFF as -1 to 1? You'd be either losing precision or making a larger file size. If the source data comes as In any case, this isn't something I feel strongly about either way, and it's not too much space to store min and max.
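To illustrate the precision/size trade-off: a common alternative to storing NDVI as float32 is storing it as int16 with a scale factor and decoding on read. The factor of 10000 below is an assumption chosen for the example (it keeps four decimal places well inside the int16 range), and `encode_ndvi`/`decode_ndvi` are hypothetical helpers. In that case the stored values span -10000 to 10000 while the meaningful ones span -1 to 1, which is one reason a data range can't always be inferred from the data type.

```python
SCALE = 10000  # assumed scale factor: stored = round(ndvi * SCALE)


def encode_ndvi(ndvi: float) -> int:
    """Pack an NDVI value in [-1, 1] into an integer that fits in int16."""
    if not -1.0 <= ndvi <= 1.0:
        raise ValueError("NDVI must lie in [-1, 1]")
    return round(ndvi * SCALE)  # at most +/-10000, inside int16's -32768..32767


def decode_ndvi(stored: int) -> float:
    """Recover the NDVI value (to 4 decimal places) from its stored integer."""
    return stored / SCALE


print(encode_ndvi(0.73512))  # -> 7351
print(decode_ndvi(7351))     # -> 0.7351
```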
Hrm, good point.
Another good real-world example: satellite imagery is often captured at 11-14 bits but stored in a 16-bit image.
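This is where a known data range helps a renderer: 12-bit data stored as uint16 would look almost black if stretched over the full 0-65535 range. A minimal sketch of a linear stretch to 8-bit display values, assuming the renderer knows the band's min/max (here 0-4095 for 12-bit data); `stretch_to_byte` is a hypothetical helper, and a real renderer may prefer percentile cuts over the raw min/max.

```python
def stretch_to_byte(value: float, vmin: float, vmax: float) -> int:
    """Linearly map value from [vmin, vmax] onto 0..255, clamping out-of-range values."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    scaled = (value - vmin) / (vmax - vmin) * 255  # (value - min) / range
    return max(0, min(255, round(scaled)))


# A mid-range 12-bit value, stretched with the 12-bit range vs. the full uint16 range.
print(stretch_to_byte(2048, 0, 4095))   # -> 128
print(stretch_to_byte(2048, 0, 65535))  # -> 8
```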
As I sat down to try to add the new things discussed, it occurred to me that the min/max values per band are the same situation as the stats: we need some sort of per-band construct. Oh wait, we just decided that the 'bands' information should be included in every item. So I think we could just add fields to the band. That would be an extension of an extension, which I think is ok? But I'm not sure whether that should be a new extension or done in this one. I think for now I'm going to consider those out of scope, but file new issues for them, so that we can get this extension out the door.
Ok, so I think this draft is getting close to ready. Somehow, with all the great suggestions, we ended up with fewer fields. I'll file a ticket on the min/max values; it'll just be trickier to integrate with eo bands. @kylebarron @geospatial-jeff @vincentsarago - a final review would be great. Also, it would help if one of you could put together a real-world example with me. Like, could you figure out the overview gsd of one of the Planet STAC sample images? And the data type? @m-mohr - a schema would be great at this point too; hopefully not much will change.
@cholmes In general this looks good, but (1) it should have at least one example and (2) I'm not sure I follow you on min/max values. What's the exact reason not to include them here? It feels like it exactly fits into this extension, and we named a couple of use cases already. min/max are likely closely bound to eo:bands, but could also apply to data without bands, like SAR. So I'd actually define them on the same level as data type, but also allow them in bands, I guess. We may re-use the Stats Objects from Collections, which are extensible for mean, median or whatever... @DanielJDufour Thanks! Chris pretty much summarized my thoughts, too. I'd love to see a color palette extension though. Many of the other things are either already covered or will be covered soon. Great! I'll come up with a Schema soon.
| render:overview_max_gsd | number | The maximum Ground Sample Distance represented in an overview. This should be the GSD of the highest level overview, generally of a [Cloud Optimized GeoTIFF](http://cogeo.org), but should work with any format. |
| render:data_type | string | The data `type` (float, int, complex, etc.) to let the renderer apply any needed rescaling up front. The full set of options is listed below. |
**render:overview_max_gsd**: This field helps renderers of understand what zoom levels they can efficiently show. It is
Suggested change:

**render:overview_max_gsd**: This field helps renderers understand what zoom levels they can efficiently show. It is
@m-mohr: 1) I'll get an example in. I wanted to get a real example, and just saw @kylebarron provided the info I need to do it. Will try to get it in within a week (on vacation and have a ton of sprint organizing to do). 2) Midway through writing I realized that it was not as hard as I thought, since we use the bands object. But I think the one thing I'm less clear on is 'extending an extension', since bands isn't in core. Just not sure if we do it all in one extension, where some of the fields don't need to extend eo. @kylebarron - thanks! This is helpful. It does feel like we should have a good 'best practices' section on this stuff, with how to figure out quadkeys, zoom levels, etc. And ideally those link to code in various languages eventually.
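As a possible starting point for that 'best practices' code, here is a sketch of converting a tile's x/y/zoom to a quadkey, following the scheme described in Microsoft's Bing Maps Tile System documentation: each base-4 digit combines one bit of x and one bit of y, most significant bit first. `tile_to_quadkey` is a hypothetical name.

```python
def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Quadkey for tile (x, y) at the given zoom: one base-4 digit per zoom level."""
    if not (0 <= x < 2 ** zoom and 0 <= y < 2 ** zoom):
        raise ValueError("tile coordinates out of range for this zoom")
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1  # x bit selects the right-hand column
        if y & mask:
            digit += 2  # y bit selects the bottom row
        digits.append(str(digit))
    return "".join(digits)


print(tile_to_quadkey(3, 5, 3))  # -> 213
```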
I think it shouldn't only live in bands, because there's more data than EO.
@m-mohr - how do we specify it then? Just at the asset level? And then note that assets can use it in their bands objects if desired?
There's likely not an easy answer to this. It all depends on the underlying data structure / file format. I guess it needs to be available in eo:bands and in assets, assets being the primary place where it lives, but in future extensions it could also live in other places. In data cubes, for example, it would also be useful, and there are likely more... Maybe we should just make this extension work for assets and then say in the EO and data cube extensions that the rendering extension can be used in their Band / Dimension Objects?
Yeah, that's probably a good way to do it. Specify it here, and then EO can mention that the rendering hints extension can be used at the bands level. Also, should we specify it at the item property level and say that it's usually just used at the asset level? So that items with only one asset can use it? Or is it easier to just keep it at the asset level?
I'd go with the properties level, I guess, for summary reasons; we did that for most fields recently and just allow everything in assets too. But no strong preference from my side.
Should we add the min/max values per data type to the table? Then these would be the default values for the min/max fields if not provided, right?
Yes, great idea.
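If the table lists min/max per data type, the defaults follow directly from the type. A sketch using NumPy's `iinfo`/`finfo`, assuming the type names match NumPy dtype names (`uint8`, `int16`, `float32`, ...); `default_range` is a hypothetical helper. For floats the representable range is so wide that it is probably a poor rendering default, which argues for explicit min/max fields.

```python
import numpy as np


def default_range(data_type: str) -> tuple:
    """Smallest and largest value representable by a NumPy dtype name."""
    dtype = np.dtype(data_type)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
    elif np.issubdtype(dtype, np.floating):
        info = np.finfo(dtype)
    else:  # e.g. complex types have no single min/max
        raise ValueError(f"no numeric range for {data_type!r}")
    return info.min, info.max


for name in ("uint8", "int8", "uint16", "int16"):
    print(name, default_range(name))
# uint8 (0, 255)
# int8 (-128, 127)
# uint16 (0, 65535)
# int16 (-32768, 32767)
```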
Any progress on this? I'd like to re-use it in CARD4L... I'm wondering whether the data_type would actually make more sense in a "file" extension, see #921 (comment).
#934 also proposes to add the data type field, so we may remove it here. Instead we could add min/max values.
Yeah, just noticed the data type field in the file extension. I agree it makes more sense there; it's more general. So let's remove it here. As for progress, I'm hoping to work on it sometime soon. I paused all my STAC core spec work in favor of the API.
Anything I can do to help move this forward? We're removing the data type from this extension since it's now in the file extension, and adding data range? One question about data range: would it represent the "global" minimum and maximum across the dataset or the "local" minimum and maximum within that specific scene? For example, Sentinel-2 data that's 12 bits would presumably have a global min/max of 0-4095, but if the specific scene isn't fully saturated, the scene's min and max might be a smaller range.
Co-authored-by: Matthias Mohr
@kylebarron if you want to work on this that'd be awesome. Though I think this particular extension ends up being pretty minimal. I think the path ahead is:
So the main work is probably on the 'stats' extension, getting it written up and into a PR, and making sure the structure it proposes works with the different ways people organize files.
In my mind this is mostly per item, so yeah, it's the 'local' one. You might put the global one in a 'summary' at the collection level.
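Following that split (per-item 'local' ranges, with a 'global' range in a collection-level summary), here is a sketch of rolling item ranges up into a collection range: the smallest minimum and the largest maximum across items. The dict shape (`{"minimum": ..., "maximum": ...}`) and `summarize_ranges` are assumptions for illustration. Note the result is still the observed range; the theoretical 0-4095 of 12-bit data would have to come from the data type instead.

```python
def summarize_ranges(item_ranges: list[dict]) -> dict:
    """Collection-level range covering every item's local minimum/maximum."""
    if not item_ranges:
        raise ValueError("need at least one item range")
    return {
        "minimum": min(r["minimum"] for r in item_ranges),
        "maximum": max(r["maximum"] for r in item_ranges),
    }


# Two 12-bit scenes that are not fully saturated.
scenes = [{"minimum": 120, "maximum": 3100}, {"minimum": 45, "maximum": 2890}]
print(summarize_ranges(scenes))  # -> {'minimum': 45, 'maximum': 3100}
```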
As extensions now live in the stac-extensions organization, this PR is being closed. Follow instructions here to create a repository for the extension:
Progress on this at https://github.com/stac-extensions/raster |
Related Issue(s): #807