Skip to content

MSC3775: Markup Locations for Audiovisual Media #3775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
173 changes: 173 additions & 0 deletions proposals/3775-markup-locations-for-audiovisual-media.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
# Markup Locations for Audiovisual Media

[MSC3574](https://github.com/matrix-org/matrix-spec-proposals/pull/3574)
proposes a mechanism for marking up resources (webpages, documents, videos, and
other files) using Matrix. The proposed mechanism requires an
`m.markup.location` schema for representing the location of annotations within
different kinds of resources. MSC3574 punts on what standard location types
might be available, deferring that large family of questions to other MSCs.
This MSC aims to provide basic location types for marking up audiovisual media
resources.

## Proposal

Basic markup locations for audiovisual media should make use of the [Media
Fragments URI specification](https://www.w3.org/TR/media-frags/). The media
fragment specification is quite simple, and results in annotations compatible
with the w3c's [web annotation data
model](https://www.w3.org/TR/annotation-model/).

Basic markup locations for audiovisual media should be applicable to `audio/*`,
`video/*` and `image/*` media types.

The basic media fragment specification addresses content along two dimensions:
temporal (in the form of a time interval), and spatial (in the form of a
rectangle of pixels in the original media). The specification also includes
support for addressing media by track (in the case of audio with multiple
parallel media streams, for example a secondary dubbed English-language audio
stream). We only make use of the temporal and spatial dimensions in this MSC.

Temporal locations consist of half-open intervals, specifying the first moment
included the location, and the first moment not included in the location. This
MSC will use milliseconds to represent moments. The Media Fragments URI
specification uses seconds with a decimal part. Spatial locations consist of
rectangular selections given by the x,y coordinates of the upper left corner
and the width and height of the rectangle. The coordinates and dimensions of
the rectangle can be indicated either with integers representing pixels (with
0,0 representing the top left corner of the image or video), or with integers
representing percentages of the width and height of the media. This MSC will
represent percentages with integers in [0,1000000], allowing for four decimal
points of accuracy. Temporal and spatial locations can be combined to select
spatio-temporal segments of video recordings.

### Media Fragments

Media Fragments will be represented as follows:

```
"m.markup.location": {
"m.markup.media.fragment": {
"start": ..
"end": ..
"x": ..
"y": ..
"w": ..
"h": ..
}
..
}
```

or (when spatial dimensions are given in percentages) as

```
"m.markup.location": {
"m.markup.media.fragment": {
"start": ..
"end": ..
"xp": ..
"yp": ..
"wp": ..
"hp": ..
}
..
}
```

with all fields optional, but with the requirement that at least one field is
present, and that if any of `xywh` are present, then all are.

The `start` and `end` values should be non-negative integers with `start <
end`, where `start` indicates the first millisecond of media included in the
location, and `end` indicates the first millisecond of media not included. If
`start` is omitted, the location begins at zero, and if `end` is omitted, the
location includes the whole duration of the media.

The `xywh` fields should be non-negative integers describing a spatial region
within the media in pixel coordinates as described above. So `xy` should be
smaller than then [intrinsic height and width of the
video](https://html.spec.whatwg.org/multipage/media.html#concept-video-intrinsic-width)
respectively, and `wh` should be smaller than the difference between `x` and
the intrinsic width, and the difference between `y` and the intrinsic height,
respectively. In cases where the exception on `wh` is violated, the region
described should be clipped at the edges of the media. In the case where the
expectation on `xy` is violated, the location should be ignored as invalid.

The `xp` `yp` `wp` and `hp` fields should be non-negative integers less than or
equal to 1000000, giving a spatial region within the media in percentage
coordinates as described above. If either `xp` + `wp` or `yp` + `hp` is greater
than 1000000, then the location should be ignored as invalid.

### Web Annotation Data Model Serialization

[MSC3574](https://github.com/matrix-org/matrix-spec-proposals/pull/3574)
includes a scheme for serializing matrix markup events as web annotations in
the web annotation data model. The scheme requires each markup location type to
have a canonical serialization as [a web annotation
selector](https://www.w3.org/TR/annotation-model/#selectors]). In this section,
we describe how to serialize `m.markup.media.fragment` as a WADM selector.

We take advantage of the WADM's support for URI fragments as locations, using
the [FragmentSelector](https://www.w3.org/TR/annotation-model/#text-quote-selector)
selector.

This allows us to encode a location of the form

```
"m.markup.media.fragment": {
"start": $START
"end": $END
"x": $X
"y": $Y
"w": $W
"h": $H
}
```

as a selector

```
{
"type": "FragmentSelector",
"conformsTo": "http://www.w3.org/TR/media-frags/",
"value": "t=($START/1000),($END/1000)&xywh=$X,$Y,$W,$H"
}
```

and

```
"m.markup.media.fragment": {
"start": $START
"end": $END
"xp": $X
"yp": $Y
"wp": $W
"hp": $H
}
```

as

```
{
"type": "FragmentSelector",
"conformsTo": "http://www.w3.org/TR/media-frags/",
"value": "t=($START/1000),($END/1000)&xywh=percent:($X/1000),($Y/1000),($W/1000),($H/1000)"
}
```

## Security considerations

Because room state is unencrypted, `m.space.child` events conveying locations
via `m.markup.media.fragment` could leak information about the duration and
dimensions of a piece of media. This is part of a more general problem with
state events potentially leaking information, and deserves a general
resolution, a la [MSC3414](https://github.com/matrix-org/matrix-spec-proposals/pull/3414)


## Unstable prefix

| Proposed Final Identifier | Purpose | Development Identifier |
| ------------------------- | ---------------------------------------------------------- | --------------------------------------------- |
| `m.markup.media.fragment` | key in `m.markup.location` | `com.open-tower.msc3775.markup.media.fragment`|