This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit c3119d1

Add an admin API for users' media statistics (#8700)
Add `GET /_synapse/admin/v1/statistics/users/media` to get statistics about local media usage by users. Related to #6094.

It is the first API for statistics. The goal is to avoid or reduce the use of SQL queries like those in the [Wiki analyzing Synapse](https://github.com/matrix-org/synapse/wiki/SQL-for-analyzing-Synapse-PostgreSQL-database-stats).

Signed-off-by: Dirk Klimpel [email protected]
1 parent e4676bd commit c3119d1

6 files changed: +820 -0 lines changed

changelog.d/8700.feature

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Add an admin API for local user media statistics. Contributed by @dklimpel.

docs/admin_api/statistics.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+# Users' media usage statistics
+
+Returns information about all local media usage of users. Gives the
+possibility to filter them by time and user.
+
+The API is:
+
+```
+GET /_synapse/admin/v1/statistics/users/media
+```
+
+To use it, you will need to authenticate by providing an `access_token`
+for a server admin: see [README.rst](README.rst).
+
+A response body like the following is returned:
+
+```json
+{
+  "users": [
+    {
+      "displayname": "foo_user_0",
+      "media_count": 2,
+      "media_length": 134,
+      "user_id": "@foo_user_0:test"
+    },
+    {
+      "displayname": "foo_user_1",
+      "media_count": 2,
+      "media_length": 134,
+      "user_id": "@foo_user_1:test"
+    }
+  ],
+  "next_token": 3,
+  "total": 10
+}
+```
+
+To paginate, check for `next_token` and if present, call the endpoint
+again with `from` set to the value of `next_token`. This will return a new page.
+
+If the endpoint does not return a `next_token` then there are no more
+reports to paginate through.
+
+**Parameters**
+
+The following parameters should be set in the URL:
+
+* `limit`: string representing a positive integer - Is optional but is
+  used for pagination, denoting the maximum number of items to return
+  in this call. Defaults to `100`.
+* `from`: string representing a positive integer - Is optional but used for pagination,
+  denoting the offset in the returned results. This should be treated as an opaque value
+  and not explicitly set to anything other than the return value of `next_token` from a
+  previous call. Defaults to `0`.
+* `order_by` - string - The method in which to sort the returned list of users. Valid values are:
+  - `user_id` - Users are ordered alphabetically by `user_id`. This is the default.
+  - `displayname` - Users are ordered alphabetically by `displayname`.
+  - `media_length` - Users are ordered by the total size of uploaded media in bytes.
+    Smallest to largest.
+  - `media_count` - Users are ordered by number of uploaded media. Smallest to largest.
+* `from_ts` - string representing a positive integer - Considers only
+  files created at this timestamp or later. Unix timestamp in ms.
+* `until_ts` - string representing a positive integer - Considers only
+  files created at this timestamp or earlier. Unix timestamp in ms.
+* `search_term` - string - Filter users by their user ID localpart **or** displayname.
+  The search term can be found in any part of the string.
+  Defaults to no filtering.
+* `dir` - string - Direction of order. Either `f` for forwards or `b` for backwards.
+  Setting this value to `b` will reverse the above sort order. Defaults to `f`.
+
+
+**Response**
+
+The following fields are returned in the JSON response body:
+
+* `users` - An array of objects, each containing information
+  about the user and their local media. Objects contain the following fields:
+  - `displayname` - string - Displayname of this user.
+  - `media_count` - integer - Number of uploaded media by this user.
+  - `media_length` - integer - Size of uploaded media in bytes by this user.
+  - `user_id` - string - Fully-qualified user ID (ex. `@user:server.com`).
+* `next_token` - integer - Opaque value used for pagination. See above.
+* `total` - integer - Total number of users after filtering.
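
The pagination contract documented in this file (pass the previous `next_token` as `from`, stop when `next_token` is absent) can be sketched as a small client loop. This is a minimal sketch and not part of the commit: `build_url`, `iter_user_media_stats`, and the injected `fetch_page` callable are hypothetical names, and `fetch_page` is assumed to perform an authenticated GET as a server admin and return the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

ENDPOINT = "/_synapse/admin/v1/statistics/users/media"


def build_url(
    base_url: str, limit: int = 100, start: Optional[int] = None, **filters: Any
) -> str:
    """Build a request URL; `start` is sent as the `from` query parameter."""
    params: Dict[str, Any] = {"limit": limit, **filters}
    if start is not None:
        params["from"] = start
    return base_url.rstrip("/") + ENDPOINT + "?" + urlencode(params)


def iter_user_media_stats(
    fetch_page: Callable[[str], Dict[str, Any]],  # hypothetical: does the HTTP GET
    base_url: str,
    limit: int = 100,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield every user entry, following `next_token` until it is absent."""
    next_token: Optional[int] = None
    while True:
        page = fetch_page(build_url(base_url, limit, next_token, **filters))
        yield from page["users"]
        # Treat the token as opaque: only feed back what the server returned.
        next_token = page.get("next_token")
        if next_token is None:
            return
```

Under these assumptions, `iter_user_media_stats(fetch_page, "https://homeserver.example", order_by="media_length", dir="b")` would walk all users from the largest to the smallest total upload size; the host name is a placeholder.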

synapse/rest/admin/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,7 @@
     ShutdownRoomRestServlet,
 )
 from synapse.rest.admin.server_notice_servlet import SendServerNoticeServlet
+from synapse.rest.admin.statistics import UserMediaStatisticsRestServlet
 from synapse.rest.admin.users import (
     AccountValidityRenewServlet,
     DeactivateAccountRestServlet,
@@ -227,6 +228,7 @@ def register_servlets(hs, http_server):
     DeviceRestServlet(hs).register(http_server)
     DevicesRestServlet(hs).register(http_server)
     DeleteDevicesRestServlet(hs).register(http_server)
+    UserMediaStatisticsRestServlet(hs).register(http_server)
     EventReportDetailRestServlet(hs).register(http_server)
     EventReportsRestServlet(hs).register(http_server)
     PushersRestServlet(hs).register(http_server)

synapse/rest/admin/statistics.py

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# -*- coding: utf-8 -*-
+# Copyright 2020 Dirk Klimpel
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+from typing import TYPE_CHECKING, Tuple
+
+from synapse.api.errors import Codes, SynapseError
+from synapse.http.servlet import RestServlet, parse_integer, parse_string
+from synapse.http.site import SynapseRequest
+from synapse.rest.admin._base import admin_patterns, assert_requester_is_admin
+from synapse.storage.databases.main.stats import UserSortOrder
+from synapse.types import JsonDict
+
+if TYPE_CHECKING:
+    from synapse.server import HomeServer
+
+logger = logging.getLogger(__name__)
+
+
+class UserMediaStatisticsRestServlet(RestServlet):
+    """
+    Get statistics about uploaded media by users.
+    """
+
+    PATTERNS = admin_patterns("/statistics/users/media$")
+
+    def __init__(self, hs: "HomeServer"):
+        self.hs = hs
+        self.auth = hs.get_auth()
+        self.store = hs.get_datastore()
+
+    async def on_GET(self, request: SynapseRequest) -> Tuple[int, JsonDict]:
+        await assert_requester_is_admin(self.auth, request)
+
+        order_by = parse_string(
+            request, "order_by", default=UserSortOrder.USER_ID.value
+        )
+        if order_by not in (
+            UserSortOrder.MEDIA_LENGTH.value,
+            UserSortOrder.MEDIA_COUNT.value,
+            UserSortOrder.USER_ID.value,
+            UserSortOrder.DISPLAYNAME.value,
+        ):
+            raise SynapseError(
+                400,
+                "Unknown value for order_by: %s" % (order_by,),
+                errcode=Codes.INVALID_PARAM,
+            )
+
+        start = parse_integer(request, "from", default=0)
+        if start < 0:
+            raise SynapseError(
+                400,
+                "Query parameter from must be a string representing a positive integer.",
+                errcode=Codes.INVALID_PARAM,
+            )
+
+        limit = parse_integer(request, "limit", default=100)
+        if limit < 0:
+            raise SynapseError(
+                400,
+                "Query parameter limit must be a string representing a positive integer.",
+                errcode=Codes.INVALID_PARAM,
+            )
+
+        from_ts = parse_integer(request, "from_ts", default=0)
+        if from_ts < 0:
+            raise SynapseError(
+                400,
+                "Query parameter from_ts must be a string representing a positive integer.",
+                errcode=Codes.INVALID_PARAM,
+            )
+
+        until_ts = parse_integer(request, "until_ts")
+        if until_ts is not None:
+            if until_ts < 0:
+                raise SynapseError(
+                    400,
+                    "Query parameter until_ts must be a string representing a positive integer.",
+                    errcode=Codes.INVALID_PARAM,
+                )
+            if until_ts <= from_ts:
+                raise SynapseError(
+                    400,
+                    "Query parameter until_ts must be greater than from_ts.",
+                    errcode=Codes.INVALID_PARAM,
+                )
+
+        search_term = parse_string(request, "search_term")
+        if search_term == "":
+            raise SynapseError(
+                400,
+                "Query parameter search_term cannot be an empty string.",
+                errcode=Codes.INVALID_PARAM,
+            )
+
+        direction = parse_string(request, "dir", default="f")
+        if direction not in ("f", "b"):
+            raise SynapseError(
+                400, "Unknown direction: %s" % (direction,), errcode=Codes.INVALID_PARAM
+            )
+
+        users_media, total = await self.store.get_users_media_usage_paginate(
+            start, limit, from_ts, until_ts, order_by, direction, search_term
+        )
+        ret = {"users": users_media, "total": total}
+        if (start + limit) < total:
+            ret["next_token"] = start + len(users_media)
+
+        return 200, ret
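
The last lines of `on_GET` decide whether the response carries a `next_token`. The arithmetic can be illustrated on its own with an in-memory list standing in for the database. This is only a sketch: `paginate` is a hypothetical helper, not a function in Synapse, and it assumes the store returns exactly the slice `rows[start:start + limit]`.

```python
from typing import Any, Dict, List


def paginate(rows: List[Any], start: int, limit: int) -> Dict[str, Any]:
    """Mirror the servlet's response shape for an in-memory list of rows."""
    page = rows[start : start + limit]
    ret: Dict[str, Any] = {"users": page, "total": len(rows)}
    # Same condition as the servlet: a further page exists only if this
    # window ends before the total number of matching rows.
    if (start + limit) < len(rows):
        ret["next_token"] = start + len(page)
    return ret


if __name__ == "__main__":
    rows = list(range(10))
    token = 0
    while token is not None:
        response = paginate(rows, token, 3)
        print(response["users"])
        token = response.get("next_token")
```

With ten rows and a limit of three, the loop requests offsets 0, 3, 6 and 9; the final response omits `next_token` because `9 + 3` is not less than the total.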

synapse/storage/databases/main/stats.py

Lines changed: 127 additions & 0 deletions
@@ -16,15 +16,18 @@
 
 import logging
 from collections import Counter
+from enum import Enum
 from itertools import chain
 from typing import Any, Dict, List, Optional, Tuple
 
 from twisted.internet.defer import DeferredLock
 
 from synapse.api.constants import EventTypes, Membership
+from synapse.api.errors import StoreError
 from synapse.storage.database import DatabasePool
 from synapse.storage.databases.main.state_deltas import StateDeltasStore
 from synapse.storage.engines import PostgresEngine
+from synapse.types import JsonDict
 from synapse.util.caches.descriptors import cached
 
 logger = logging.getLogger(__name__)
@@ -59,6 +62,23 @@
 TYPE_TO_ORIGIN_TABLE = {"room": ("rooms", "room_id"), "user": ("users", "name")}
 
 
+class UserSortOrder(Enum):
+    """
+    Enum to define the sorting method used when returning users
+    with get_users_media_usage_paginate
+
+    MEDIA_LENGTH = ordered by size of uploaded media. Smallest to largest.
+    MEDIA_COUNT = ordered by number of uploaded media. Smallest to largest.
+    USER_ID = ordered alphabetically by `user_id`.
+    DISPLAYNAME = ordered alphabetically by `displayname`
+    """
+
+    MEDIA_LENGTH = "media_length"
+    MEDIA_COUNT = "media_count"
+    USER_ID = "user_id"
+    DISPLAYNAME = "displayname"
+
+
 class StatsStore(StateDeltasStore):
     def __init__(self, database: DatabasePool, db_conn, hs):
         super().__init__(database, db_conn, hs)
@@ -882,3 +902,110 @@ def _calculate_and_set_initial_state_for_user_txn(txn):
                 complete_with_stream_id=pos,
                 absolute_field_overrides={"joined_rooms": joined_rooms},
             )
+
+    async def get_users_media_usage_paginate(
+        self,
+        start: int,
+        limit: int,
+        from_ts: Optional[int] = None,
+        until_ts: Optional[int] = None,
+        order_by: Optional[UserSortOrder] = UserSortOrder.USER_ID.value,
+        direction: Optional[str] = "f",
+        search_term: Optional[str] = None,
+    ) -> Tuple[List[JsonDict], Dict[str, int]]:
+        """Function to retrieve a paginated list of users and their uploaded local media
+        (size and number). This will return a json list of users and the
+        total number of users matching the filter criteria.
+
+        Args:
+            start: offset to begin the query from
+            limit: number of rows to retrieve
+            from_ts: request only media that are created later than this timestamp (ms)
+            until_ts: request only media that are created earlier than this timestamp (ms)
+            order_by: the sort order of the returned list
+            direction: sort ascending or descending
+            search_term: a string to filter user names by
+        Returns:
+            A list of user dicts and an integer representing the total number of
+            users that exist given this query
+        """
+
+        def get_users_media_usage_paginate_txn(txn):
+            filters = []
+            args = [self.hs.config.server_name]
+
+            if search_term:
+                filters.append("(lmr.user_id LIKE ? OR displayname LIKE ?)")
+                args.extend(["@%" + search_term + "%:%", "%" + search_term + "%"])
+
+            if from_ts:
+                filters.append("created_ts >= ?")
+                args.extend([from_ts])
+            if until_ts:
+                filters.append("created_ts <= ?")
+                args.extend([until_ts])
+
+            # Set ordering
+            if UserSortOrder(order_by) == UserSortOrder.MEDIA_LENGTH:
+                order_by_column = "media_length"
+            elif UserSortOrder(order_by) == UserSortOrder.MEDIA_COUNT:
+                order_by_column = "media_count"
+            elif UserSortOrder(order_by) == UserSortOrder.USER_ID:
+                order_by_column = "lmr.user_id"
+            elif UserSortOrder(order_by) == UserSortOrder.DISPLAYNAME:
+                order_by_column = "displayname"
+            else:
+                raise StoreError(
+                    500, "Incorrect value for order_by provided: %s" % order_by
+                )
+
+            if direction == "b":
+                order = "DESC"
+            else:
+                order = "ASC"
+
+            where_clause = "WHERE " + " AND ".join(filters) if len(filters) > 0 else ""
+
+            sql_base = """
+                FROM local_media_repository as lmr
+                LEFT JOIN profiles AS p ON lmr.user_id = '@' || p.user_id || ':' || ?
+                {}
+                GROUP BY lmr.user_id, displayname
+            """.format(
+                where_clause
+            )
+
+            # SQLite does not support SELECT COUNT(*) OVER()
+            sql = """
+                SELECT COUNT(*) FROM (
+                    SELECT lmr.user_id
+                    {sql_base}
+                ) AS count_user_ids
+            """.format(
+                sql_base=sql_base,
+            )
+            txn.execute(sql, args)
+            count = txn.fetchone()[0]
+
+            sql = """
+                SELECT
+                    lmr.user_id,
+                    displayname,
+                    COUNT(lmr.user_id) as media_count,
+                    SUM(media_length) as media_length
+                    {sql_base}
+                ORDER BY {order_by_column} {order}
+                LIMIT ? OFFSET ?
+            """.format(
+                sql_base=sql_base, order_by_column=order_by_column, order=order,
+            )
+
+            args += [limit, start]
+            txn.execute(sql, args)
+            users = self.db_pool.cursor_to_dict(txn)
+
+            return users, count
+
+        return await self.db_pool.runInteraction(
+            "get_users_media_usage_paginate_txn", get_users_media_usage_paginate_txn
+        )
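
The storage method runs two statements over the same `FROM ... GROUP BY` fragment: one counts the grouped rows through a subquery, the other fetches a single page with `LIMIT ? OFFSET ?`. The pattern can be tried in isolation with Python's `sqlite3` module. This is a sketch under stated assumptions: the two tables are simplified stand-ins that carry only the columns the query touches, `SERVER_NAME` and `media_usage` are hypothetical names, and only a fixed `created_ts >= ?` filter with the default `user_id` ordering is shown.

```python
import sqlite3

SERVER_NAME = "test"  # assumed server name for this sketch

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE local_media_repository (
        media_id TEXT, user_id TEXT, media_length INTEGER, created_ts BIGINT
    );
    CREATE TABLE profiles (user_id TEXT, displayname TEXT);
    """
)
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?)", [("alice", "Alice"), ("bob", "Bob")]
)
conn.executemany(
    "INSERT INTO local_media_repository VALUES (?, ?, ?, ?)",
    [
        ("m1", "@alice:test", 100, 1000),
        ("m2", "@alice:test", 50, 2000),
        ("m3", "@bob:test", 10, 3000),
    ],
)

# Shared fragment: profiles stores the localpart, so the full user ID is
# rebuilt from it and the server name before joining.
SQL_BASE = """
    FROM local_media_repository AS lmr
    LEFT JOIN profiles AS p ON lmr.user_id = '@' || p.user_id || ':' || ?
    WHERE created_ts >= ?
    GROUP BY lmr.user_id, displayname
"""


def media_usage(from_ts: int, limit: int, start: int):
    """Return (rows for one page, total number of matching users)."""
    args = [SERVER_NAME, from_ts]
    # Count the groups by wrapping the grouped query in a subquery.
    count = conn.execute(
        "SELECT COUNT(*) FROM (SELECT lmr.user_id " + SQL_BASE + ") AS c", args
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT lmr.user_id, displayname, COUNT(lmr.user_id) AS media_count, "
        "SUM(media_length) AS media_length "
        + SQL_BASE
        + " ORDER BY lmr.user_id ASC LIMIT ? OFFSET ?",
        args + [limit, start],
    ).fetchall()
    return rows, count
```

The placeholder order matters here as it does in the commit: the server name binds to the `?` in the join, the filter values follow, and `limit` and `start` are appended only for the second statement.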
