Commit 6a28dc8

docs: DOC-295: Add docs for proxy storage
1 parent 694a8ab commit 6a28dc8

1 file changed, +88 -1 lines changed

docs/source/guide/storage.md

Lines changed: 88 additions & 1 deletion
@@ -67,6 +67,94 @@ Source storage functionality can be divided into two parts:

<img src="/images/source-cloud-storages.png" class="make-intense-zoom">

#### Pre-signed URLs vs. storage proxies

Label Studio can fetch media data from cloud storage using one of two secure mechanisms: a storage proxy or pre-signed URLs.

Which one is used depends on whether **Use pre-signed URLs** is toggled on or off when you set up your source storage. Proxy storage is enabled when **Use pre-signed URLs** is OFF:

![Screenshot of storage page with use pre-signed off](/images/storages/use-presigned-off.png)

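If you create source storages through the Label Studio API rather than the UI, the same choice is expressed as a boolean on the storage record. The Python sketch below is a hedged illustration only: it assumes the S3 import-storage endpoint is `/api/storages/s3`, that its `presign` field mirrors the **Use pre-signed URLs** toggle, and that the API accepts an `Authorization: Token <token>` header. `build_s3_source_payload` and `make_create_request` are hypothetical helpers; verify the field names and path against the API reference for your Label Studio version.

```python
import json
import urllib.request


def build_s3_source_payload(project_id: int, bucket: str, prefix: str = "",
                            use_presigned_urls: bool = False) -> dict:
    """Build a source-storage payload (hypothetical helper).

    Assumption: `presign` mirrors the "Use pre-signed URLs" toggle,
    so False selects proxy storage.
    """
    return {
        "project": project_id,
        "bucket": bucket,
        "prefix": prefix,
        "presign": use_presigned_urls,
    }


def make_create_request(base_url: str, token: str, payload: dict,
                        path: str = "/api/storages/s3") -> urllib.request.Request:
    """Prepare, but do not send, the POST request. The default path is an assumption."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",  # assumed token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. Bucket credential fields are deliberately left out of this sketch.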
##### Proxy storage

In proxy mode, the Label Studio backend fetches objects from cloud storage server-side and streams them to the browser.

![Storage diagram proxy](/images/storages/storage-proxy.png)

Proxy mode has several benefits:

- **Security**
    - Access to media files is further restricted based on Label Studio user roles and project access.
    - These restrictions also apply to cached files: if a user's access to a task is revoked, they can no longer access its media, even if the file is cached.
    - Data stays within the Label Studio network boundary. This is especially useful for on-prem environments that want to maintain a single entry point for their network traffic.
- **Configuration**
    - No CORS settings are needed.
    - No pre-signed permissions are needed.

To use proxy storage, make sure the credentials you provide to Label Studio include the following permissions:

{% details <b>AWS S3</b> %}

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
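
If you maintain this policy for several buckets, generating it can be less error-prone than editing the bucket name by hand. A minimal Python sketch; `build_proxy_policy` is a hypothetical helper, not part of Label Studio. Both resources are listed because IAM evaluates `s3:ListBucket` against the bucket ARN and `s3:GetObject` against object ARNs.

```python
import json


def build_proxy_policy(bucket: str) -> dict:
    """Return the read-only policy shown above for a given bucket (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself, for s3:ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # every object, for s3:GetObject
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_proxy_policy("your-bucket-name"), indent=2))
```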

{% enddetails %}

<br>

{% details <b>Google Cloud Storage</b> %}

- `storage.objects.get` - Read object data and metadata
- `storage.objects.list` - List objects in the bucket (if using prefix)

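One way to audit a service account is to compare the permissions it holds with the list above. The Python sketch below encodes that list, with `storage.objects.list` required only when a prefix is used; the helper names are hypothetical.

```python
from typing import Iterable, List

# Taken from the permission list above.
ALWAYS_REQUIRED = ["storage.objects.get"]
REQUIRED_WITH_PREFIX = ["storage.objects.list"]


def required_gcs_permissions(uses_prefix: bool) -> List[str]:
    """Permissions the proxy needs for a bucket, per the list above."""
    required = list(ALWAYS_REQUIRED)
    if uses_prefix:
        required += REQUIRED_WITH_PREFIX
    return required


def missing_gcs_permissions(granted: Iterable[str], uses_prefix: bool) -> List[str]:
    """Return the required permissions that are absent from `granted`."""
    granted_set = set(granted)
    return [p for p in required_gcs_permissions(uses_prefix) if p not in granted_set]
```

The `granted` list could come from `google.cloud.storage.Bucket.test_iam_permissions`, which returns the subset of the requested permissions that the caller holds, for example `bucket.test_iam_permissions(required_gcs_permissions(True))`.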
{% enddetails %}

<br>

{% details <b>Azure Blob Storage</b> %}

Assign the **Storage Blob Data Reader** role, which includes:

- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read`
- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/getTags/action`

{% enddetails %}

<br>

!!! note Note for on-prem deployments
Large media files are streamed in sequential chunks of up to 8 MB, and each chunk is fetched with its own GET request. For very large files, this means frequent requests to the backend for the next portion of data, which uses additional resources.

You can tune this behavior with the following environment variables:

* `RESOLVER_PROXY_MAX_RANGE_SIZE` - Defaults to 8 MB. Defines the largest chunk size returned per request.
* `RESOLVER_PROXY_TIMEOUT` - Defaults to 20 seconds. Defines the maximum time uWSGI workers spend on a single request.

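The request count per file is the file size divided by the chunk size, rounded up. The Python sketch below is illustrative only: it assumes `RESOLVER_PROXY_MAX_RANGE_SIZE` is expressed in bytes and that 8 MB means 8 * 1024 * 1024 bytes, and the helper names are hypothetical rather than Label Studio internals.

```python
import os
from typing import List, Tuple

# Assumption: the variable is a byte count, and "8 MB" means 8 * 1024 * 1024 bytes.
DEFAULT_MAX_RANGE = 8 * 1024 * 1024


def max_range_size() -> int:
    """Read RESOLVER_PROXY_MAX_RANGE_SIZE from the environment, defaulting to 8 MB."""
    return int(os.environ.get("RESOLVER_PROXY_MAX_RANGE_SIZE", DEFAULT_MAX_RANGE))


def plan_ranges(file_size: int, max_range: int) -> List[Tuple[int, int]]:
    """Split a file into inclusive (start, end) byte ranges, one per GET request.

    Each pair corresponds to an HTTP header of the form `Range: bytes=start-end`.
    """
    if file_size <= 0:
        return []
    return [
        (start, min(start + max_range, file_size) - 1)
        for start in range(0, file_size, max_range)
    ]


def request_count(file_size: int, max_range: int) -> int:
    """Sequential GET requests needed to stream the whole file (ceiling division)."""
    if file_size <= 0:
        return 0
    return (file_size + max_range - 1) // max_range


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(request_count(one_gib, DEFAULT_MAX_RANGE))  # 128
    print(request_count(one_gib, 32 * 1024 * 1024))   # 32
```

Larger ranges mean fewer but longer requests, so if you raise `RESOLVER_PROXY_MAX_RANGE_SIZE` you may also need to raise `RESOLVER_PROXY_TIMEOUT`.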
##### Pre-signed redirect

In pre-signed mode, which is the default, your browser receives an HTTP 303 redirect to a time-limited S3, GCS, or Azure URL and downloads the media directly from cloud storage.

![Storage diagram pre-signed](/images/storages/storage-proxy-presigned.png)

Use pre-signed URLs when you want to keep your media files isolated from the Label Studio network as much as possible, since the media does not pass through the Label Studio backend.

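To check which mode a storage is using, you can request a task's media URL without following redirects: a 303 with a `Location` header points to pre-signed mode, while a 200 or 206 carrying the media body points to proxy mode. A minimal standard-library sketch; `NoRedirect`, `classify`, and `probe` are hypothetical names, and the media URL and any authentication headers depend on your deployment.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 303 response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError with the 3xx status.
        return None


def classify(status: int, location: Optional[str]) -> str:
    """Map a media response to the storage mode that most likely produced it."""
    if status == 303 and location:
        return "pre-signed redirect"
    if status in (200, 206):
        return "proxy"
    return f"unexpected status {status}"


def probe(url: str, headers: Optional[Dict[str, str]] = None) -> str:
    """Request a task's media URL once, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request, timeout=30) as response:
            return classify(response.status, response.headers.get("Location"))
    except urllib.error.HTTPError as err:
        return classify(err.code, err.headers.get("Location"))
```

A 403 from `probe` in proxy mode can also be expected behavior, for example after a user's access to the task is revoked.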
#### Treat every bucket object as a source file

Label Studio Source Storages feature an option called "Treat every bucket object as a source file." This option enables two different methods of loading tasks into Label Studio.
@@ -178,7 +266,6 @@ When enabled, Label Studio automatically lists files from the storage bucket and

<img src="/images/source-storages-treat-on.png" class="make-intense-zoom">

### Target storage

When annotators click **Submit** or **Update** while labeling tasks, Label Studio saves annotations in the Label Studio database.
