Commit dd788ec

docs: DOC-295: Add docs for proxy storage (#7578)
Co-authored-by: Max Tkachenko <[email protected]>
1 parent 688a2c0 commit dd788ec

4 files changed: +93 -0 lines changed
docs/source/guide/storage.md

Lines changed: 93 additions & 0 deletions
@@ -67,6 +67,8 @@ Source storage functionality can be divided into two parts:

<img src="/images/source-cloud-storages.png" class="make-intense-zoom">

#### Treat every bucket object as a source file

Label Studio Source Storages feature an option called "Treat every bucket object as a source file." This option enables two different methods of loading tasks into Label Studio.
@@ -179,6 +181,97 @@ When enabled, Label Studio automatically lists files from the storage bucket and
<img src="/images/source-storages-treat-on.png" class="make-intense-zoom">

#### Pre-signed URLs vs. storage proxies

Label Studio can securely fetch media data from cloud storage in two ways: through a proxy or through pre-signed URLs.

Which one you use depends on whether **Use pre-signed URLs** is toggled on or off when you set up your source storage. Proxy storage is enabled when **Use pre-signed URLs** is OFF:

<img src="/images/storages/use-presigned-off.png" style="max-width:600px; margin: 0 auto" alt="Screenshot of storage page with use pre-signed off">

##### Proxy storage

In proxy mode, the Label Studio backend fetches objects server-side and streams them directly to the browser.

<img src="/images/storages/storage-proxy.png" style="max-width:600px; margin: 0 auto" alt="Diagram of proxy flow">

This has multiple benefits, including:

- **Security**
    - Access to media files is further restricted based on Label Studio user roles and project access.
    - This restriction also applies to cached files: even if the media is cached, a user whose access to the task is revoked can no longer access the file.
    - Data stays within the Label Studio network boundary. This is especially useful for on-prem environments that want to maintain a single entry point for their network traffic.
- **Configuration**
    - No CORS settings are needed.
    - No pre-signed URL permissions are needed.

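
The access behavior described in the security benefits can be pictured with a small sketch. This is not Label Studio's implementation: `MediaProxy`, `AccessDenied`, `fetch_object`, and `can_access` are hypothetical names, and the in-memory dictionary cache is an assumption made only for illustration. The point it shows is that the permission check runs on every request, before any cached copy is returned.

```python
class AccessDenied(Exception):
    """Raised when a user may not view the media for a task (hypothetical)."""


class MediaProxy:
    """Illustrative proxy: check access first, then serve from cache or storage."""

    def __init__(self, fetch_object, can_access):
        # fetch_object(key) -> bytes: hypothetical server-side read from cloud storage.
        # can_access(user, task_id) -> bool: hypothetical role/project access check.
        self._fetch_object = fetch_object
        self._can_access = can_access
        self._cache = {}  # assumption: a simple in-memory cache keyed by object key

    def get(self, user, task_id, key):
        # The check happens on every request, so a cached object is not
        # returned to a user whose access to the task has been revoked.
        if not self._can_access(user, task_id):
            raise AccessDenied(f"{user!r} cannot access task {task_id!r}")
        if key not in self._cache:
            self._cache[key] = self._fetch_object(key)
        return self._cache[key]
```

Because the check sits in front of the cache rather than behind it, revoking access takes effect on the next request even though the object itself is still cached.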
To use proxy storage, make sure your storage permissions include the following:

{% details <b>AWS S3</b> %}

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

{% enddetails %}

<br>
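
As a quick sanity check for the AWS S3 policy above, the following sketch reports which of the two required actions are missing from a policy document. It is a simplified helper, not part of Label Studio or the AWS SDK: `missing_actions` and `REQUIRED_ACTIONS` are hypothetical names, and the check deliberately ignores wildcards, `Deny` statements, conditions, and the `Resource` field.

```python
import json

# The two actions listed in the AWS S3 policy for proxy storage.
REQUIRED_ACTIONS = {"s3:GetObject", "s3:ListBucket"}


def missing_actions(policy_json: str) -> set:
    """Return the required actions that no Allow statement names explicitly."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]

    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action string
            actions = [actions]
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed


EXAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::your-bucket-name/*"]
    }
  ]
}
"""

print(missing_actions(EXAMPLE_POLICY))
```

An empty result means both actions are named explicitly; it does not prove the policy is attached to the right principal or bucket.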

{% details <b>Google Cloud Storage</b> %}

- `storage.objects.get` - Read object data and metadata
- `storage.objects.list` - List objects in the bucket (if using prefix)

{% enddetails %}

<br>

{% details <b>Azure Blob Storage</b> %}

Add the **Storage Blob Data Reader** role, which includes:

- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read`
- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/getTags/action`

{% enddetails %}

<br>

!!! note Note for on-prem deployments
    Large media files are streamed in sequential chunks of up to 8 MB, each fetched with a separate GET request. This can result in frequent requests to the backend for the next portion of data, which uses additional resources.

    You can configure this using the following environment variables:

    * `RESOLVER_PROXY_MAX_RANGE_SIZE` - Defaults to 8 MB, and defines the largest chunk size returned per request.
    * `RESOLVER_PROXY_TIMEOUT` - Defaults to 20 seconds, and defines the maximum time uWSGI workers spend on a single request.

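
To get a feel for how the chunk size affects request volume, the sketch below splits a file of a given size into sequential byte ranges no larger than the configured maximum. It is a back-of-the-envelope model rather than Label Studio code: `plan_range_requests` is a hypothetical helper, and it assumes that 8 MB means 8 * 1024 * 1024 bytes and that each chunk corresponds to one HTTP `Range` request.

```python
# Assumption: the 8 MB default is interpreted as 8 * 1024 * 1024 bytes.
DEFAULT_MAX_RANGE_SIZE = 8 * 1024 * 1024


def plan_range_requests(file_size, max_range_size=DEFAULT_MAX_RANGE_SIZE):
    """Return the HTTP Range header values needed to read the whole file."""
    if file_size < 0 or max_range_size <= 0:
        raise ValueError("file_size must be >= 0 and max_range_size must be > 0")

    ranges = []
    start = 0
    while start < file_size:
        # Byte ranges are inclusive, so the last byte index is end, not end + 1.
        end = min(start + max_range_size, file_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


# A 20 MiB file with the default chunk size needs three sequential requests.
for header in plan_range_requests(20 * 1024 * 1024):
    print(header)
```

Under these assumptions, raising the maximum range size reduces the number of requests per file, at the cost of each request occupying a worker for longer.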
##### Pre-signed URLs

In this scenario, your browser receives an HTTP 303 redirect to a time-limited pre-signed URL for S3, GCS, or Azure. This is the default behavior.

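
The difference between the two modes can be summarized in a short sketch. It is illustrative only: `Response`, `resolve_media`, `make_presigned_url`, and `fetch_object` are hypothetical names, and the two callables stand in for whatever the cloud provider's SDK would do. With pre-signed URLs enabled, the backend answers with a 303 redirect and the browser downloads from cloud storage; otherwise the backend returns the bytes itself.

```python
from dataclasses import dataclass, field
from http import HTTPStatus


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical)."""
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def resolve_media(key, use_presigned_urls, make_presigned_url, fetch_object):
    """Answer a media request according to the Use pre-signed URLs setting."""
    if use_presigned_urls:
        # Pre-signed mode: redirect the browser to a time-limited storage URL.
        return Response(
            status=HTTPStatus.SEE_OTHER,  # HTTP 303
            headers={"Location": make_presigned_url(key)},
        )
    # Proxy mode: the backend reads the object and returns it to the browser.
    return Response(status=HTTPStatus.OK, body=fetch_object(key))
```

In the redirect branch the media bytes never pass through the backend, which is the isolation this mode is meant to provide.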
The main benefit of pre-signed URLs is that they keep your media files isolated **from** the Label Studio network as much as possible.

<img src="/images/storages/storage-proxy-presigned.png" style="max-width:600px; margin: 0 auto" alt="Diagram of presigned URL flow">

The permissions required for this are already included in the cloud storage configuration documentation below.

### Target storage

When annotators click **Submit** or **Update** while labeling tasks, Label Studio saves annotations in the Label Studio database.