Label Studio Source Storages feature an option called "Treat every bucket object as a source file." This option enables two different methods of loading tasks into Label Studio.
@@ -179,6 +181,97 @@ When enabled, Label Studio automatically lists files from the storage bucket and
There are two secure mechanisms through which Label Studio fetches media data from cloud storage: via proxy and via pre-signed URLs.

Which one you use depends on whether you have **Use pre-signed URLs** toggled on or off when setting up your source storage. Proxy storage is enabled when **Use pre-signed URLs** is OFF:

<img src="/images/storages/use-presigned-off.png" style="max-width:600px; margin: 0 auto" alt="Screenshot of storage page with use pre-signed off">

##### Proxy storage

When in proxy mode, the Label Studio backend fetches objects server-side and streams them directly to the browser.

<img src="/images/storages/storage-proxy.png" style="max-width:600px; margin: 0 auto" alt="Diagram of proxy flow">

This has multiple benefits, including:

- **Security**
    - Access to media files is further restricted based on Label Studio user roles and project access.
    - This access control also applies to cached files. Even if the media is cached, a user whose access to the task is revoked can no longer access the file.
    - Data stays within the Label Studio network boundary. This is especially useful for on-prem environments that want to maintain a single entry point for their network traffic.
- **Configuration**
    - No CORS settings are needed.
    - No pre-signed permissions are needed.

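
The access rule described under **Security** can be sketched as follows. This is a hypothetical illustration, not Label Studio's actual implementation: `MediaProxy`, `can_access`, and `fetch` are assumed names. The only point it makes is that the permission check runs on every request, before the cache is consulted.

```python
from typing import Callable, Dict


class MediaProxy:
    """Hypothetical proxy: checks access on every request, even for cached objects."""

    def __init__(self, can_access: Callable[[str, str], bool],
                 fetch: Callable[[str], bytes]) -> None:
        self._can_access = can_access  # (user, key) -> bool, assumed permission hook
        self._fetch = fetch            # key -> bytes, assumed storage fetch
        self._cache: Dict[str, bytes] = {}

    def get(self, user: str, key: str) -> bytes:
        # The check comes first, so a cache hit never bypasses it.
        if not self._can_access(user, key):
            raise PermissionError(f"{user} may not read {key}")
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]
```

With this ordering, revoking a user's access takes effect on the next request, whether or not the object is already cached.
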
To allow proxy storage, you need to ensure your permissions include the following:

{% details <b>AWS S3</b> %}

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

{% enddetails %}
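
If you manage several buckets, it can help to generate the AWS S3 policy above rather than edit it by hand. The following is a minimal sketch; `build_proxy_policy` is a hypothetical helper name, and the output mirrors the JSON policy shown above with the bucket name substituted.

```python
import json


def build_proxy_policy(bucket_name: str) -> dict:
    """Return the read-only S3 policy shown above for the given bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # ListBucket applies to the bucket ARN, GetObject to the objects under it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_proxy_policy("your-bucket-name"), indent=2))
```
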

<br>

{% details <b>Google Cloud Storage</b> %}

- `storage.objects.get` - Read object data and metadata
- `storage.objects.list` - List objects in the bucket (if using prefix)

{% enddetails %}

<br>

{% details <b>Azure Blob Storage</b> %}

Add the **Storage Blob Data Reader** role, which includes:
Large media files are streamed in sequential 8 MB chunks, each fetched with a separate GET request. This can result in frequent requests to the backend for the next portion of data, which uses additional resources.

You can configure this using the following environment variables:

* `RESOLVER_PROXY_MAX_RANGE_SIZE` - Defaults to 8 MB, and defines the largest chunk size returned per request.
* `RESOLVER_PROXY_TIMEOUT` - Defaults to 20 seconds, and defines the maximum time uWSGI workers spend on a single request.

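
To get a feel for how the chunk size affects the number of requests, the sketch below splits a file into sequential byte ranges. It is an illustration only, not Label Studio's code: the function names are hypothetical, "8 MB" is assumed to mean 8 MiB expressed in bytes, and the requests are assumed to be standard HTTP `Range` requests.

```python
from typing import Iterator, Tuple

DEFAULT_MAX_RANGE_SIZE = 8 * 1024 * 1024  # assumption: "8 MB" read as 8 MiB, in bytes


def byte_ranges(total_size: int,
                max_range_size: int = DEFAULT_MAX_RANGE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets covering a file of total_size bytes."""
    if max_range_size <= 0:
        raise ValueError("max_range_size must be positive")
    start = 0
    while start < total_size:
        end = min(start + max_range_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # A 20 MiB file needs three requests with the default chunk size.
    for start, end in byte_ranges(20 * 1024 * 1024):
        print(range_header(start, end))
```

A larger maximum range size means fewer requests per file, at the cost of each request occupying a worker for longer.
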
##### Pre-signed URLs

In this scenario, your browser receives an HTTP 303 redirect to a time-limited S3/GCS/Azure pre-signed URL. This is the default behavior.

The main benefit of using pre-signed URLs is that your media files stay isolated **from** the Label Studio network as much as possible.

<img src="/images/storages/storage-proxy-presigned.png" style="max-width:600px; margin: 0 auto" alt="Diagram of presigned URL flow">

The permissions required for this are already included in the cloud storage configuration documentation below.
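
The redirect flow can be illustrated with a small, self-contained sketch using only the Python standard library. Everything here is a stand-in: `RedirectHandler`, `fetch_redirect`, and the example URL are hypothetical, and a real pre-signed URL would be generated by the cloud provider's SDK. The sketch only shows what the client sees: a 303 response whose `Location` header points at storage.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a time-limited pre-signed URL from a cloud provider.
PRESIGNED_URL = "https://storage.example.com/bucket/image.png?signature=abc&expires=3600"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Reply with 303 See Other; the browser then fetches the media from storage.
        self.send_response(303)
        self.send_header("Location", PRESIGNED_URL)
        self.end_headers()

    def log_message(self, *args) -> None:
        pass  # keep the example quiet


def fetch_redirect() -> tuple:
    """Start a throwaway local server, request once, and return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/media/image.png")
        response = conn.getresponse()  # http.client does not follow redirects
        result = (response.status, response.getheader("Location"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Because the media bytes are served by the storage provider rather than the backend, the backend's only job in this flow is to issue the redirect.
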

### Target storage

When annotators click **Submit** or **Update** while labeling tasks, Label Studio saves annotations in the Label Studio database.