This is an implementation of Facebook's Manifold blob storage API that Runs On A Computer(tm).
The purpose of this service is to give buck2 somewhere to put its logs, buck2 rage
output and other similar data that it would ordinarily send to Manifold at Facebook.
Rewriting the buck2 source code to use a different HTTP API is kind of pointless: the subset of the Manifold API that buck2 uses is not that complicated, and it's not going to be S3-compatible anyway, since it supports (and uses) appends both for multipart upload and for uploads of unknown length. Since we have to write a service regardless and it can't be a truly trivial S3 wrapper, we might as well just implement the Manifold HTTP API.
N.B. There exists an S3 storage class which is appendable, but it has a limit of 1000 object parts, and any S3-based implementation would require significant rewriting if we find out it hits that limit. Fixing that edge case would require becoming stateful, among other things that introduce a lot of complexity; it would also require lifecycle rules and so on, and then one would have to deal with the service not Running On A Computer. We don't expect to hit the scale where that's necessary with this service, and if we do, the solution is probably to rotate data more than a day old out to S3.
The goals of this service are:
- Fast to write and deploy
- Does not cause unexpected hassles once deployed
- Operable: has OpenTelemetry and it's possible to know what it's doing
- Simple
- Store data we care about roughly as much as we care about build logs (i.e. not very much)
- Auth is delegated to the proxy, intended to be deployed behind e.g. Tailscale; we do not need to keep these extremely secret
- Everything in this service is expected to be garbage-collected after a period of time, durability is not that important
- Small scale: it will survive a terabyte of data without any rework; past that, we should consider spending a couple of days writing a better solution
- Runs On A Computer: just needs a postgres, which contains all mutable data including file blobs.
To be quick to write and to avoid having to touch it much later, it's written in Rust.
Buckets are defined by `locally-euclidean maintenance create-bucket NAME [ttl]`.
The file-creation endpoint creates a file in the bucket with the given name and returns 201 Created.
Idempotent: if the file already exists with the provided content, 200 OK is returned. If the content does not match, 409 Conflict is returned.
Takes the `Content-Type` header from the request and, if not present, sets it to `text/plain`. This is what will be returned when browsing the file.
FIXME(in buck2): Add the content-type on upload of files. I don't want a content type sniffer. You don't want a content type sniffer. Let's not build one.
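For clarity, here's a rough sketch of the status-code logic described above (illustrative only, not the actual handler code):

```rust
// Sketch of the create-file semantics: 201 for a new file, 200 for an
// idempotent retry with identical content, 409 otherwise.
enum CreateOutcome {
    Created,  // 201 Created: the file did not exist yet
    Ok,       // 200 OK: the file exists with identical content (retry)
    Conflict, // 409 Conflict: the file exists with different content
}

fn decide_create(existing: Option<&[u8]>, uploaded: &[u8]) -> CreateOutcome {
    match existing {
        None => CreateOutcome::Created,
        Some(current) if current == uploaded => CreateOutcome::Ok,
        Some(_) => CreateOutcome::Conflict,
    }
}
```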
The append endpoint appends to the file with the given name at the given offset and returns 200 OK, assuming that the given position is at the end of the file. If the given position is not actually at the end of the file and the data already stored there doesn't match the uploaded chunk, 409 Conflict is returned.
Idempotent: if the data already present at the given offset is identical to the uploaded chunk, 200 OK is returned.
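Again as a sketch rather than the real implementation, the append decision described above boils down to something like:

```rust
// Sketch of the append semantics: append at the end, treat an identical
// replayed chunk as success, and reject anything else.
enum AppendOutcome {
    Appended, // 200 OK: the offset is the end of the file, chunk is appended
    Replayed, // 200 OK: this exact chunk is already stored at that offset
    Conflict, // 409 Conflict: the offset is not the end and the bytes differ
}

fn decide_append(file_len: u64, offset: u64, existing_at_offset: &[u8], chunk: &[u8]) -> AppendOutcome {
    if offset == file_len {
        AppendOutcome::Appended
    } else if existing_at_offset == chunk {
        AppendOutcome::Replayed
    } else {
        AppendOutcome::Conflict
    }
}
```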
The file-view endpoint shows the file at the given path to the browser, with the `Content-Type` given on upload.
- Is it semantically acceptable to stream the request body? Yes! We are writing into a transactional database. Just do the whole thing in a transaction, it's Fine(tm).
FIXME: currently creating a file and writing into it are in separate transactions IIRC, which is weird. We probably should fix that.
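As a sketch of what "do the whole thing in a transaction" can look like for the write path — the `file_chunks` table, the use of `anyhow`, and the framework-agnostic stream signature are illustrative assumptions, not the real schema or handler:

```rust
use futures::{Stream, StreamExt};
use sqlx::PgPool;

// Sketch only: stream a request body into postgres inside one transaction.
// If the client disconnects mid-stream, the transaction is never committed,
// so no partial write becomes visible.
async fn store_streaming_body<S, E>(
    pool: &PgPool,
    file_id: i64,
    mut body: S,
) -> Result<(), anyhow::Error>
where
    S: Stream<Item = Result<bytes::Bytes, E>> + Unpin,
    E: std::error::Error + Send + Sync + 'static,
{
    let mut tx = pool.begin().await?;
    let mut offset: i64 = 0;
    while let Some(chunk) = body.next().await {
        let chunk = chunk?;
        // Hypothetical table: one row per received chunk.
        sqlx::query("insert into file_chunks (file_id, byte_offset, data) values ($1, $2, $3)")
            .bind(file_id)
            .bind(offset)
            .bind(chunk.as_ref())
            .execute(&mut *tx)
            .await?;
        offset += chunk.len() as i64;
    }
    tx.commit().await?;
    Ok(())
}
```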
You want the following buckets; the TTL does not especially matter as buck2 sets it itself as well, and we will respect what it tells us (FIXME: in the future!):
- `buck2_logs`: build logs
- `buck2_re_logs`: remote execution logs
- `buck2_installer_logs`: logs for the buck2 installer
- `buck2_rage_dumps`: output from `buck2 rage`
Then, with a buck2 with the right patch, configure `.buckconfig` like so:

    [buckets]
    upload_url = https://locally-euclidean.example.com
    file_view_url = https://locally-euclidean.example.com/explore/

    [buck2]
    log_url = https://locally-euclidean.example.com
This will upload logs to locally-euclidean automatically and allow downloading them transparently when they are not available locally.
This is a pretty normal Rust project with the exception of oddities relating to sqlx. If you have a local cargo toolchain it will just work, modulo needing to have a database.
There's a nix and nix-direnv environment provisioned for you, which you can activate with `direnv allow`.
sqlx verifies SQL queries at build time using the `DATABASE_URL` environment variable, the results of which are cached in `.sqlx/` via `cargo sqlx prepare --workspace`.
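For illustration (the `files` table here is made up), a compile-time-checked query looks like this; it's these macros that need either `DATABASE_URL` or the `.sqlx/` cache at build time:

```rust
// Illustration only: sqlx::query! is checked against a real database schema
// at build time, using DATABASE_URL or the offline cache in .sqlx/.
async fn file_count(pool: &sqlx::PgPool) -> Result<i64, sqlx::Error> {
    let row = sqlx::query!(r#"select count(*) as "count!" from files"#)
        .fetch_one(pool)
        .await?;
    Ok(row.count)
}
```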
If you don't want to use a system postgres, the `.envrc` is configured by default to let you use `process-compose up` to start a project-specific postgres server and automatically configure it.
Since we use this caching feature, nix builds do not need a postgres in the cargo build itself and can just use temp-postgres for tests.
You can use the sqlx tools to do migration development:
- Wipe the DB and run migrations: `sqlx database reset`
- Create a migration: `sqlx migrate add 'initial schema'`
Currently (this would be bad practice if the app were larger), migrations are run on application startup and no effort is made to prevent blowing up prod with this.
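Concretely, "run on startup" means something along the lines of the following sketch (not necessarily the exact code in this repo):

```rust
// Sketch: apply any pending migrations from ./migrations when the app boots.
// sqlx::migrate! embeds the migration files into the binary at compile time.
async fn run_startup_migrations(pool: &sqlx::PgPool) -> Result<(), sqlx::migrate::MigrateError> {
    sqlx::migrate!("./migrations").run(pool).await
}
```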
Don't write migrations that break back-compat for the prior version of the app.
If you work at Mercury, you currently have to manually deploy the prod instance.
Trigger this GitHub action (in our private repo) to deploy: https://github.com/MercuryTechnologies/infra-apps/actions/workflows/deploy-locally-euclidean.yml