
Allow for manual execution of long running scripts (in MIT Learn) #3189

Open
shanbady opened this issue May 7, 2025 · 1 comment

Comments

shanbady (Contributor) commented May 7, 2025

Description/Context

There are times when we need to ssh into a pod to run (potentially long-running) scripts via the Django shell. Currently the pods seem to be culled frequently, which forces me to re-run `kubectl get pods -n mitlearn`, find a valid pod name, and ssh back in, only to have that pod culled again moments later.
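The lookup-and-reconnect loop described above can at least be scripted. A minimal sketch, assuming the pod list comes from `kubectl get pods -n mitlearn -o json` and that the container starts Django with `python manage.py shell` from its working directory (both assumptions; the helper names `pick_running_pod` and `exec_command` are hypothetical):

```python
import json
import subprocess
from typing import List, Optional


def pick_running_pod(pods_json: str) -> Optional[str]:
    """Return the first pod that is Running and not being deleted, else None.

    `pods_json` is the text printed by `kubectl get pods -o json`, a
    List object of the form {"items": [{"metadata": ..., "status": ...}]}.
    """
    for item in json.loads(pods_json).get("items", []):
        metadata = item.get("metadata", {})
        # A pod with deletionTimestamp set is already being terminated.
        if metadata.get("deletionTimestamp"):
            continue
        if item.get("status", {}).get("phase") == "Running":
            return metadata.get("name")
    return None


def exec_command(namespace: str, pod: str) -> List[str]:
    # Assumption: manage.py sits in the container's working directory;
    # adjust for the real image layout.
    return ["kubectl", "exec", "-it", "-n", namespace, pod,
            "--", "python", "manage.py", "shell"]


def main(namespace: str = "mitlearn") -> None:
    """List pods, pick a live one, and attach a Django shell to it."""
    listing = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    pod = pick_running_pod(listing)
    if pod is None:
        raise SystemExit(f"no running pod found in namespace {namespace!r}")
    # Hands the terminal to kubectl until the shell exits.
    subprocess.run(exec_command(namespace, pod))
```

Calling `main()` from a machine with cluster credentials only removes the manual lookup step; it does nothing about the culling itself, so a script started this way still dies with its pod. If several workloads share the namespace, adding a label selector (`-l <key>=<value>`, with a hypothetical label) to the `get pods` call would keep it from picking, say, a Celery worker.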

We need some way to support cases where a developer has to ssh in and run long-running script(s).

The other potential problem case (I haven't confirmed this) is the Celery worker pods. Some tasks, such as the ETL pipelines and embedding generation, take a while to run. The tasks themselves are resilient to restarts, but if the pods are too ephemeral, I can see this causing certain Celery tasks to restart endlessly. Something similar happened on Heroku in RC, but that was due to resource constraints.

feoh (Contributor) commented May 7, 2025

OK, one response: there is a way to schedule long-running jobs in Kubernetes that can't be killed. You can find the recipe here. If you want the process NOT to be attached to your TTY, so you can log out, walk away, etc., omit the `--tty` flag; you then won't get immediate interactive output and will need to query the logs instead.
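The linked recipe is not reproduced here, and this may not be what it does. One common pattern is a standalone pod started with `kubectl run --restart=Never`: it is not owned by a Deployment, so rollouts and rescaling do not replace it (node eviction still can). A minimal sketch that builds the command for both modes described above, attached with `-i --tty` or detached and followed afterwards with `kubectl logs -f`; the pod name, image, and management command are hypothetical placeholders, and a real run would also need the same environment variables and secrets as the app pods, which this sketch does not set up:

```python
import shlex
from typing import List


def run_command(name: str, image: str, script: List[str],
                namespace: str = "mitlearn", attach: bool = True) -> List[str]:
    """Build a `kubectl run` argv for a standalone pod outside any Deployment."""
    cmd = ["kubectl", "run", name, "-n", namespace,
           f"--image={image}",
           # Never: run the script once; do not restart the container on exit.
           "--restart=Never"]
    if attach:
        # Keep stdin open and allocate a TTY so output streams to this terminal.
        cmd += ["-i", "--tty"]
    # --command makes the words after "--" replace the image's entrypoint.
    return cmd + ["--command", "--", *script]


def logs_command(name: str, namespace: str = "mitlearn") -> List[str]:
    """Follow the output of a pod started with attach=False."""
    return ["kubectl", "logs", "-f", name, "-n", namespace]


if __name__ == "__main__":
    # Hypothetical pod name, image, and management command, for illustration only.
    argv = run_command("learn-backfill", "registry.example.com/mitlearn:latest",
                       ["python", "manage.py", "backfill_embeddings"], attach=False)
    print(shlex.join(argv))
    print(shlex.join(logs_command("learn-backfill")))
```

In detached mode `kubectl run` returns once the pod is created, and because `--rm` is not used the finished pod stays around so its logs can still be read; delete it with `kubectl delete pod <name> -n mitlearn` when done.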
