feat: Redis PubSub for Queuer side progress #225
Conversation
…into feature/issue-224-Revisit-Redis-PubSub
Now just calculating progress and task info on every iteration of the loop instead of checking new-data flags. It looks like there may be a race condition of some sort causing inaccurate worker numbers. I might consider thread locks if they work with redis-py's pubsub run_in_thread.
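For reference, a minimal sketch of what guarding shared progress state with a lock around redis-py's `run_in_thread` handler could look like; the `task-progress` channel name and the payload fields are assumptions:

```python
import json
import threading

import redis

# Shared state: written by the pubsub worker thread, read by the main loop.
progress_lock = threading.Lock()
latest_progress: dict[str, float] = {}

def handle_progress(message):
    """Handler invoked from redis-py's background pubsub thread."""
    payload = json.loads(message["data"])
    with progress_lock:
        # Keyed by task id, so a slow main loop only ever sees the newest value.
        latest_progress[payload["task_id"]] = payload["percent"]

r = redis.Redis()
pubsub = r.pubsub(ignore_subscribe_messages=True)
pubsub.subscribe(**{"task-progress": handle_progress})
worker_thread = pubsub.run_in_thread(sleep_time=0.01)

# Main loop side: take a consistent snapshot under the lock, then work on the copy.
# with progress_lock:
#     snapshot = dict(latest_progress)
```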
Alright, this is kind of funny. Turns out I've been reinventing the wheel this whole time. Celery's GroupResult contains AsyncResult instances with all the data we need to get active worker count, task completion and task encoding progress (with a custom task state definition). So:
- dropped Redis database polling
- dropped Redis PubSub
- use unique worker IDs instead of a sequential prefix to easily start additional workers at any time
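For illustration, roughly what reading all of this off `GroupResult` could look like queuer-side; the `PROGRESS` state name and the `percent` meta key are assumptions about the custom task state:

```python
from celery.result import GroupResult

def summarise_group(group_result: GroupResult) -> tuple[int, int, float]:
    """Derive completed count, active worker count and average percent from a GroupResult."""
    completed, active, percents = 0, 0, []
    for result in group_result.results:  # each entry is an AsyncResult
        if result.successful():
            completed += 1
            percents.append(100.0)
        elif result.state == "PROGRESS":  # custom state set via the worker's update_state()
            active += 1
            # result.info carries the meta dict the worker passed to update_state()
            percents.append(float(result.info.get("percent", 0.0)))
        else:
            percents.append(0.0)
    average = sum(percents) / len(percents) if percents else 0.0
    return completed, active, average
```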
Ironically, revisiting PubSub made me realise the mechanism for deriving encoding progress and task status queuer-side has actually been available in Celery's AsyncResult this whole time... Literally reinvented the wheel. Anyway, it works! No more Redis-py, no more PubSub, no more polling the database, storing SHAs in dictionaries or researching bloom filters. It's siqque. I've also changed the worker names to use unique IDs instead of sequential numbering so multiple workers can be started at any time. Hopefully this paves the way for automatic scaling support.
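A sketch of how spinning up an extra worker with a unique node name might look; the `proxima.celery` app module is an assumption:

```python
import subprocess
import uuid

def start_worker(app_module: str = "proxima.celery") -> subprocess.Popen:
    """Launch an additional Celery worker with a unique node name.

    A random suffix means new workers can join at any time without
    colliding with an existing sequential index.
    """
    node_name = f"worker-{uuid.uuid4().hex[:8]}@%h"
    return subprocess.Popen(
        ["celery", "-A", app_module, "worker", "-n", node_name, "--loglevel=INFO"]
    )
```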
Alright. Here's the dealio.
Proxima uses Celery with Redis to queue video transcodes as tasks across multiple workers.
I want to use Redis pubsub to keep track of these long-running tasks and Celery worker metrics. We are well within the use case, but the desire to get an MVP for queuer-side progress took precedence at the time. Polling the database manually is undesirable because:
Ideally I'd like:
How can we do this with pubsub?
We gave up on pubsub initially because of issues keeping the subscriber in sync with the publisher. Redis implicitly queues messages at the TCP buffer level when the subscriber is too slow to receive them. The fix is to properly implement asyncio handling to keep the subscriber thread unblocked. Acknowledge messages when they come through and dump them in a variable for the main loop to handle. For our use case, it's okay for us to drop messages. Guaranteed delivery is not important. Staying in sync is far more important for the sake of our progress accuracy. Better to drop frames than slow down the video!
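Something like this, as a rough asyncio sketch (channel name and payload shape assumed): the subscriber drains messages as fast as they arrive and just overwrites the latest value per task, so nothing backs up in the TCP buffer and stale progress is silently dropped.

```python
import asyncio
import json

import redis.asyncio as aioredis

latest: dict[str, float] = {}  # task_id -> percent; overwritten, never queued

async def subscriber(channel: str = "task-progress") -> None:
    """Keep the subscriber unblocked: acknowledge everything immediately,
    dump it into `latest`, and let the main loop read at its own pace."""
    r = aioredis.Redis()
    async with r.pubsub() as pubsub:
        await pubsub.subscribe(channel)
        async for message in pubsub.listen():
            if message["type"] != "message":
                continue
            payload = json.loads(message["data"])
            latest[payload["task_id"]] = payload["percent"]

async def main_loop() -> None:
    sub = asyncio.create_task(subscriber())
    try:
        while True:
            # Render progress from whatever the subscriber has seen most recently.
            await asyncio.sleep(0.2)
    finally:
        sub.cancel()
```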
Most Recent Worker Event
Most recent worker event is easy: we just need to check that the pubsub payload includes our group ID in its metadata.
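i.e. something as small as this, assuming the payload carries a `group_id` field:

```python
import json

def is_our_event(message: dict, our_group_id: str) -> bool:
    """Treat a pubsub payload as our most recent worker event only if its
    metadata carries our Celery group ID."""
    payload = json.loads(message["data"])
    return payload.get("group_id") == our_group_id
```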
Average Progress
The current implementation does a lot of gymnastics with a lot of spaghetti: task matching and additional polling are needed to get the average progress bar working. Workers are publishing individually and we need to aggregate and calculate the average queuer-side. That's what's slowing down the loop performance.
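The aggregation itself is simple once the per-task percentages are in one place; a sketch (field names assumed):

```python
def average_progress(per_task_percent: dict[str, float], total_tasks: int) -> float:
    """Aggregate individually published worker percentages into one bar.

    Tasks we haven't heard from yet count as 0%, so the average doesn't jump
    around as workers pick up new jobs.
    """
    if total_tasks == 0:
        return 0.0
    return sum(per_task_percent.values()) / total_tasks
```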
Discrete Progress
The discrete one is easy, since we're just increasing the ratio of complete tasks out of known queued tasks whenever we get a success message from a task that belongs to us. That handler works a treat.
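For completeness, the whole mechanism is just a counter bumped by the success handler:

```python
class DiscreteProgress:
    """Ratio of completed tasks out of the known queued tasks,
    bumped whenever a success message for one of our tasks arrives."""

    def __init__(self, queued: int):
        self.queued = queued
        self.completed = 0

    def on_success(self) -> float:
        self.completed += 1
        return self.completed / self.queued
```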
Dramas
What probably doesn't help is the cross-reference gymnastics we're using to check if the task progress message belongs to the task group of the queuer. If we can get the Celery group ID on the worker side once we receive a new task, we can pass that as an argument into ffmpeg and set the channel pattern dynamically:
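Roughly like this; `run_ffmpeg` is a hypothetical generator yielding 0-100 and the channel naming is an assumption:

```python
import json

import redis
from celery import shared_task

r = redis.Redis()

@shared_task(bind=True)
def encode(self, source_path: str):
    """Worker-side: publish progress to a channel scoped to the queuer's group."""
    group_id = self.request.group            # Celery group ID, available on a bound task
    channel = f"task-progress:{group_id}"    # dynamic channel, one per queuer group
    for percent in run_ffmpeg(source_path):  # hypothetical generator yielding 0-100
        r.publish(channel, json.dumps({"task_id": self.request.id, "percent": percent}))
```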
That's great! That means we only need one pubsub handler for task-progress.
Do the percentage calculations worker-side too! Away with this "advance / seconds processed vs total duration" business... This makes our calculations way easier. Everything is out of 100%.
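i.e. the worker converts ffmpeg's processed time into a 0-100 figure before publishing; a sketch:

```python
def percent_complete(out_time_seconds: float, total_duration_seconds: float) -> float:
    """Convert ffmpeg's processed time into a 0-100 percentage on the worker,
    so the queuer never needs to know a clip's duration."""
    if total_duration_seconds <= 0:
        return 0.0
    return min(100.0, (out_time_seconds / total_duration_seconds) * 100.0)
```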
closes #224