feat: Redis PubSub for Queuer side progress #225


Merged
merged 17 commits into main from feature/issue-224-Revisit-Redis-PubSub on Sep 26, 2022

Conversation

github-actions[bot]
Contributor

@github-actions github-actions bot commented Sep 6, 2022

Alright. Here's the dealio.

Proxima uses Celery with Redis to queue video transcodes as tasks across multiple workers.
I want to use Redis pubsub to keep track of these long-running tasks and Celery worker metrics. We are well within the use case, but the desire to get an MVP for queuer-side progress took precedence at the time. Polling the database manually is undesirable because:

  • it reinvents the wheel, and I'm not as good a wheel designer as the Redis team
  • added complexity reduces readability and maintainability
  • polling opens a new TCP connection for each request, whereas pubsub keeps one socket open

Ideally I'd like:

  • a console log of the most recent Celery worker event ('started', 'finished', 'failed')
  • an average progress bar across all tasks
  • a discrete task-completion progress bar, or a fraction rendered at the end of the average progress bar

How can we do this with pubsub?

We gave up on pubsub initially because of issues keeping the subscriber in sync with the publisher. Redis implicitly queues messages at the TCP buffer level when the subscriber is too slow to receive them. The fix is to properly implement asyncio handling to keep the subscriber thread unblocked. Acknowledge messages when they come through and dump them in a variable for the main loop to handle. For our use case, it's okay for us to drop messages. Guaranteed delivery is not important. Staying in sync is far more important for the sake of our progress accuracy. Better to drop frames than slow down the video!
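The "acknowledge, overwrite, let the main loop catch up" pattern can be sketched in a few lines. This is a minimal illustration, not Proxima's actual code: the real feed would be redis-py's asyncio PubSub (e.g. `await pubsub.get_message(ignore_subscribe_messages=True)`); a simulated async feed stands in here so the pattern itself is runnable.

```python
import asyncio

latest_progress = None  # shared slot the main loop reads at its own pace

async def subscriber(feed):
    # Drain messages as fast as they arrive, keeping only the newest.
    # Overwriting means a slow main loop drops frames instead of lagging.
    global latest_progress
    async for message in feed:
        latest_progress = message

async def simulated_feed():
    # stands in for a Redis pubsub socket; payload shape is illustrative
    for pct in (10, 20, 50, 90, 100):
        yield {"task-id": "jk39d", "percent": pct}
        await asyncio.sleep(0)  # hand control back, as a socket read would

asyncio.run(subscriber(simulated_feed()))
print(latest_progress)  # only the newest message survives
```

Because the subscriber never blocks on processing, the TCP buffer on the Redis side stays drained and the publisher never backs up.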

Most Recent Worker Event
Most recent worker event is easy: we just need to check that the pubsub payload includes our group ID in its metadata.
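That check is a one-liner. The payload shape below is an assumption (the comment above only says the group ID rides in the metadata), so treat the keys as illustrative:

```python
OUR_GROUP_ID = "gh84d"  # illustrative Celery group ID

def event_is_ours(message: dict) -> bool:
    # keep only worker events whose metadata carries our group ID
    return message.get("meta", {}).get("group-id") == OUR_GROUP_ID

print(event_is_ours({"event": "started", "meta": {"group-id": "gh84d"}}))  # True
print(event_is_ours({"event": "started", "meta": {"group-id": "zz00x"}}))  # False
```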

Average Progress
The current implementation does a lot of gymnastics with a lot of spaghetti: task matching and additional polling are needed to get the average progress bar working. Workers publish individually, so we have to aggregate and calculate the average queuer-side. That's what's slowing down the loop.

Discrete progress
The discrete one is easy, since we're just increasing the ratio of complete tasks out of known queued tasks whenever we get a success message from a task that belongs to us. That handler works a treat.
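That handler amounts to a counter bump gated on ownership. A rough sketch, with names and payload shape made up for illustration:

```python
# Discrete progress: completed out of known queued tasks.
our_task_ids = {"jk39d", "ab12c", "cd34e"}  # tasks this queuer dispatched
queued = len(our_task_ids)
completed = 0

def on_success(message: dict) -> None:
    # bump the count only for success messages from tasks that belong to us
    global completed
    if message.get("task-id") in our_task_ids:
        completed += 1

on_success({"task-id": "jk39d"})   # ours: counted
on_success({"task-id": "xx99x"})   # not ours: ignored
print(f"{completed}/{queued}")     # the fraction shown at the end of the bar
```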

Dramas

What probably doesn't help is the cross-reference gymnastics we're using to check if the task progress message belongs to the task group of the queuer. If we can get the Celery group ID on the worker side once we receive a new task, we can pass that as an argument into ffmpeg and set the channel pattern dynamically:

```yaml
channel: task-progress-gh84d
data:
  task-id: "jk39d"
  percent: 20
```

That's great! That means we only need one pubsub handler for task-progress.
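The worker-side half of that is just deriving the channel name from the group ID it was handed; the queuer then needs a single pattern subscription (`psubscribe("task-progress-*")` in redis-py). A local sketch, using `fnmatch` as a stand-in for Redis's glob matching so it runs without a server:

```python
import fnmatch

def progress_channel(group_id: str) -> str:
    # worker derives its publish channel from the Celery group ID
    return f"task-progress-{group_id}"

channel = progress_channel("gh84d")
print(channel)                                      # task-progress-gh84d
print(fnmatch.fnmatch(channel, "task-progress-*"))  # matches the one pattern
```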

Do the percentage calculations worker-side too! Away with this advance / seconds-processed vs. total-duration juggling on the queuer... This makes our calculations way easier. Everything is out of 100%.
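The worker-side normalisation could be as simple as this (names are illustrative; in practice the inputs would come from ffmpeg's progress output):

```python
def encode_percent(seconds_processed: float, total_duration: float) -> int:
    # publish a 0-100 integer instead of raw timestamps, so the queuer
    # never needs to know the clip's duration
    if total_duration <= 0:
        return 0
    return min(100, round(seconds_processed / total_duration * 100))

print(encode_percent(12.0, 60.0))  # 20, as in the payload example above
```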
closes #224

@in03 in03 changed the title Revisit Redis PubSub feat: Redis PubSub for Queuer side progress Sep 6, 2022
in03 added 15 commits September 6, 2022 13:33
…into feature/issue-224-Revisit-Redis-PubSub
Now just calculating progress and task info every iteration of the loop
instead of checking new data flags. It looks like there may be a race
condition of some sort causing inaccurate worker numbers.

I might consider thread locks if they work with Redis' pubsub run_in_thread.
Alright, this is kind of funny. Turns out I've been reinventing the wheel
this whole time. Celery's GroupResult contains AsyncResult instances with
all the data we need to get active worker count, task completion and
task encoding progress (with a custom task state definition).

So:
- dropped Redis database polling
- dropped Redis PubSub
- use unique worker IDs instead of sequential prefix to easily start
additional workers at any time
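The aggregation that GroupResult enables might look roughly like this. `FakeResult` stands in for `celery.result.AsyncResult` so the logic runs here without a broker; real results expose `.state` and `.info` (the meta passed to `update_state`) the same way, and `"ENCODING"` is the custom task state mentioned above:

```python
from dataclasses import dataclass

@dataclass
class FakeResult:
    # stand-in for celery.result.AsyncResult
    state: str
    info: object

def group_progress(results) -> float:
    """Average percent across the group; SUCCESS counts as 100."""
    def pct(r):
        if r.state == "SUCCESS":
            return 100
        if isinstance(r.info, dict):
            return r.info.get("percent", 0)
        return 0
    return sum(pct(r) for r in results) / len(results)

results = [
    FakeResult("SUCCESS", None),
    FakeResult("ENCODING", {"percent": 50}),
]
print(group_progress(results))  # 75.0
```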
@in03 in03 marked this pull request as ready for review September 26, 2022 06:43
@in03
Owner

in03 commented Sep 26, 2022

Ironically, revisiting PubSub made me realise the mechanism for deriving encoding progress and task status queuer-side has actually been available in Celery's AsyncResult this whole time... Literally reinvented the wheel.

Anyway, it works! No more Redis-py, no more PubSub, no more polling the database, storing SHAs in dictionaries or researching bloom filters. It's siqque.

I've also changed the worker names to use unique IDs instead of sequential numbering so that additional workers can be started at any time. Hopefully this paves the way for automatic scaling support.

@in03 in03 merged commit f75e167 into main Sep 26, 2022
@in03 in03 deleted the feature/issue-224-Revisit-Redis-PubSub branch September 26, 2022 06:47