feat: Queuer-side progress indication #190
Merged
Conversation
- Removed some dead code
- Turned some very detached statements into utility functions
- Centralised logging statements
Update task progress from ffmpeg_process using the task ID and a Redis connection.
MVP at the moment, set up pretty basically. Currently each task gets its own bar.

- Uses Redis pub/sub to communicate encoding progress updates from the worker
- Basic progress-bar routing based on Celery task ID

Next steps:

- Get average progress from all reporting tasks instead of multiple bars
- Get task-group active worker count as a single-line spinner
- Secondary progress bar to display discrete total tasks done
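As a rough illustration of the worker-side half of this, the ffmpeg process can publish each progress update to a per-task Redis channel. The channel name and payload fields below are assumptions for the sketch, not the project's actual schema:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)


def publish_progress(task_id: str, percent: float, output_file: str) -> None:
    """Publish one encoding progress update for a single Celery task."""
    payload = json.dumps(
        {"task_id": task_id, "percent": percent, "output_file": output_file}
    )
    # One channel per task lets the queuer route updates to the right bar
    # by pattern-matching "task-progress:*".
    r.publish(f"task-progress:{task_id}", payload)
```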
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
Looks like it's working! Had to refactor a lot, and more refactoring will be necessary. Looking like a bit of a dog's breakfast, but now we have a secondary bar showing discrete total progress with a 'last encoded' data field. Nice.
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
- Worker spinner displays host machine count; will alter to worker count
- Task progress bar displays the output file name instead of task_id now
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
Rationale for changes: job expiry allows jobs that are never received to be cleared, so they don't stick around until a worker picks them up two days later. This should also deal with broken tasks that cannot be received for some reason. We're also beginning to make terminology changes that fully commit to Redis: since we're relying on it for progress reporting and the webapp monitor, we can't use other brokers now. Hence 'job' instead of Celery's 'task'. We also want to abstract Celery's settings a bit, since the end user doesn't need the granular access anymore.

- Renamed 'celery' section to 'broker'
- Dropped 'result_backend', 'host_address' and 'broker_url' keys for a simple 'url'
- Dropped 'flower_url', since it's being replaced by the webapp monitor
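For illustration, job expiry maps onto Celery's `expires` option at queue time. The task name and two-day window here are hypothetical:

```python
from celery import Celery

app = Celery("proxy_encoder", broker="redis://localhost:6379/0")


@app.task
def encode_job(media_file: str) -> str:
    # The ffmpeg encode would run here.
    return media_file


# A worker that only receives this job after the window has passed
# marks it REVOKED instead of running it days late.
result = encode_job.apply_async(
    args=["clip_001.mov"],
    expires=60 * 60 * 24 * 2,  # two days, in seconds
)
```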
Also removed the 'mon' subcommand that relies on the removed 'flower_url'. It's easy enough to save as a desktop shortcut, but it will likely be superseded by the webapp monitor anyway.
Moved 'worker_concurrency', 'worker_prefetch_multiplier' and 'worker_max_tasks_per_child' to the app.conf.update section of celery.py. These settings force Celery workers to run tasks one at a time: one task per process. Higher concurrency on a host requires running more processes. This is the workaround for prefork not being supported on Windows, and the worker launcher is set up to work with this pattern only. There's no point supporting other methods for other OSes, since they're all compatible with this one; at this point, changing the concurrency settings would break the launcher anyway.
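Roughly, the moved settings look like this in celery.py; the exact values are inferred from the description above, so treat it as a sketch:

```python
from celery import Celery

app = Celery("proxy_encoder", broker="redis://localhost:6379/0")

app.conf.update(
    worker_concurrency=1,          # one task per worker process at a time
    worker_prefetch_multiplier=1,  # don't reserve extra tasks per process
    worker_max_tasks_per_child=1,  # recycle the process after each task
)
```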
app.config_from_object expects keys to match the Celery documentation, of course, and the key names are a little obtuse for end users not touching Celery. Since dropping support for other brokers, the Redis result backend and broker URL are the same, so we just remap the terms in a new dict. It's another thing to maintain, but hopefully it won't change much.
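A minimal sketch of the remapping, assuming a single user-facing 'url' key (the names here are illustrative):

```python
from celery import Celery

app = Celery("proxy_encoder")

# From the user's simplified config file.
user_settings = {"url": "redis://localhost:6379/0"}

# Remap to the key names Celery expects; with Redis as both broker and
# result backend, one URL feeds both settings.
app.config_from_object({
    "broker_url": user_settings["url"],
    "result_backend": user_settings["url"],
})
```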
The broker command acts as a gateway to Celery commands. This is necessary since access to Celery is buried deep within the pipx virtualenv if following the recommended install procedure. Initial support only: it doesn't take any options yet, just the command. We need to figure out how to pass options ('-f', '-Q', etc.) unparsed by Typer. Also fixed the purge command, which only purged the default queue ('celery'); we now pass the version constraint key to purge jobs on our version.
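One possible shape for the gateway, sketched with Typer; the command name is illustrative, and Click's allow_extra_args/ignore_unknown_options settings are one way the unparsed-options problem could eventually be solved:

```python
import subprocess
import sys

import typer

app = typer.Typer()


@app.command(
    context_settings={"allow_extra_args": True, "ignore_unknown_options": True}
)
def celery_cmd(ctx: typer.Context, command: str):
    """Forward a Celery subcommand (plus any extra flags) to Celery itself."""
    # ctx.args holds everything Typer didn't parse, e.g. '-f' or '-Q'.
    subprocess.run([sys.executable, "-m", "celery", command, *ctx.args])


if __name__ == "__main__":
    app()
```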
- Refactored TaskTracker as broker and ProgressTracker classes
- Started trying to fix average progress... Not there yet
- Added worker pickup/success/fail logging above progress
- Added logger debug messages above progress
- Removed unused imports
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
- Added overall progress bar
- Check prevents progress going backwards; may need additional work
- Worker status updates printed on the same line, missing all but "STARTED"
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
The assertion was causing corrupt proxies to crash Resolve Proxy Encoder. Not sure why I put it in there. Probably just for testing... Dummy.
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
This really was to get this long-awaited feature out the door. Issues with messages buffering in Redis pub/sub were preventing progress from showing accurately; it would slow right down when lots of tasks were queued. Pub/sub probably needs some multi-threading support to work properly. Since the buffering takes place at the TCP level, we can't just ask Redis to ignore 'guaranteed delivery' (which is not important for us). This works for now, but I'm no expert on interacting with databases. Otherwise, the pub/sub pattern worked really nicely for the following reasons:

- easy channel pattern matching
- channel-based handlers
- messages are only received once, as opposed to having to ignore the same keys with get/set
- keys aren't stored in Redis once all subscribers have received them
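The queuer-side shape of the pattern, sketched with redis-py and the same hypothetical 'task-progress:*' channel scheme as earlier:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)
pubsub = r.pubsub()


def handle_progress(message) -> None:
    """Channel-based handler: called for every matching published message."""
    data = json.loads(message["data"])
    print(f"{data['task_id']}: {data['percent']:.0f}%")


# psubscribe with a pattern routes matching channels straight to the handler.
pubsub.psubscribe(**{"task-progress:*": handle_progress})

# Poll without blocking the progress-bar render loop; the real loop would
# exit once Celery reports the task group as ready.
while True:
    pubsub.get_message(ignore_subscribe_messages=True, timeout=0.01)
```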
Worker names are difficult to get without deep-diving into Celery signals. Instead of getting worker names through progress reports from the worker task, we can just catch task events and get worker names from there. If the name is unique, we count it. Done.
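For illustration, capturing task events and counting unique hostnames might look like this; the handler wiring is a sketch, not the project's exact code:

```python
from celery import Celery

app = Celery("proxy_encoder", broker="redis://localhost:6379/0")
seen_workers = set()


def on_task_started(event: dict) -> None:
    # Every task event carries the hostname of the worker that sent it;
    # the set keeps only unique names, giving an active-worker count.
    seen_workers.add(event["hostname"])


with app.connection() as connection:
    receiver = app.events.Receiver(
        connection, handlers={"task-started": on_task_started}
    )
    receiver.capture(limit=None, timeout=None, wakeup=True)
```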
Forgot to remove the argument passed to the ffmpeg run call from tasks. It was throwing: `TypeError: run() got an unexpected keyword argument 'worker_name'`
To avoid reading the same data multiple times, we need to make sure we can compare new data to stale data. I'm not sure what the best way to do it is, but we were storing all the progress dicts and comparing each new dict with the list of old ones. Surely performance would eventually degrade, and storing whole dicts is not wise. I've replaced storing dicts with storing quick SHA1 sums. Calculating the sums doesn't seem to affect performance negatively. Will keep an eye on it. Open to suggestions.
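The hashing approach, roughly; serialising with sorted keys keeps identical dicts hashing to the same digest:

```python
import hashlib
import json

seen_hashes = set()


def is_new(progress: dict) -> bool:
    """Return True only the first time a given progress payload is seen."""
    digest = hashlib.sha1(
        json.dumps(progress, sort_keys=True).encode()
    ).hexdigest()
    if digest in seen_hashes:
        return False
    # Store the 40-character digest rather than the whole dict.
    seen_hashes.add(digest)
    return True
```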
Sometimes when all tasks finish and progress is still going, we can get a ZeroDivisionError. This happens because the criterion for finishing the loop is actually Celery's `results.ready()`, not the progress bar reaching its end.
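A hypothetical guard for the averaging step (the function and its caller are illustrative):

```python
def average_progress(task_percentages: dict) -> float:
    """Average per-task progress, safe against an empty (finished) task map."""
    if not task_percentages:
        # All tasks reported done but results.ready() hasn't ended the loop
        # yet; report 100% instead of dividing by zero.
        return 100.0
    return sum(task_percentages.values()) / len(task_percentages)
```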
…ttps://github.com/in03/resolve-proxy-encoder into feature/issue-145-Queuer-side-progress-indication
I think we can consider this mergeable now. It serves its basic purpose, but there are still some niggly things to iron out. They're not easy fixes though, so we'll merge for now and come back to them later. Maybe someone more knowledgeable about progress bars can help. Known issues:
It would be really nice if we could have some indication of progress (ideally a progress bar) showing how far through the current jobs we are. alive-progress handles multiple nested progress bars, which could be really cool, although I'm not sure how feasible that is depending on how many jobs we have queued. Otherwise an overall progress bar would be fine. Some single-line logging of which files have finished would be awesome too.
closes #145