refactor: Job dict object as Class #234


Merged
merged 38 commits into from
Nov 16, 2022

Conversation

github-actions[bot]

Original issue description

Problem

We've been living dangerously since our humble beginnings a couple of years ago: we're still passing a dictionary around as our job object. This isn't great. We don't get the type validation or IntelliSense that comes with classes. And, ignoring the FP vs OOP debate, we don't get instance methods or shared state between closely related functions. Most of our handlers could join the job class. We're iterating the same jobs over and over again to filter, validate and prompt based on different criteria of the same data, which is very inefficient. We could do most of this on __init__.

Dataclasses

Dataclasses do away with the __init__ boilerplate to make working with lots of variables easy. They also implement common dunder methods like __repr__ and __eq__, so instances can be printed and compared out of the box. They keep data-centric classes tidy, helping keep bugs to a minimum. Implementing dataclasses will win us:

  • static type safety
  • a single object to structure our data in instead of a loosely defined dict, meaning:
    • easier comparisons
    • more solid definitions

Unfortunately we have some hurdles to make them work.
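For a sense of what we get for free, here's a tiny illustrative dataclass (field names are just examples): the generated __repr__ prints readably and __eq__ compares field-wise.

```python
from dataclasses import dataclass

@dataclass
class ClipInfo:   # illustrative two-field example
    file_name: str
    fps: float

a = ClipInfo("interview_01.mov", 24.0)
b = ClipInfo("interview_01.mov", 24.0)
print(a)        # ClipInfo(file_name='interview_01.mov', fps=24.0)
print(a == b)   # True: field-wise equality for free
```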

Mutability

Currently, jobs have data that can be treated as immutable, mostly in the form of clip properties and project metadata. Some of this metadata we manipulate to reflect user choices and detected paths, notably:

  • proxy - set to None to re-render offline
  • proxy_media_path - set to the determined output path

We use the existing keys to save us from extra complications later – these are big dictionaries, and potentially a lot of them to iterate through!

Unfortunately this is slow 🐌 since we're constantly handling mutable data, and there's no static type safety. Just KeyError, TypeError or, worst of all, AttributeError: 'NoneType' object has no attribute ___
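A quick illustration of the difference, using a hypothetical one-field Meta dataclass: a frozen dataclass fails loudly at the attribute level, while a dict happily absorbs a typo'd key.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Meta:   # hypothetical single-field example
    proxy_status: str

m = Meta(proxy_status="None")
try:
    m.proxy_status = "Offline"   # mutation fails loudly
except FrozenInstanceError:
    print("immutable")

d = {"proxy_status": "None"}
d["proxy_staus"] = "Offline"     # typo'd key: no error, just a silent bug
```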

Parsing

We need to parse certain values before we use them in certain places. That's exactly what the __init__ section is for, handling the instance variables... but that's a lot of handling! We don't need to handle them all. Instead, we can update only the ones that need parsing using the __post_init__ dunder.
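A minimal sketch of that approach, with hypothetical raw-string fields standing in for the API values: __post_init__ parses only the fields that need it.

```python
from dataclasses import dataclass, field

@dataclass
class SourceMetadata:        # hypothetical raw fields as the API hands them over
    resolution: str          # e.g. "1920x1080"
    fps: str                 # e.g. "24.0"
    res_parsed: list = field(init=False)

    def __post_init__(self):
        # Parse only what needs parsing; everything else passes straight through
        self.res_parsed = [int(x) for x in self.resolution.split("x")]
        self.fps = float(self.fps)

sm = SourceMetadata(resolution="1920x1080", fps="24.0")
```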

Clip properties form the main 'chunk' of a job.

clip_properties = {
    "clip_name": cp["Clip Name"],
    "file_name": cp["File Name"],
    "file_path": cp["File Path"],
    "duration": cp["Duration"],
    "resolution": cp["Resolution"].split("x"),
    "frames": int(cp["Frames"]),
    "fps": float(cp["FPS"]),
    "h_flip": cp["H-FLIP"] == "On",
    "v_flip": cp["V-FLIP"] == "On",
    "proxy_dir": proxy_dir,
    "start": int(cp["Start"]),
    "end": int(cp["End"]),
    "start_tc": cp["Start TC"],
    "proxy_status": cp["Proxy"],
    "proxy_media_path": cp["Proxy Media Path"],
    "end_tc": cp["End TC"],
    "media_pool_item": media_pool_item,
}

Then there are project properties that we add later.

    project=project_name,
    timeline=timeline_name,
    proxy_settings=settings["proxy"],
    paths_settings=settings["paths"],
)

Proposal

A traditional class, Job, with immutable SourceMetadata and ProjectMetadata dataclasses, and dynamic variables to easily update media_pool_item, proxy status and any changing paths.

All fields are set to target types (bool for h_flip and v_flip, list for the horizontal and vertical resolution) instead of str for everything. This means the worker gets what it expects, pre-parsed.

Instead of passing proxy and path settings separately, we pass the entire settings object, then we can validate the object as type SettingsManager.
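A stripped-down sketch of that validation step (the SettingsManager here is just a stand-in for the real class):

```python
class SettingsManager:  # stand-in for the real settings object
    pass

class Job:  # trimmed to show only the validation step
    def __init__(self, settings: SettingsManager):
        if not isinstance(settings, SettingsManager):
            raise TypeError(
                f"expected SettingsManager, got {type(settings).__name__}"
            )
        self.settings = settings
```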

Any handlers that don't prompt the user can become class methods, like getting a collision-free output path.

from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMetadata:
    clip_name: str
    file_name: str
    file_path: str
    duration: str
    resolution: list
    frames: int
    fps: float
    h_flip: bool
    v_flip: bool
    proxy_dir: str
    start: int
    end: int
    start_tc: str
    proxy_status: str
    proxy_media_path: str
    end_tc: str

@dataclass(frozen=True)
class ProjectMetadata:
    project_name: str
    timeline_name: str
    [etc...]

# Create a registry of MediaPoolItems that stays in the queuer.
# We look up our items for post-encode link using file name.
class MediaPoolRefs:
    def __init__(self):
        self.media_pool_items = dict()

    def add_ref(self, filename: str, media_pool_item: PyRemoteObj):
        """filename should reference the source media media_pool_item belongs to"""
        if filename in self.media_pool_items:
            raise ValueError(f"{filename} already registered!")
        if media_pool_item in self.media_pool_items.values():
            raise ValueError(f"{media_pool_item} already registered!")
        self.media_pool_items[filename] = media_pool_item

    def get_ref(self, filename: str = None, all: bool = False):
        if all:
            return dict(self.media_pool_items)

        mpi = self.media_pool_items.get(filename)
        if mpi is None:
            raise ValueError(f"{filename} not registered!")
        return mpi

    def drop_refs(self):
        self.media_pool_items = {}
	
class Job:
    def __init__(
        self,
        source_metadata: SourceMetadata,
        project_metadata: ProjectMetadata,
        settings: SettingsManager,
    ):

        # Get data
        self.source_metadata = source_metadata
        self.project_metadata = project_metadata
        self.settings = settings

        # Get dynamic vars
        self.output_path = self._get_output_path()
        self.online = self._is_online()
        self.orphan = self._is_orphan()
        self.linkable = self._is_linkable()

    def _is_online(self):
        status = self.source_metadata.proxy_status
        switch = {
            "Offline": None,
            "None": False,
        }
        # Status is resolution (depends on source-res) when online
        return switch.get(status, True)

    # Refactor 'handle_file_collisions'
    def _get_output_path(self):
        # increment_file_if_exist_blah...
        return output_path

    def _is_orphan(self):
        if not self.source_metadata.proxy_status:
            # check if orphaned blah...
            return True
        return False


media_pool_refs = MediaPoolRefs()

# approximately like so
def get_proxy_jobs(r_, media_pool_refs, settings):
    for mpi in mpis:
        cp = mpi.GetClipProperty()
        # bunch of filtering...

        project_metadata = ProjectMetadata(r_.project, r_.timeline)

        # Some picking and parsing, need to get hands dirty
        source_metadata = SourceMetadata(
            clip_name=cp["Clip Name"],
            file_name=cp["File Name"],
            file_path=cp["File Path"],
            duration=cp["Duration"],
            resolution=cp["Resolution"].split("x"),
            frames=int(cp["Frames"]),
            fps=float(cp["FPS"]),
            h_flip=cp["H-FLIP"] == "On",
            v_flip=cp["V-FLIP"] == "On",
            proxy_status=cp["Proxy"],
            proxy_media_path=cp["Proxy Media Path"],
            proxy_dir=proxy_dir,
            start=int(cp["Start"]),
            end=int(cp["End"]),
            start_tc=cp["Start TC"],
            end_tc=cp["End TC"],
        )

        # Define job
        job = Job(source_metadata, project_metadata, settings)

        # Add media pool item reference
        media_pool_refs.add_ref(source_metadata.file_name, mpi)

Moving forward from here

At some point we'll want to introduce chunking. Since the resolve module actually instantiates all the Jobs with get_proxy_jobs(), it returns them to the queuer class. If we want them chunked, we need to re-instantiate each job, since every job should be a chunk. We don't want to re-init everything again by making new classes...
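If the metadata lives in frozen dataclasses, dataclasses.replace could derive per-chunk copies without re-running any parsing. A sketch, with trimmed-down fields and a hypothetical frame-based chunk size:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SourceMetadata:   # trimmed down to the fields chunking touches
    file_name: str
    start: int
    end: int

def chunk(sm: SourceMetadata, chunk_frames: int) -> list:
    # Each chunk is a cheap copy of the already-parsed metadata with new
    # in/out points; nothing is re-parsed or re-validated.
    return [
        replace(sm, start=s, end=min(s + chunk_frames, sm.end))
        for s in range(sm.start, sm.end, chunk_frames)
    ]

chunks = chunk(SourceMetadata("a.mov", 0, 100), 30)
# four chunks: (0, 30), (30, 60), (60, 90), (90, 100)
```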

We can merge project_metadata dataclass with source_metadata. That way we don't have to iterate each source_metadata object and add the same project_metadata data.
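One way to "merge" without duplicating fields on every clip is composition: each SourceMetadata holds a reference to the one shared ProjectMetadata instance. A sketch with trimmed-down, illustrative fields:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProjectMetadata:
    project_name: str
    timeline_name: str

@dataclass(frozen=True)
class SourceMetadata:         # trimmed down for the example
    file_name: str
    fps: float
    project: ProjectMetadata  # shared project data, attached once

proj = ProjectMetadata("My Project", "Timeline 1")
clips = [
    SourceMetadata("a.mov", 24.0, proj),
    SourceMetadata("b.mov", 25.0, proj),  # same instance, nothing copied
]
```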

We can make MediaPoolRefs a Singleton class. That way we can register all the media_pool_items from the Resolve class and just import MediaPoolRefs in the queuer module. The reason why we can't ship them with source_metadata is that they are unhashable, referencing a pointer in memory to a proprietary API, and will fail transport in Celery. We also don't want to remove them, and we want easy access to them from other modules. A singleton provides that.

There seems to be some discouragement of singletons in Python; many call them an anti-pattern. We may be able to substitute a plain module instead: all global state and functions. We'd just have to be careful that we are indeed importing the same module instance whenever we do.

If that works out well for us, it may be an idea to re-evaluate SettingsManager as a plain module.

Perhaps instead of returning jobs when we call get_proxy_jobs, we can add get_proxy_jobs and other sister functions to the Resolve class. Then we can save any variables we need onto self and call them like so:

# job module

class Job:
    # All that job stuff from the
    # Job class code block before

# method on the Resolve class
def construct_jobs(
    self,
    settings: SettingsManager,
    source_metadata: SourceMetadata,
    chunky: bool = True,
):

    if chunky:
        source_metadata = source_metadata.i_like_em_chunky(seconds=30)

    jobs = [Job(x, settings) for x in source_metadata]
    self.jobs = jobs
    return jobs

# queuer module
from resolve import Resolve
from SettingsManager import settings
from job import Job, MediaPoolRefs

r = Resolve(settings)
sm = r.get_source_metadata(make_mpi_refs=True)
jobs = r.construct_jobs(settings, sm, chunky=True)

mpr = MediaPoolRefs()
all_refs = mpr.get_ref(all=True)

closes #202


@in03 in03 left a comment


I reckon we've got it!

@in03 in03 marked this pull request as ready for review November 16, 2022 05:29
@in03 in03 merged commit cf93b40 into main Nov 16, 2022
@in03 in03 deleted the feature/issue-202-Refactor-job-dict-object-as-Class branch November 16, 2022 05:30
@in03 in03 changed the title Refactor job dict object as Class refactor: Job dict object as Class Nov 16, 2022