refactor: Job dict object as Class #234


Merged
merged 38 commits into from
Nov 16, 2022

Conversation

github-actions[bot]

Original issue description

Problem

We've been living dangerously since our humble beginnings a couple of years ago: we're still passing a dictionary around as our job object. This isn't great. We don't get the type validation or IntelliSense that comes with classes. And, ignoring the FP vs OOP debate, we don't get instance methods or shared state between closely related functions. Most of our handlers could join the job class. We're iterating the same jobs over and over again to filter, validate and prompt based on different criteria of the same data, which is very inefficient. We could do most of this on __init__.

Dataclasses

Dataclasses do away with the __init__ boilerplate to make working with lots of variables easy. They also implement common dunder methods like __repr__ and __eq__, so instances can be printed and compared out of the box. They keep data-centric classes tidy, helping keep bugs to a minimum. Implementing dataclasses will win us:

  • static type safety
  • a single object to structure our data in instead of a loosely defined dict, meaning:
    • easier comparisons
    • more solid definitions

Unfortunately we have some hurdles to make them work.
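For a sense of what we get for free, here's a tiny illustrative dataclass (field names are just examples): the generated __repr__ prints readably and __eq__ compares field-wise.

```python
from dataclasses import dataclass

@dataclass
class ClipInfo:   # illustrative two-field example
    file_name: str
    fps: float

a = ClipInfo("interview_01.mov", 24.0)
b = ClipInfo("interview_01.mov", 24.0)
print(a)        # ClipInfo(file_name='interview_01.mov', fps=24.0)
print(a == b)   # True: field-wise equality for free
```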

Mutability

Currently, jobs have data that can be treated as immutable, mostly in the form of clip properties and project metadata. Some of this metadata we manipulate to reflect user choices and detected paths, notably:

  • proxy - set to None to re-render offline
  • proxy_media_path - set to the determined output path

We use the existing keys to save us from extra complications later – these are big dictionaries, and potentially a lot of them to iterate through!

Unfortunately this is slow 🐌 since we're constantly handling mutable data, and there's no static type safety. Just KeyError, TypeError or, worst of all, AttributeError: 'NoneType' object has no attribute ___
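A quick illustration of the difference, using a hypothetical one-field Meta dataclass: a frozen dataclass fails loudly at the attribute level, while a dict happily absorbs a typo'd key.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Meta:   # hypothetical single-field example
    proxy_status: str

m = Meta(proxy_status="None")
try:
    m.proxy_status = "Offline"   # mutation fails loudly
except FrozenInstanceError:
    print("immutable")

d = {"proxy_status": "None"}
d["proxy_staus"] = "Offline"     # typo'd key: no error, just a silent bug
```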

Parsing

We need to parse certain values before we use them in certain places. That's exactly what the __init__ section is for, handling the instance variables... but that's a lot of handling! We don't need to handle them all. Instead, we can update only the ones that need parsing using the __post_init__ dunder.
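A minimal sketch of that approach, with hypothetical raw-string fields standing in for the API values: __post_init__ parses only the fields that need it.

```python
from dataclasses import dataclass, field

@dataclass
class SourceMetadata:        # hypothetical raw fields as the API hands them over
    resolution: str          # e.g. "1920x1080"
    fps: str                 # e.g. "24.0"
    res_parsed: list = field(init=False)

    def __post_init__(self):
        # Parse only what needs parsing; everything else passes straight through
        self.res_parsed = [int(x) for x in self.resolution.split("x")]
        self.fps = float(self.fps)

sm = SourceMetadata(resolution="1920x1080", fps="24.0")
```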

Clip properties form the main 'chunk' of a job.

clip_properties = {
    "clip_name": cp["Clip Name"],
    "file_name": cp["File Name"],
    "file_path": cp["File Path"],
    "duration": cp["Duration"],
    "resolution": cp["Resolution"].split("x"),
    "frames": int(cp["Frames"]),
    "fps": float(cp["FPS"]),
    "h_flip": cp["H-FLIP"] == "On",
    "v_flip": cp["V-FLIP"] == "On",
    "proxy_dir": proxy_dir,
    "start": int(cp["Start"]),
    "end": int(cp["End"]),
    "start_tc": cp["Start TC"],
    "proxy_status": cp["Proxy"],
    "proxy_media_path": cp["Proxy Media Path"],
    "end_tc": cp["End TC"],
    "media_pool_item": media_pool_item,
}

Then there are project properties that we add later.

    project=project_name,
    timeline=timeline_name,
    proxy_settings=settings["proxy"],
    paths_settings=settings["paths"],
)

Proposal

A traditional class, Job, with immutable SourceMetadata and ProjectMetadata dataclasses, and dynamic variables to easily update media_pool_item, proxy status and any changing paths.

All fields are set to target types (bool for h_flip and v_flip, list for the horizontal and vertical resolution) instead of str for everything. This means the worker gets what it expects, pre-parsed.

Instead of passing proxy and path settings separately, we pass the entire settings object, then we can validate the object as type SettingsManager.
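A stripped-down sketch of that validation step (the SettingsManager here is just a stand-in for the real class):

```python
class SettingsManager:  # stand-in for the real settings object
    pass

class Job:  # trimmed to show only the validation step
    def __init__(self, settings: SettingsManager):
        if not isinstance(settings, SettingsManager):
            raise TypeError(
                f"expected SettingsManager, got {type(settings).__name__}"
            )
        self.settings = settings
```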

Any handlers that don't prompt the user can become class methods, like getting a collision-free output path.

from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMetadata:
    clip_name: str
    file_name: str
    file_path: str
    duration: str
    resolution: list
    frames: int
    fps: float
    h_flip: bool
    v_flip: bool
    proxy_dir: str
    start: int
    end: int
    start_tc: str
    proxy_status: str
    proxy_media_path: str
    end_tc: str

@dataclass(frozen=True)
class ProjectMetadata:
    project_name: str
    timeline_name: str
    [etc...]

# Create a registry of MediaPoolItems that stays in the queuer.
# We look up our items for post-encode link using file name.
class MediaPoolRefs:
    def __init__(self):
        self.media_pool_items = dict()

    def add_ref(self, filename: str, media_pool_item: PyRemoteObj):
        """filename should reference the source media media_pool_item belongs to"""
        if filename in self.media_pool_items:
            raise ValueError(f"{filename} already registered!")
        if media_pool_item in self.media_pool_items.values():
            raise ValueError(f"{media_pool_item} already registered!")
        self.media_pool_items[filename] = media_pool_item

    def get_ref(self, filename: str = None, all: bool = False):
        if all:
            return dict(self.media_pool_items)

        mpi = self.media_pool_items.get(filename)
        if mpi is None:
            raise ValueError(f"{filename} not registered!")
        return mpi

    def drop_refs(self):
        self.media_pool_items = {}
	
class Job:
    def __init__(
        self,
        source_metadata: SourceMetadata,
        project_metadata: ProjectMetadata,
        settings: SettingsManager,
    ):

        # Get data
        self.source_metadata = source_metadata
        self.project_metadata = project_metadata
        self.settings = settings

        # Get dynamic vars
        self.output_path = self._get_output_path()
        self.online = self._is_online()
        self.orphan = self._is_orphan()
        self.linkable = self._is_linkable()

    def _is_online(self):
        status = self.source_metadata.proxy_status
        switch = {
            "Offline": None,
            "None": False,
        }
        # Status is resolution (depends on source-res) when online
        return switch.get(status, True)

    # Refactor 'handle_file_collisions'
    def _get_output_path(self):
        # increment_file_if_exist_blah...
        return output_path

    def _is_orphan(self):
        if not self.source_metadata.proxy_status:
            # check if orphaned blah...
            return True
        return False


media_pool_refs = MediaPoolRefs()

# approximately like so
def get_proxy_jobs(r_, media_pool_refs, settings):
    for mpi in mpis:
        cp = mpi.GetClipProperty()
        # bunch of filtering...

        project_metadata = ProjectMetadata(r_.project, r_.timeline)

        # Some picking and parsing, need to get hands dirty
        source_metadata = SourceMetadata(
            clip_name=cp["Clip Name"],
            file_name=cp["File Name"],
            file_path=cp["File Path"],
            duration=cp["Duration"],
            resolution=cp["Resolution"].split("x"),
            frames=int(cp["Frames"]),
            fps=float(cp["FPS"]),
            h_flip=cp["H-FLIP"] == "On",
            v_flip=cp["V-FLIP"] == "On",
            proxy_status=cp["Proxy"],
            proxy_media_path=cp["Proxy Media Path"],
            proxy_dir=proxy_dir,
            start=int(cp["Start"]),
            end=int(cp["End"]),
            start_tc=cp["Start TC"],
            end_tc=cp["End TC"],
        )

        # Define job
        job = Job(source_metadata, project_metadata, settings)

        # Add media pool item reference
        media_pool_refs.add_ref(source_metadata.file_name, mpi)

Moving forward from here

At some point we'll want to introduce chunking. Since the resolve module actually instantiates all the Jobs with get_proxy_jobs(), it returns them to the queuer class. If we want them chunked, we need to re-instantiate each job, since every job should be a chunk. We don't want to re-init everything again by making new classes...
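If the metadata lives in frozen dataclasses, dataclasses.replace could derive per-chunk copies without re-running any parsing. A sketch, with trimmed-down fields and a hypothetical frame-based chunk size:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SourceMetadata:   # trimmed down to the fields chunking touches
    file_name: str
    start: int
    end: int

def chunk(sm: SourceMetadata, chunk_frames: int) -> list:
    # Each chunk is a cheap copy of the already-parsed metadata with new
    # in/out points; nothing is re-parsed or re-validated.
    return [
        replace(sm, start=s, end=min(s + chunk_frames, sm.end))
        for s in range(sm.start, sm.end, chunk_frames)
    ]

chunks = chunk(SourceMetadata("a.mov", 0, 100), 30)
# four chunks: (0, 30), (30, 60), (60, 90), (90, 100)
```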

We can merge project_metadata dataclass with source_metadata. That way we don't have to iterate each source_metadata object and add the same project_metadata data.
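One way to "merge" without duplicating fields on every clip is composition: each SourceMetadata holds a reference to the one shared ProjectMetadata instance. A sketch with trimmed-down, illustrative fields:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProjectMetadata:
    project_name: str
    timeline_name: str

@dataclass(frozen=True)
class SourceMetadata:         # trimmed down for the example
    file_name: str
    fps: float
    project: ProjectMetadata  # shared project data, attached once

proj = ProjectMetadata("My Project", "Timeline 1")
clips = [
    SourceMetadata("a.mov", 24.0, proj),
    SourceMetadata("b.mov", 25.0, proj),  # same instance, nothing copied
]
```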

We can make MediaPoolRefs a Singleton class. That way we can register all the media_pool_items from the Resolve class and just import MediaPoolRefs in the queuer module. The reason why we can't ship them with source_metadata is that they are unhashable, referencing a pointer in memory to a proprietary API, and will fail transport in Celery. We also don't want to remove them, and we want easy access to them from other modules. A singleton provides that.

There seems to be some discouragement of singletons in Python; many call them an anti-pattern. We may be able to substitute a plain module instead: all global state and functions. We'd just have to be careful that we are indeed importing the same module instance whenever we do.

If that works out well for us, it may be an idea to re-evaluate SettingsManager as a plain module.

Perhaps instead of returning jobs when we call get_proxy_jobs, we can add get_proxy_jobs and other sister functions to the Resolve class. Then we can save any variables we need onto self and call them like so:

# job module

class Job:
    # All that job stuff from the
    # Job class code block before

# method on the Resolve class
def construct_jobs(
    self,
    settings: SettingsManager,
    source_metadata: SourceMetadata,
    chunky: bool = True,
):

    if chunky:
        source_metadata = source_metadata.i_like_em_chunky(seconds=30)

    jobs = [Job(x, settings) for x in source_metadata]
    self.jobs = jobs
    return jobs

# queuer module
from resolve import Resolve
from SettingsManager import settings
from job import Job, MediaPoolRefs

r = Resolve(settings)
sm = r.get_source_metadata(make_mpi_refs=True)
jobs = r.construct_jobs(settings, sm, chunky=True)

mpr = MediaPoolRefs()
all_refs = mpr.get_ref(all=True)

closes #202


@in03 in03 left a comment


I reckon we've got it!

@in03 in03 marked this pull request as ready for review November 16, 2022 05:29
@in03 in03 merged commit cf93b40 into main Nov 16, 2022
@in03 in03 deleted the feature/issue-202-Refactor-job-dict-object-as-Class branch November 16, 2022 05:30
@in03 in03 changed the title Refactor job dict object as Class refactor: Job dict object as Class Nov 16, 2022