
Support for SDXL refiner #227


Merged · 33 commits · Sep 26, 2023

Conversation

@ZachNagengast (Contributor) commented Aug 3, 2023

This PR adds refiner conversion support to the torch2coreml script and refiner inference support to the StableDiffusionXLPipeline.

How to test

Conversion (standalone):
python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-unet --model-version stabilityai/stable-diffusion-xl-refiner-1.0 --compute-unit ALL --attention-implementation ORIGINAL --bundle-resources-for-swift-cli -o <your_output_path> --xl-version

Conversion (base + refiner):
python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-unet --model-version stabilityai/stable-diffusion-xl-base-1.0 --refiner-version stabilityai/stable-diffusion-xl-refiner-1.0 --compute-unit ALL --attention-implementation ORIGINAL --bundle-resources-for-swift-cli -o <your_output_path> --xl-version

CLI Inference:
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path <your_output_path>/Resources --step-count 50 --save-every 5 --strength 0.5 --output-path <your_output_path> --image <your_input_image_path>.png --xl

Summary of the changes

Conversion script

  • Updated to the latest DiffusionPipeline, which now automatically sets up the pipeline based on the model name, so the same logic can be shared across the SDXL and SD 1/2 pipelines
  • SDXL refiner does not have a text_encoder, only text_encoder_2, so this handles either of them being missing
  • VAE encoder uses fp32 precision for SDXL or else it overflows
  • Updated the vae encoder input variable name from "z" to "x" to match huggingface/diffusers library
  • Changed the unet time_ids input shape from [12] to [2, {5,6}] to better match huggingface/diffusers
  • Unet inputs now include time_ids with either a [2, 6] shape (base) or a [2, 5] shape, the latter supporting the refiner's aesthetic_score conditioning

This is part of SDXL's micro-conditioning, explained in section 2.2 of https://huggingface.co/papers/2307.01952. It can be used to simulate an aesthetic score for the generated image by influencing the positive and negative text conditions (see the sketch below).

Inference

  • The existing plannerRGBShapedArray() function had major issues with 1024-sized images, so I added new logic for converting CGImages to a normalized MLShapedArray via the new toRGBShapedArray() (a rough sketch of the approach follows this list)
  • Added configs for all time_ids parameters, although they assume a square image right now. This will need to be updated to support other aspect ratios.
  • Since there are models out there that still use the [12] time_ids shape, both the old and new shapes are handled.
  • The refiner only uses one text encoder, with shape [1, 77, 1280], so encodePrompt() was updated to recognize this.
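
Roughly, the CoreGraphics-based conversion works like the sketch below (illustrative only; names and details differ from the actual toRGBShapedArray implementation):

```swift
import CoreGraphics
import CoreML

// Illustrative sketch of a CoreGraphics-based CGImage -> normalized [1, 3, H, W]
// MLShapedArray conversion, in the spirit of toRGBShapedArray (not the PR's exact code).
extension CGImage {
    func rgbShapedArraySketch(minValue: Float32 = -1.0, maxValue: Float32 = 1.0) -> MLShapedArray<Float32> {
        let width = self.width, height = self.height
        var pixels = [UInt8](repeating: 0, count: width * height * 4)

        // Draw the image into an RGBA8 bitmap with an explicit row stride,
        // avoiding any padding the source image may carry.
        pixels.withUnsafeMutableBytes { raw in
            let context = CGContext(
                data: raw.baseAddress,
                width: width, height: height,
                bitsPerComponent: 8,
                bytesPerRow: width * 4,
                space: CGColorSpaceCreateDeviceRGB(),
                bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
            )!
            context.draw(self, in: CGRect(x: 0, y: 0, width: width, height: height))
        }

        // Split interleaved RGBA into planar R, G, B and scale to [minValue, maxValue].
        var scalars = [Float32](repeating: 0, count: 3 * width * height)
        let scale = (maxValue - minValue) / 255.0
        for i in 0..<(width * height) {
            scalars[i]                      = Float32(pixels[i * 4 + 0]) * scale + minValue // R plane
            scalars[width * height + i]     = Float32(pixels[i * 4 + 1]) * scale + minValue // G plane
            scalars[2 * width * height + i] = Float32(pixels[i * 4 + 2]) * scale + minValue // B plane
        }
        return MLShapedArray<Float32>(scalars: scalars, shape: [1, 3, height, width])
    }
}
```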

Now for the catch:

I noticed that inference via the CLI wasn't working quite right, and I figured out that the Unet needed to be in float32 precision to work. I'm not sure why this happens; the refiner works perfectly fine when run through the Diffusers app.

So the question for this PR is: would you prefer to default to float32 for the refiner (which adds 4.5 GB to the Unet), or disable the CLI for the refiner unless the unet is in float32 precision and expose that as a conversion argument? Or some other path? Open to any ideas on how best to go about it.

Sample outputs:

Base generated with 50 steps; refined with strength 0.5 (25 refiner steps).

Base vs. refined sample images: "a photo of an astronaut riding a horse on mars" and "Labrador in the style of Vermeer" (base output alongside the strength-0.5 refined output for each).


  • I agree to the terms outlined in CONTRIBUTING.md

@ZachNagengast marked this pull request as draft August 4, 2023 00:44
@ZachNagengast (Contributor, Author):

More info on the CLI issue:

Here's a visualization of what the unet outputs in float16 vs. float32 precision via the Swift CLI command. This is a greyscale visualization of the latent channels for comparison.

[image: greyscale visualization of the latent channels, fp16 vs. fp32]

The expected output should be some prediction of noise in the image (in latent space), but the fp16 output has a large number of repeated values, which of course breaks the diffusion process.

@ZachNagengast marked this pull request as ready for review August 8, 2023 03:45
@itsjoshpark (Contributor):

@atiorh this PR is ready for review :)

@atiorh (Contributor) commented Aug 13, 2023:

Thanks for the PR @ZachNagengast! I am traveling this week but will aim to review it asap.

@atiorh (Contributor) left a review comment:

Did a cursory pass to kick off some discussions. Thanks for the great work @ZachNagengast!

One more high-level note: it looks like the current behavior is to detect whether the given XL Unet model is a refiner or a base model and run the pipeline accordingly. Could we also handle the cascaded base+refiner workflow, similar to the canonical configuration in diffusers? What do you think? We could enforce a reduceMemory-like policy for the base-to-refiner transition to avoid 2x memory usage.

let i = 4 * (y * width + x)
if ptr[i+3] == 0 {
    // Alpha mask for controlnets
    redChannel[y * width + x] = alphaMaskValue
@atiorh (Contributor), Aug 15, 2023:

@vzsg had specified a similar convention in #209 for ControlNet inpainting. Tagging for visibility to ensure we are not moving in a different direction with this potential switch from plannerRGBShapedArray to toRGBShapedArray.

@ZachNagengast (Contributor, Author):

I attempted to replicate that convention with this new function, and the plannerRGBShapedArray function is still in use for the controlnet conditioning. It would be great to get another set of eyes on whether it's working as intended.

@@ -50,7 +50,7 @@ public struct Encoder: ResourceManaging {
         scaleFactor: Float32,
         random: inout RandomSource
     ) throws -> MLShapedArray<Float32> {
-        let imageData = try image.plannerRGBShapedArray(minValue: -1.0, maxValue: 1.0)
+        let imageData = try image.toRGBShapedArray(minValue: -1.0, maxValue: 1.0)
Contributor:

Could you please summarize the reasons for moving away from plannerRGBShapedArray?

@ZachNagengast (Contributor, Author):

This function is intended to work around a bug in plannerRGBShapedArray that produced incorrect results at 1024x1024. There may be another approach that fixes the underlying bug, but I opted to just use the CoreGraphics library for simplicity.

Example of the bug (converted to plannerRGBShapedArray then back using CGImage.fromShapedArray):
[image: corrupted round-trip output]

Contributor:

Wow, thanks @ZachNagengast! Is this for any 1024x1024 image, or some pixel-value-dependent edge case? I might want to file a ticket for this.

Contributor:

In any case, it makes sense to work around it by using your function for the moment.

@ZachNagengast (Contributor, Author):

Yeah, you can try it out; just do:

let imageData = try image.plannerRGBShapedArray(minValue: -1.0, maxValue: 1.0)
let decodedImage = try CGImage.fromShapedArray(imageData)

and inspect the decoded image; it should look off, like the example above, for the SDXL pipeline.

@@ -93,7 +93,7 @@ public struct Encoder: ResourceManaging {

     var inputDescription: MLFeatureDescription {
         try! model.perform { model in
-            model.modelDescription.inputDescriptionsByName["z"]!
+            model.modelDescription.inputDescriptionsByName.first!.value
Contributor:

This preserves backwards compatibility so I am not concerned about the rename and thanks for the semantically correct fix :)

Contributor:

No need to update, just appreciating the change

@@ -44,7 +44,19 @@ public struct PipelineConfiguration: Hashable {
    public var encoderScaleFactor: Float32 = 0.18215
    /// Scale factor to use on the latent before decoding
    public var decoderScaleFactor: Float32 = 0.18215

    /// If `originalSize` is not the same as `targetSize` the image will appear to be down- or upsampled.
@atiorh (Contributor), Aug 15, 2023:

I am inclined to recommend a dedicated StableDiffusionXLPipeline.Configuration.swift as quite a few things on the input conditioning and pipeline are different. @ZachNagengast & @pcuenca What do you think?

@ZachNagengast (Contributor, Author):

Yeah, I'd agree it probably shouldn't be the caller's responsibility to set the scale factor properly, especially when we already know the pipeline is SDXL and can set the default accordingly. My question is whether we'd want a new class for this or just set up the defaults somewhere in the pipeline. A broader option would be to include the config files with the converted models directly, alongside merges.txt and vocab.json, and use those for the default config.

Contributor:

potentially a broader option of just including the config files with the converted models directly

That would be nice and we can even make that backward compatible. However, it still wouldn't address the argument set mismatch across XL and non-XL right? IMO, the easiest is to create 2 (XL and non-XL) default configs and then add your proposal from above as an extension.

@ZachNagengast (Contributor, Author):

Yeah, that makes sense to me, and it seems to be the approach Diffusers likes to take. My only nitpick is the repeated code, but it can be consolidated a fair amount by committing to the two pipelines and pulling out shared code as needed.

Contributor:

I agree, we can refactor over time :)

Contributor:

I agree with the approach discussed; separate configs do seem necessary to make usage simpler.

@@ -1392,6 +1425,13 @@ def parser_spec():
        "If specified, enable unet to receive additional inputs from controlnet. "
        "Each input added to corresponding resnet output."
    )
    parser.add_argument(
parser.add_argument(
Contributor:

Can we rename this to --unet-compute-precision-float32? Sidebar: I still don't understand why the float32 constraint would only apply to the CLI and not any other Swift program that calls StableDiffusionXLPipeline 🤔 We should discuss this more 🙏

@ZachNagengast (Contributor, Author):

Yeah, I'll rename it. The CLI thing is still a mystery to me; I'd love to hear whether anyone can replicate it or if it's just my machine. I was just getting noise in the output with fp16, which went away with fp32.

Contributor:

I am still traveling with bad internet, so I have yet to convert and test the refiner model (hence the slow review). Will confirm once I do!

@ZachNagengast (Contributor, Author):

@atiorh Regarding your note on the base2refiner transition, here are two approaches that could work well:

  1. Include "UnetRefiner.mlmodelc" as a path to look for when loading an xl model, where the refiner unet is included in the folder with the base unet, and intelligently switch between them depending on whether it exists, and whether we need to use reduced memory, as determined by the configuration. Since the encoders/vaes are shared, it would be efficient to keep them in one folder, rather than duplicated for both base and refiner. It would just require 2 unet passes during conversion if the refiner is also desired. The downside is they would then be tightly coupled as one "model" conceptually.
  2. Allow devs to control and swap out the Unet as needed for the pipeline they're aiming for, outside of the normal loadResources call. This would give developers more control over the pipeline, at the expense of a little more legwork upfront.

Maybe both would be useful down the road, what do you think?

@atiorh (Contributor) commented Aug 20, 2023:

(Quoting @ZachNagengast's two approaches above.)

Totally, this is what I had in mind too:

  • if refiner (UnetRefiner.mlmodelc) doesn't exist, we run a base-only pipeline
  • if base doesn't exist, we error out unless a starting image was provided?
  • if both exist, we run base first and refiner second using their dedicated subconfigs (e.g. --step-count, --strength etc.)
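
A minimal sketch of that resolution logic might look like this (the names and error cases are illustrative and assume the package's Unet type; this is not the PR's actual API):

```swift
import CoreGraphics

// Illustrative only: decide which workflow to run based on which unets were bundled.
enum XLWorkflow {
    case baseOnly, refinerOnly, baseThenRefiner
}

enum XLWorkflowError: Error {
    case startingImageRequiredForRefinerOnly
    case noUnetResourcesFound
}

func resolveWorkflow(baseUnet: Unet?, refinerUnet: Unet?, startingImage: CGImage?) throws -> XLWorkflow {
    switch (baseUnet, refinerUnet) {
    case (.some, .none):
        return .baseOnly
    case (.none, .some):
        // A refiner on its own can only do image-to-image.
        guard startingImage != nil else { throw XLWorkflowError.startingImageRequiredForRefinerOnly }
        return .refinerOnly
    case (.some, .some):
        // Run the base unet first, then hand off to the refiner, each with its
        // own step count / strength sub-configuration.
        return .baseThenRefiner
    case (.none, .none):
        throw XLWorkflowError.noUnetResourcesFound
    }
}
```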

@@ -796,9 +827,15 @@ def convert_unet(pipe, args):
        for k, v in sample_unet_inputs.items()
    }

    if args.xl_version and pipe.config.requires_aesthetics_score and args.unet_support_cli:
Contributor:

Suggested change:
-    if args.xl_version and pipe.config.requires_aesthetics_score and args.unet_support_cli:
+    if args.xl_version and pipe.config.requires_aesthetics_score and args.unet_compute_precision_float32:

@pcuenca (Contributor) left a review comment:

I've just tested this PR and have some high-level feedback for now:

  • I haven't been able to reproduce the unet inference issues in float16 with the CLI. It works for me using both cpuAndGPU and all compute units (although using all results in very slow compilation times, as expected). It'd be really useful to identify an easy reproducible case. Any ideas about how to proceed @ZachNagengast?
  • The plannerRGBShapedArray issue is caused by row padding applied by vImage for performance reasons (see the sketch after this comment). I have a simple fix that I can submit as a separate PR for discussion. (I could submit a PR to this PR if that's preferred, but I think the problem is general enough to deal with separately.)
  • I didn't see any provisions in the Swift portion of the code to deal with absent textEncoder or textEncoder2 modules, so applying the refiner in text2image mode results in an error unless both encoders exist. This is only a problem because the original refiner checkpoints don't include a textEncoder, so a conversion process followed by inference on the output folder would fail.
  • There was a minor typo in the name of the arg to control float32 conversion of the UNet (it had only been changed in one of two places); I submitted a small suggestion for it.

Other than that, I agree with providing a dedicated method to run the base and refiner models one after the other, in addition to using them separately.

I'll dive deeper into the code details later :)

Great work @ZachNagengast, this is a really exciting feature!
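
As an aside, the row-padding pitfall behind that bug looks roughly like this (an illustrative sketch, not pcuenca's actual fix): vImage may allocate buffers whose rowBytes exceeds width * bytesPerPixel, so indexing the data as if it were densely packed shifts every row.

```swift
import Accelerate

// Reading one 8-bit plane out of a vImage buffer. Stepping rows by
// buffer.rowBytes (which may include padding) keeps the rows aligned;
// stepping by `width` instead would reproduce the skewed output shown earlier.
func readPlane(from buffer: vImage_Buffer, width: Int, height: Int) -> [Float32] {
    let rowBytes = buffer.rowBytes
    let base = buffer.data.assumingMemoryBound(to: UInt8.self)
    var plane = [Float32](repeating: 0, count: width * height)
    for y in 0..<height {
        for x in 0..<width {
            plane[y * width + x] = Float32(base[y * rowBytes + x])
        }
    }
    return plane
}
```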

@pcuenca mentioned this pull request Aug 23, 2023
- Also skip loading model if check_output_correctness is missing, since the model does not require inferencing at conversion time
@ZachNagengast (Contributor, Author) commented Aug 28, 2023:

@pcuenca Just updated the conversion script to bundle UnetRefiner.mlmodelc with the base unet when specified via --refiner-version. Take a look and let me know if you have any edits; likewise @atiorh. I was blocked for a while because the model-load stage of the coremltools convert() method was causing OOM issues with the refiner (not sure what introduced that, since it never had this issue until recently). I resolved it by uncommenting the skip_model_load param and basing it on args.check_output_correctness, because we don't need to load the converted model for inference unless that's specified.

Also saw your update to the plannerRGBShapedArray, nice fix. I'll include that back in with the pipeline updates, which should be coming later today.

Regarding the float precision thing - good to know your computer didn't have a problem. Unless anyone else reports a similar issue, I'll just remove that logic for simplicity.

@ZachNagengast (Contributor, Author):

Here's an example base + refiner conversion command. The thinking behind the new parameter is the potential for future refiners that are not tightly coupled with the base, so this lets you bundle any pre-trained unet as the refiner via the --refiner-version param.

python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-unet --model-version stabilityai/stable-diffusion-xl-base-1.0 --refiner-version stabilityai/stable-diffusion-xl-refiner-1.0 --compute-unit ALL --attention-implementation ORIGINAL --bundle-resources-for-swift-cli -o <output_path> --xl-version

@ZachNagengast (Contributor, Author) commented Aug 29, 2023:

@atiorh @pcuenca Update - just pushed some code to run the UnetRefiner and also handle potentially missing models in the bundle. I haven't included any dedicated subconfigs yet, but this latest update uses strength as the fraction of steps after which the refiner takes over (see the sketch below the examples). Worth noting that we're diverging a bit from diffusers here, because they use two separate pipelines for base+refiner whereas this is currently set up to do it all at once. I'll update the CLI next, but I'm looking for feedback on what's here so far, and also on whether you'd prefer the dedicated subconfigs in this PR or in a follow-up. Thanks! Here are some outputs from my tests:

Strength 1.0: 0/25 refiner steps · Strength 0.8: 5/25 refiner steps · Strength 0.5: 13/25 refiner steps
[images: "Labrador in the style of Vermeer" — unrefined, refined at 0.8, refined at 0.5]
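
For clarity, the strength-to-handoff mapping in the examples above works roughly like this (hypothetical helper; the PR's actual property names may differ):

```swift
// Illustrative: the base unet runs the first `strength` fraction of the
// schedule and the refiner finishes the rest (25 steps, strength 0.8 -> 5
// refiner steps; strength 0.5 -> 13; strength 1.0 -> 0).
func refinerHandoff(stepCount: Int, strength: Float) -> (baseSteps: Int, refinerSteps: Int) {
    let handoffStep = Int(Float(stepCount) * strength)   // truncates, matching the 13/25 example
    return (baseSteps: handoffStep, refinerSteps: stepCount - handoffStep)
}
```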

var textEncoder2: TextEncoderXLModel

/// Model used to predict noise residuals given an input, diffusion time step, and conditional embedding
var unet: Unet

/// Model used to refine the image, if present
var unetRefiner: Unet?
Contributor:

Shall we make unet optional as well, since we were talking about the following scenarios:
1-) Only refiner unet provided, we only refine
2-) Only base unet provided, we only run base gen
3-) Both provided, we run full pipeline (loading/unloading base/refiner during the transition)
4-) Neither provided, we error out

If you think (1-) is not relevant, we can discuss 👍

@ZachNagengast (Contributor, Author):

Good question. (1) might have a catch when someone wants to use the refiner as a base model (i.e., in imageToImage mode). If only a refiner is provided, we can assume it's imageToImage mode and require a startingImage as well. It might be useful to allow either a Unet or UnetRefiner for purely imageToImage applications, to save disk space.

This question also goes back to torch2coreml.py, I think: in the conversion script we could logically separate refiner unets in a similar way to how the controlnets are separated, and discourage using refiner unets as base models. Currently, refiner unets are treated the same as any other SDXL unet unless specifically set by the new argument. I do think there are cases where you'd want to use a "base" model as a refiner as more fine-tuned models/LoRAs come out, and I've also seen cases where people use the refiner as a base model, though that's probably less common overall.

Contributor:

I see! Let's keep it as is for now then. I will test the workflows as is.

logger.info(f"Converting refiner")
del pipe
gc.collect()
args.model_version = args.refiner_version
@atiorh (Contributor), Sep 14, 2023:

@ZachNagengast Could we restore the original model_version after the refiner conversion is done two lines down? The name override breaks some filename assumptions in the coremlcompiler calls.

@ZachNagengast (Contributor, Author):

Yep! Just pushed

@atiorh (Contributor) commented Sep 15, 2023:

Update: still testing but things are looking great so far!

@ZachNagengast One request: could you please hardcode cpuAndGPU for the SDXL VAE (with a FIXME note) since we are forcing it to float32 for now? The Neural Engine does not support float32, so the VAE ends up on the CPU when cpuAndNeuralEngine is selected.
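
The requested workaround could look something like this (an illustrative sketch with hypothetical names, not the code that ended up in the PR):

```swift
import CoreML

func vaeConfiguration(userComputeUnits: MLComputeUnits, isXL: Bool) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    // FIXME: remove once a float16-safe SDXL VAE is available. The VAE is
    // currently converted in float32, which the Neural Engine cannot run,
    // so .cpuAndNeuralEngine would silently fall back to CPU-only execution.
    config.computeUnits = isXL ? .cpuAndGPU : userComputeUnits
    return config
}
```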

@atiorh (Contributor) left a review comment:

Tested base+refiner flow and ran into an issue that I think may require some changes, thanks for the continued iterations @ZachNagengast!

@atiorh (Contributor) commented Sep 17, 2023:

@ZachNagengast Tested out the base+refiner and base only workflows, works great! Final things to put in place before I can approve:

  • unetRefiner chunking support
  • Lazily loading unetRefiner just in time for iOS
  • coremlcompiler calls still skip the refiner compilation because _get_out_path(args, source_name) uses args.model_version instead of args.refiner_version; it is tricky to handle both model versions throughout the script.

@ZachNagengast (Contributor, Author):

Thanks for testing it out @atiorh. Just pushed refiner chunking, and the compilation step should work now too: it saves the files as {base model version}_refiner_chunk#.mlpackage so that they stay closely coupled to the base (similar to controlnet). Lazy loading should work the same as for normal unets, and I've added an extra step to make sure the refiner is unloaded afterwards.

public var refinerStart: Float = 0.7
public var refinerStart: Float = 0.8
Contributor:

What do you think the default should be? I see 0.7 here, but 0.8 in the examples.

@ZachNagengast (Contributor, Author):

This was based on my own testing, but according to the paper the refiner was specialized on the first 200 steps, so 20% seems reasonable. I noticed fewer "wrong" artifacts show up with 0.8, which seems preferable as a default, whereas I didn't notice much more "refinement" with a higher percentage of refiner steps.

Contributor:

Awesome, thanks for the clarification!

Comment on lines 79 to 88
-        if FileManager.default.fileExists(atPath: urls.unetRefinerURL.path) {
-            unetRefiner = Unet(modelAt: urls.unetRefinerURL, configuration: config)
+        if FileManager.default.fileExists(atPath: urls.unetRefinerChunk1URL.path) &&
+            FileManager.default.fileExists(atPath: urls.unetRefinerChunk2URL.path) {
+            unetRefiner = Unet(chunksAt: [urls.unetRefinerChunk1URL, urls.unetRefinerChunk2URL],
+                               configuration: config)
         } else {
-            unetRefiner = nil
+            unetRefiner = Unet(modelAt: urls.unetRefinerURL, configuration: config)
Contributor:

Shouldn't unetRefiner be nil when it's not there?

Contributor:

Update: after testing I can confirm that the generation loop fails if there's no refiner in the bundle.

@ZachNagengast (Contributor, Author):

Yes, good catch. I copied this from the controlnet logic but missed the first parts.

self.unet.unloadResources()
unet.unloadResources()
Contributor:

This is cool. But if reduceMemory is false, I think both the normal unet and the refiner would be loaded while iterating through the denoising loop. I would suggest we always prewarm the refiner so it stays unloaded until needed.

@ZachNagengast (Contributor, Author):

This way the UnetRefiner will always be lazy-loaded, which makes sense because we're always unloading the base unet here at the handoff (see the sketch below). @atiorh FYI, I'll update the logic so that the UnetRefiner is always unloaded initially on launch, not just on iOS. As touched on before, there might be a use case for a reduceLatency-type setting if the user wants to keep both in memory at all times, but that can come later.
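
Sketch of the handoff described here (names are illustrative and assume the package's Unet type with its loadResources()/unloadResources() methods):

```swift
func runDenoisingLoop(unet: Unet, unetRefiner: Unet?, stepCount: Int, refinerStartStep: Int) throws {
    var activeUnet = unet
    for step in 0..<stepCount {
        if let refiner = unetRefiner, step == refinerStartStep {
            unet.unloadResources()        // always free the base unet at the handoff
            try refiner.loadResources()   // lazy-load the refiner just in time
            activeUnet = refiner
        }
        // ... denoise the latent for this step using `activeUnet` ...
        _ = activeUnet
    }
    unetRefiner?.unloadResources()        // ensure the refiner is released afterwards
}
```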

@@ -270,7 +270,8 @@ public struct StableDiffusionXLPipeline: StableDiffusionPipelineProtocol {
         }

         if reduceMemory {
             unetModel.unloadResources()
+            unetRefiner?.unloadResources()
Contributor:

I'd maybe move this line out of the conditional, for the same reason as above.

@pcuenca (Contributor) commented Sep 19, 2023:

(Quoting @atiorh's checklist above.)

I think we are also missing the hardcoded cpuAndGPU for the VAE, unless I missed that change.

@atiorh mentioned this pull request Sep 25, 2023
@atiorh (Contributor) left a review comment:

Looks great, thanks for the perseverance and great work @ZachNagengast! Merging now and we can address minor issues in subsequent PRs
