| Documentation | What can I learn? |
|---|---|
|[Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview)| A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
|[Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading)| Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
|[Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques)| Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
|[Optimization](https://huggingface.co/docs/diffusers/optimization/fp16)| Guides for how to optimize your diffusion model to run faster and consume less memory. |
|[Training](https://huggingface.co/docs/diffusers/training/overview)| Guides for how to train a diffusion model for different tasks with different training techniques. |
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderDC

The 2D Autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by authors Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han from MIT HAN Lab.

The abstract from the paper is:

*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*

The following DCAE models are released and supported in Diffusers.
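
For example, here is a minimal sketch of encoding and decoding with one of these checkpoints. The checkpoint name and the `latent`/`sample` output attributes below follow the usual Diffusers autoencoder conventions but should be checked against the model card of the checkpoint you pick:

```python
import torch
from diffusers import AutoencoderDC

# Assumed checkpoint id; see the Hub for the released DC-AE variants.
ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers")
ae.eval()

# Dummy 512x512 RGB batch; an f32 autoencoder shrinks the spatial size by 32x.
x = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    latent = ae.encode(x).latent               # expected shape: (1, 32, 16, 16) for f32c32
    reconstruction = ae.decode(latent).sample  # back to (1, 3, 512, 512)

print(latent.shape, reconstruction.shape)
```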
Excerpt from `docs/source/en/api/pipelines/flux.md`:
**Note:** `black-forest-labs/Flux.1-Depth-dev` is _not_ a ControlNet model. [`ControlNetModel`] models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same results as a ControlNet model: the control condition is concatenated channel-wise with the input, and the transformer learns structure control by following the condition as closely as possible.
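
As a rough sketch, the Depth model is loaded with the regular Flux control pipeline rather than a ControlNet pipeline. The prompt, depth-map file, and sampler settings below are illustrative, and the depth map is assumed to have been computed beforehand with a depth estimator:

```python
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

# Loaded as a plain Flux pipeline that takes a control image, not as a ControlNet.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Assumed input: a precomputed depth map matching the target resolution.
control_image = load_image("depth_map.png")

image = pipe(
    prompt="a futuristic cityscape at dusk",
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
).images[0]
image.save("flux_depth_output.png")
```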
Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/clip).
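
A minimal sketch of loading that checkpoint with `transformers` and comparing an image against a caption in CLIP space (the image file and caption are placeholders, and this only illustrates the checkpoint in use, not a full similarity metric):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# The CLIP variant that Stable Diffusion was pre-trained with.
clip_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(clip_id)
processor = CLIPProcessor.from_pretrained(clip_id)

# Placeholder inputs: any PIL image and a caption describing it.
image = Image.open("generated_sample.png")
inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Cosine similarity between the image and text embeddings in the shared CLIP space.
print(torch.nn.functional.cosine_similarity(image_emb, text_emb).item())
```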