
Clarification on mutually exclusive use of self-attention and cross-attention in DiffusionModelUNetMaisi #1993

Open
@danielemolino

Description


Hi MONAI team,

While reading through the implementation of DiffusionModelUNetMaisi, I noticed the following logic for enabling attention at each level:

```python
with_attn = attention_levels[i] and not with_conditioning
with_cross_attn = attention_levels[i] and with_conditioning
```

This effectively means that self-attention is never used when the model is in conditioning mode (with_conditioning=True), even if attention_levels[i] is True.
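To make this concrete, with a hypothetical configuration such as with_conditioning=True and attention_levels=(False, True, True), the flags evaluate as follows:

```python
with_conditioning = True
attention_levels = (False, True, True)  # hypothetical per-level setting

for i in range(len(attention_levels)):
    with_attn = attention_levels[i] and not with_conditioning
    with_cross_attn = attention_levels[i] and with_conditioning
    print(f"level {i}: with_attn={with_attn}, with_cross_attn={with_cross_attn}")

# level 0: with_attn=False, with_cross_attn=False
# level 1: with_attn=False, with_cross_attn=True   <- self-attention skipped
# level 2: with_attn=False, with_cross_attn=True   <- self-attention skipped
```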

Is this behavior intentional?

In other diffusion-based architectures such as Stable Diffusion, it is common practice to enable both self-attention and cross-attention simultaneously within the same layers.
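For reference, here is a minimal sketch of that pattern using plain torch.nn primitives (not MONAI's actual attention blocks): self-attention over the latent tokens, followed by cross-attention into the conditioning context, each with a residual connection:

```python
import torch
import torch.nn as nn


class SelfPlusCrossBlock(nn.Module):
    """Sketch of a Stable-Diffusion-style transformer block that applies
    self-attention and then cross-attention within the same layer."""

    def __init__(self, dim: int, context_dim: int, num_heads: int = 8) -> None:
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # kdim/vdim let keys and values come from the conditioning context
        self.cross_attn = nn.MultiheadAttention(
            dim, num_heads, kdim=context_dim, vdim=context_dim, batch_first=True
        )

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]  # self-attention
        h = self.norm2(x)
        x = x + self.cross_attn(h, context, context, need_weights=False)[0]  # cross-attention
        return x


# e.g. latent tokens (B, N, dim) attending to text tokens (B, M, context_dim)
block = SelfPlusCrossBlock(dim=128, context_dim=256)
out = block(torch.randn(2, 64, 128), torch.randn(2, 77, 256))  # -> (2, 64, 128)
```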

Would it be acceptable, or even recommended, to modify the logic to allow both mechanisms in parallel, along the lines sketched below?
Or is there a specific reason this mutual exclusivity was enforced?
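Concretely, I am thinking of something like this (just a sketch of the idea, still gating cross-attention on with_conditioning):

```python
with_attn = attention_levels[i]
with_cross_attn = attention_levels[i] and with_conditioning
```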

Looking forward to your insights, and thank you for the great work on this model!

Best regards,
Daniele Molino
