Description
Hi MONAI team,
While reading through the implementation of DiffusionModelUNetMaisi, I noticed the following logic for enabling attention at each level:
```python
with_attn = attention_levels[i] and not with_conditioning
with_cross_attn = attention_levels[i] and with_conditioning
```
This effectively means that self-attention is never used when the model is in conditioning mode (with_conditioning=True), even if attention_levels[i] is True.
Is this behavior intentional?
In other diffusion-based architectures such as Stable Diffusion, it is common practice to enable both self-attention and cross-attention simultaneously within the same layers.
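For reference, here is a minimal, purely illustrative sketch of that pattern (my own example, not the MAISI implementation): a single transformer block that applies self-attention over the image tokens and then cross-attention to the conditioning tokens.

```python
import torch
import torch.nn as nn


class IllustrativeTransformerBlock(nn.Module):
    """Illustrative sketch of the Stable-Diffusion-style block layout in which
    self-attention and cross-attention coexist within the same block."""

    def __init__(self, dim: int, context_dim: int, num_heads: int = 8) -> None:
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(
            dim, num_heads, kdim=context_dim, vdim=context_dim, batch_first=True
        )
        self.norm3 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Self-attention over the image tokens themselves.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention from image tokens to the conditioning tokens.
        h = self.norm2(x)
        x = x + self.cross_attn(h, context, context, need_weights=False)[0]
        # Feed-forward.
        x = x + self.ff(self.norm3(x))
        return x


# Example usage with hypothetical shapes:
# block = IllustrativeTransformerBlock(dim=320, context_dim=768)
# x = torch.randn(2, 64, 320)    # image tokens
# ctx = torch.randn(2, 77, 768)  # conditioning tokens
# out = block(x, ctx)            # both attention types applied in one block
```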
Would it be acceptable, or even recommended, to modify the logic as follows to allow both mechanisms in parallel?
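Something along these lines is what I had in mind; this is only a sketch of the flag change, not a tested patch, and the block-construction code would of course need to handle both flags being true at the same level:

```python
# Sketch of the proposed change: keep self-attention wherever the level asks
# for attention, and additionally enable cross-attention when conditioning is
# active, instead of treating the two as mutually exclusive.
with_attn = attention_levels[i]
with_cross_attn = attention_levels[i] and with_conditioning
```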
Or is there a specific reason this mutual exclusivity was enforced?
Looking forward to your insights, and thank you for the great work on this model!
Best regards,
Daniele Molino