Hardcoded 4096 dimension in DiffusionModelEncoder prevents architectural flexibility #8496

Open
@parissashahabi

Description

The DiffusionModelEncoder class hardcodes an input dimension of 4096 in its output linear layer, nn.Linear(4096, 512). This limits architectural flexibility and causes a shape mismatch error whenever the encoder configuration produces a different number of flattened features.

Steps to reproduce the behavior:

  1. Create a DiffusionModelEncoder with custom channels or input dimensions whose flattened feature size is not 4096
  2. Run a forward pass through the model
  3. Observe the shape mismatch error at the hardcoded nn.Linear(4096, 512) layer
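
To illustrate why only some configurations reach 4096 features, here is a rough arithmetic sketch. The helper `flattened_features` is hypothetical (it is not part of MONAI), and it assumes every downsampling step halves each spatial dimension exactly. The number of downsampling steps is passed in explicitly because it depends on the encoder configuration.

```python
from math import prod


def flattened_features(last_channels, spatial_shape, num_downsamples):
    """Hypothetical helper: feature count after flattening the encoder output.

    Assumes each downsampling step halves every spatial dimension exactly,
    so the input sizes should be divisible by 2 ** num_downsamples.
    """
    reduced = [size // (2 ** num_downsamples) for size in spatial_shape]
    return last_channels * prod(reduced)


# 64 channels on an 8 x 8 map: 64 * 8 * 8 = 4096, matches nn.Linear(4096, 512)
print(flattened_features(64, (64, 64), 3))

# Smaller input: 64 * 4 * 4 = 1024, does not match
print(flattened_features(64, (32, 32), 3))

# Wider last block: 128 * 8 * 8 = 8192, does not match
print(flattened_features(128, (64, 64), 3))
```

Under these assumptions, any change to the last channel count, the input size, or the number of downsampling steps moves the flattened size away from 4096.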

The final linear layer should dynamically adapt to the actual flattened feature size from the encoder blocks.
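
One possible way to get this behavior is sketched below using `torch.nn.LazyLinear`, which infers its input size on the first forward pass. `TinyEncoder` is a hypothetical, simplified stand-in written for this example, not MONAI's DiffusionModelEncoder; the strided convolutions and default arguments are assumptions chosen only to produce different flattened sizes.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for an encoder with a flatten + linear head."""

    def __init__(self, in_channels=1, channels=(32, 64, 64), out_channels=2):
        super().__init__()
        layers = []
        prev = in_channels
        for ch in channels:
            # Each strided convolution halves the spatial size (for even inputs).
            layers += [nn.Conv2d(prev, ch, kernel_size=3, stride=2, padding=1), nn.ReLU()]
            prev = ch
        self.features = nn.Sequential(*layers)
        self.out = nn.Sequential(
            nn.LazyLinear(512),  # input size inferred on first call, not fixed at 4096
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, out_channels),
        )

    def forward(self, x):
        h = self.features(x)
        h = h.reshape(h.shape[0], -1)  # flatten everything except the batch dimension
        return self.out(h)


# Two configurations with different flattened sizes both run without a shape error.
for size, channels in [(64, (32, 64, 64)), (32, (16, 32, 128))]:
    model = TinyEncoder(channels=channels).eval()
    with torch.no_grad():
        y = model(torch.randn(4, 1, size, size))
    print(size, channels, tuple(y.shape), model.out[0].in_features)
```

A lazy layer needs one forward pass before its weights exist, which matters when creating an optimizer or loading a checkpoint. An alternative design is to compute the flattened size at construction time from the configured channels and an expected input shape.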

Environment

================================
Printing MONAI config...
================================
MONAI version: 1.6.dev2525
Numpy version: 1.26.4
Pytorch version: 2.6.0+cu124
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 56fe5f014f424ad0e6dbc8345515fc49295dd849
MONAI __file__: /usr/local/lib/python3.11/dist-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.5.2
ITK version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 5.3.2
scikit-image version: 0.25.2
scipy version: 1.15.2
Pillow version: 11.1.0
Tensorboard version: 2.18.0
gdown version: 5.2.0
TorchVision version: 0.21.0+cu124
tqdm version: 4.67.1
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: 7.0.0
pandas version: 2.2.3
einops version: 0.8.1
transformers version: 4.51.3
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.
clearml version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 22.04.4 LTS
Platform: Linux-6.6.56+-x86_64-with-glibc2.35
Processor: x86_64
Machine: x86_64
Python version: 3.11.11
Process name: python3
Command: ['/usr/bin/python3', '-m', 'colab_kernel_launcher', '-f', '/root/.local/share/jupyter/runtime/kernel-2d79cbbb-e64e-40c4-b611-bb0a65b9b06d.json']
Open files: [popenfile(path='/root/.ipython/profile_default/history.sqlite', fd=46, position=0, mode='r+', flags=688130), popenfile(path='/root/.ipython/profile_default/history.sqlite', fd=48, position=0, mode='r+', flags=688130), popenfile(path='/root/.ipython/profile_default/history.sqlite-journal', fd=74, position=0, mode='r+', flags=688130)]
Num physical CPUs: 2
Num logical CPUs: 4
Num usable CPUs: 4
CPU usage (%): [2.6, 3.7, 2.9, 2.7]
CPU freq. (MHz): 2000
Load avg. in last 1, 5, 15 mins (%): [33.9, 15.3, 6.8]
Disk usage (%): 28.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.4
Available memory (GB): 29.2
Used memory (GB): 1.7

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 12.4
cuDNN enabled: True
NVIDIA_TF32_OVERRIDE: None
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE: None
cuDNN version: 90100
Current device: 0
Library compiled for CUDA architectures: ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
GPU 0 Name: Tesla P100-PCIE-16GB
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 56
GPU 0 Total memory (GB): 15.9
GPU 0 CUDA capability (maj.min): 6.0
