Commit 823c9ca

MMathisLab and stes authored

General Doc refresher (#232)

* Update installation.rst - python 3.9+
* Update index.rst
* Update figures.rst
* Update index.rst - typo fix
* Update usage.rst - update suggestion on data split
* Update docs/source/usage.rst

Co-authored-by: Steffen Schneider <[email protected]>

* Update usage.rst - indent error fixed
* Update usage.rst - changed infoNCE to new GoF
* Update usage.rst - finx numpy() doctest
* Update usage.rst - small typo fix (label)
* Update usage.rst

---------

Co-authored-by: Steffen Schneider <[email protected]>
1 parent 47945ca commit 823c9ca

4 files changed: +72, -59 lines

docs/source/figures.rst (+2, -2)

@@ -1,7 +1,7 @@
 Figures
 =======
 
-CEBRA was introduced in `Schneider, Lee and Mathis (2022)`_ and applied to various datasets across
+CEBRA was introduced in `Schneider, Lee and Mathis (2023)`_ and applied to various datasets across
 animals and recording modalities.
 
 In this section, we provide reference code for reproducing the figures and experiments. Since especially
@@ -56,4 +56,4 @@ differ in minor typographic details.
 
 
 
-.. _Schneider, Lee and Mathis (2022): https://arxiv.org/abs/2204.00673
+.. _Schneider, Lee and Mathis (2023): https://www.nature.com/articles/s41586-023-06031-6

docs/source/index.rst (+22, -17)

@@ -34,27 +34,18 @@ Please support the development of CEBRA by starring and/or watching the project
 Installation and Setup
 ----------------------
 
-Please see the dedicated :doc:`Installation Guide </installation>` for information on installation options using ``conda``, ``pip`` and ``docker``.
-
-Have fun! 😁
+Please see the dedicated :doc:`Installation Guide </installation>` for information on installation options using ``conda``, ``pip`` and ``docker``. Have fun! 😁
 
 Usage
 -----
 
 Please head over to the :doc:`Usage </usage>` tab to find step-by-step instructions to use CEBRA on your data. For example use cases, see the :doc:`Demos </demos>` tab.
 
-Integrations
-------------
-
-CEBRA can be directly integrated with existing libraries commonly used in data analysis. The ``cebra.integrations`` module
-is getting actively extended. Right now, we offer integrations for ``scikit-learn``-like usage of CEBRA, a package making use of ``matplotlib`` to plot the CEBRA model results, as well as the
-possibility to compute CEBRA embeddings on DeepLabCut_ outputs directly.
-
 
 Licensing
 ---------
-
-Since version 0.4.0, CEBRA is open source software under an Apache 2.0 license.
+The ideas presented in our package are currently patent pending (Patent No. WO2023143843).
+Since version 0.4.0, CEBRA's source is licenced under an Apache 2.0 license.
 Prior versions 0.1.0 to 0.3.1 were released for academic use only.
 
 Please see the full license file on Github_ for further information.
@@ -65,13 +56,19 @@ Contributing
 
 Please refer to the :doc:`Contributing </contributing>` tab to find our guidelines on contributions.
 
-Code contributors
+Code Contributors
 -----------------
 
-The CEBRA code was originally developed by Steffen Schneider, Jin H. Lee, and Mackenzie Mathis (up to internal version 0.0.2). As of March 2023, it is being actively extended and maintained by `Steffen Schneider`_, `Célia Benquet`_, and `Mackenzie Mathis`_.
+The CEBRA code was originally developed by Steffen Schneider, Jin H. Lee, and Mackenzie Mathis (up to internal version 0.0.2). Please see our AUTHORS file for more information.
 
-References
-----------
+Integrations
+------------
+
+CEBRA can be directly integrated with existing libraries commonly used in data analysis. Namely, we provide a ``scikit-learn`` style interface to use CEBRA. Additionally, we offer integrations with our ``scikit-learn``-style of using CEBRA, a package making use of ``matplotlib`` and ``plotly`` to plot the CEBRA model results, as well as the possibility to compute CEBRA embeddings on DeepLabCut_ outputs directly. If you have another suggestion, please head over to Discussions_ on GitHub_!
+
+
+Key References
+--------------
 .. code::
 
    @article{schneider2023cebra,
@@ -82,14 +79,22 @@ References
    year = {2023},
    }
 
+   @article{xCEBRA2025,
+    author={Steffen Schneider and Rodrigo Gonz{\'a}lez Laiz and Anastasiia Filippova and Markus Frey and Mackenzie W Mathis},
+    title = {Time-series attribution maps with regularized contrastive learning},
+    journal = {AISTATS},
+    url = {https://openreview.net/forum?id=aGrCXoTB4P},
+    year = {2025},
+   }
+
 This documentation is based on the `PyData Theme`_.
 
 
 .. _`Twitter`: https://twitter.com/cebraAI
 .. _`PyData Theme`: https://github.com/pydata/pydata-sphinx-theme
 .. _`DeepLabCut`: https://deeplabcut.org
+.. _`Discussions`: https://github.com/AdaptiveMotorControlLab/CEBRA/discussions
 .. _`Github`: https://github.com/AdaptiveMotorControlLab/cebra
 .. _`email`: mailto:[email protected]
 .. _`Steffen Schneider`: https://github.com/stes
-.. _`Célia Benquet`: https://github.com/CeliaBenquet
 .. _`Mackenzie Mathis`: https://github.com/MMathisLab
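
The ``scikit-learn``-style interface referenced in the new Integrations section boils down to an estimator with ``fit``/``transform``. A minimal sketch, assuming toy random data (parameter names mirror the usage.rst pipeline further down; ``max_iterations=10`` is a smoke-test value only, and the ``matplotlib``/``plotly`` plotting helpers mentioned above are left out):

    import numpy as np
    import cebra

    # hypothetical stand-in data: 100 timepoints x 30 channels
    X = np.random.normal(size=(100, 30)).astype("float32")

    # estimator-style usage: construct, fit, transform
    model = cebra.CEBRA(model_architecture="offset10-model",
                        batch_size=512,
                        max_iterations=10,      # smoke-test value; see usage.rst
                        output_dimension=8,
                        verbose=False)
    model.fit(X)                    # unsupervised, time-contrastive training
    embedding = model.transform(X)  # returns an array of shape (100, 8)

The same ``fit``/``transform`` pair also accepts continuous and/or discrete labels, as the full pipeline in the usage.rst diff below shows.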

docs/source/installation.rst (+3, -3)

@@ -4,7 +4,7 @@ Installation Guide
 System Requirements
 -------------------
 
-CEBRA is written in Python (3.8+) and PyTorch. CEBRA is most effective when used with a GPU, but CPU-only support is provided. We provide instructions to run CEBRA on your system directly. The instructions below were tested on different compute setups with Ubuntu 18.04 or 20.04, using Nvidia GTX 2080, A4000, and V100 cards. Other setups are possible (including Windows), as long as CUDA 10.2+ support is guaranteed.
+CEBRA is written in Python (3.9+) and PyTorch. CEBRA is most effective when used with a GPU, but CPU-only support is provided. We provide instructions to run CEBRA on your system directly. The instructions below were tested on different compute setups with Ubuntu 18.04 or 20.04, using Nvidia GTX 2080, A4000, and V100 cards. Other setups are possible (including Windows), as long as CUDA 10.2+ support is guaranteed.
 
 - Software dependencies and operating systems:
     - Linux or MacOS
@@ -93,11 +93,11 @@ we outline different options below.
 
 * 🚀 For more advanced users, CEBRA has different extra install options that you can select based on your usecase:
 
-  * ``[integrations]``: This will install (experimental) support for our streamlit and jupyter integrations.
+  * ``[integrations]``: This will install (experimental) support for integrations, such as plotly.
   * ``[docs]``: This will install additional dependencies for building the package documentation.
   * ``[dev]``: This will install additional dependencies for development, unit and integration testing,
     code formatting, etc. Install this extension if you want to work on a pull request.
-  * ``[demos]``: This will install additional dependencies for running our demo notebooks.
+  * ``[demos]``: This will install additional dependencies for running our demo notebooks in Jupyter.
   * ``[datasets]``: This extension will install additional dependencies to use the pre-installed datasets
     in ``cebra.datasets``.
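
As a quick sketch of how these extras are selected at install time (standard pip extras syntax; the quoting guards against shells that would otherwise expand the brackets):

    pip install cebra                     # core package only
    pip install 'cebra[integrations]'     # adds the (experimental) plotly support listed above
    pip install 'cebra[demos,datasets]'   # extras can be combined in a single install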

docs/source/usage.rst (+45, -37)

@@ -1207,42 +1207,47 @@ Putting all previous snippet examples together, we obtain the following pipeline
 
     # 1. Define a CEBRA model
     cebra_model = cebra.CEBRA(
-    model_architecture = "offset10-model",
-    batch_size = 512,
-    learning_rate = 1e-4,
-    temperature_mode='constant',
-    temperature = 0.1,
-    max_iterations = 10, # TODO(user): to change to ~500-10000 depending on dataset size
-    #max_adapt_iterations = 10, # TODO(user): use and to change to ~100-500 if adapting
-    time_offsets = 10,
-    output_dimension = 8,
-    verbose = False
+        model_architecture = "offset10-model",
+        batch_size = 512,
+        learning_rate = 1e-4,
+        temperature_mode='constant',
+        temperature = 0.1,
+        max_iterations = 10, # TODO(user): to change to ~500-10000 depending on dataset size
+        #max_adapt_iterations = 10, # TODO(user): use and to change to ~100-500 if adapting
+        time_offsets = 10,
+        output_dimension = 8,
+        verbose = False
     )
-
+
     # 2. Load example data
     neural_data = cebra.load_data(file="neural_data.npz", key="neural")
     new_neural_data = cebra.load_data(file="neural_data.npz", key="new_neural")
     continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
     discrete_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["discrete"]).flatten()
-
+
+
     assert neural_data.shape == (100, 3)
     assert new_neural_data.shape == (100, 4)
     assert discrete_label.shape == (100, )
     assert continuous_label.shape == (100, 3)
-
-    # 3. Split data and labels
-    (
-        train_data,
-        valid_data,
-        train_discrete_label,
-        valid_discrete_label,
-        train_continuous_label,
-        valid_continuous_label,
-    ) = train_test_split(neural_data,
-                         discrete_label,
-                         continuous_label,
-                         test_size=0.3)
-
+
+    # 3. Split data and labels into train/validation
+    from sklearn.model_selection import train_test_split
+
+    split_idx = int(0.8 * len(neural_data))
+    # suggestion: 5%-20% depending on your dataset size; note that this splits the
+    # into an early and late part, which might not be ideal for your data/experiment!
+    # As a more involved alternative, consider e.g. a nested time-series split.
+
+    train_data = neural_data[:split_idx]
+    valid_data = neural_data[split_idx:]
+
+    train_continuous_label = continuous_label[:split_idx]
+    valid_continuous_label = continuous_label[split_idx:]
+
+    train_discrete_label = discrete_label[:split_idx]
+    valid_discrete_label = discrete_label[split_idx:]
+
     # 4. Fit the model
     # time contrastive learning
     cebra_model.fit(train_data)
@@ -1252,33 +1257,36 @@ Putting all previous snippet examples together, we obtain the following pipeline
     cebra_model.fit(train_data, train_continuous_label)
     # mixed behavior contrastive learning
     cebra_model.fit(train_data, train_discrete_label, train_continuous_label)
-
+
+
     # 5. Save the model
     tmp_file = Path(tempfile.gettempdir(), 'cebra.pt')
     cebra_model.save(tmp_file)
-
+
    # 6. Load the model and compute an embedding
     cebra_model = cebra.CEBRA.load(tmp_file)
     train_embedding = cebra_model.transform(train_data)
     valid_embedding = cebra_model.transform(valid_data)
-    assert train_embedding.shape == (70, 8) # TODO(user): change to split ratio & output dim
-    assert valid_embedding.shape == (30, 8) # TODO(user): change to split ratio & output dim
-
+
+    assert train_embedding.shape == (80, 8) # TODO(user): change to split ratio & output dim
+    assert valid_embedding.shape == (20, 8) # TODO(user): change to split ratio & output dim
+
     # 7. Evaluate the model performance (you can also check the train_data)
-    goodness_of_fit = cebra.sklearn.metrics.infonce_loss(cebra_model,
+    goodness_of_fit = cebra.sklearn.metrics.goodness_of_fit_score(cebra_model,
                                                          valid_data,
                                                          valid_discrete_label,
-                                                         valid_continuous_label,
-                                                         num_batches=5)
-
+                                                         valid_continuous_label)
+
     # 8. Adapt the model to a new session
     cebra_model.fit(new_neural_data, adapt = True)
-
+
     # 9. Decode discrete labels behavior from the embedding
     decoder = cebra.KNNDecoder()
     decoder.fit(train_embedding, train_discrete_label)
     prediction = decoder.predict(valid_embedding)
-    assert prediction.shape == (30,)
+    assert prediction.shape == (20,)
+
+
 
 👉 For further guidance on different/customized applications of CEBRA on your own data, refer to the ``examples/`` folder or to the full documentation folder ``docs/``.
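
Step 3 of the updated pipeline recommends the simple early/late cut and points to a nested time-series split as a more involved alternative. A minimal sketch of that alternative, assuming scikit-learn's ``TimeSeriesSplit`` (one reasonable choice; any order-preserving cross-validation scheme works, and the toy array stands in for the loaded ``neural_data``):

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    neural_data = np.random.normal(size=(100, 3))  # toy stand-in for the loaded data

    # each fold trains on an early segment and validates on the segment that
    # follows it, so no future samples leak into training
    for train_idx, valid_idx in TimeSeriesSplit(n_splits=4).split(neural_data):
        train_data, valid_data = neural_data[train_idx], neural_data[valid_idx]
        # fit and evaluate one CEBRA model per fold here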
