Adding README for the examples. #7


Merged 1 commit on Apr 4, 2022
56 changes: 56 additions & 0 deletions examples/README.md
@@ -0,0 +1,56 @@
# KFAC-JAX Examples

This folder contains code with common functionality shared by all of the
examples, as well as the individual example subfolders.
Each example has the following structure:
* `experiment.py` contains the model definition, the loss definition, and the
experiment class used by the training pipeline.
* `pipeline.py` contains the hyper-parameter configuration (a minimal sketch
follows this list).
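
For orientation, here is a minimal, hypothetical sketch of what `get_config()`
in a `pipeline.py` might look like. It assumes the config object comes from
`ml_collections.config_dict` and combines fields that appear in the diffs in
this pull request; the real pipeline files contain many more options:

```python
# Illustrative sketch only, not the actual example code.
from ml_collections import config_dict


def get_config() -> config_dict.ConfigDict:
  """Returns a toy experiment configuration."""
  config = config_dict.ConfigDict()
  config.checkpoint_dir = "/tmp/kfac_jax_jaxline/"
  config.train_checkpoint_all_hosts = False

  # Experiment config.
  config.experiment_kwargs = config_dict.ConfigDict(
      dict(
          config=dict(
              l2_reg=1e-5,
              training=dict(
                  steps=200_000,
                  epochs=None,
              ),
          )
      )
  )
  return config
```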


To run the examples, you will need to install additional dependencies via:

```shell
$ pip install -r examples/requirements.txt
```

To run an example, simply run:

```shell
$ python example_name/pipeline.py
```

## Autoencoder on MNIST

This example uses the K-FAC optimizer to perform deterministic (i.e. full batch)
training of a deep autoencoder on MNIST.
The default configuration uses the automatic learning rate, momentum, and
damping adaptation techniques from the original K-FAC paper.
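
As a rough sketch, the adaptive options are typically enabled when
constructing the optimizer. The toy loss function, model, and damping value
below are assumptions for illustration, not the example's actual code:

```python
import jax
import jax.numpy as jnp
import kfac_jax


def loss_fn(params, batch):
  # Toy squared-error loss on a linear model, registered so that K-FAC can
  # model its curvature. Purely illustrative.
  x, y = batch
  preds = x @ params["w"]
  kfac_jax.register_squared_error_loss(preds, y)
  return jnp.mean(jnp.sum((preds - y) ** 2, axis=-1))


optimizer = kfac_jax.Optimizer(
    value_and_grad_func=jax.value_and_grad(loss_fn),
    l2_reg=0.0,
    use_adaptive_learning_rate=True,  # learning rate chosen automatically
    use_adaptive_momentum=True,       # momentum chosen automatically
    use_adaptive_damping=True,        # damping adapted during training
    initial_damping=1.0,
    multi_device=False,
)
```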

## Classifier on MNIST

This example uses the K-FAC optimizer to perform deterministic (i.e. full batch)
training of a very small convolutional network for MNIST classification.
The default configuration uses the automatic learning rate, momentum, and
damping adaptation techniques from the original K-FAC paper.
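
For a classification loss like the one used here, the relevant registration
call is for softmax cross-entropy. A hypothetical sketch (the function name
and shapes are assumptions, not the example's code):

```python
import jax
import jax.numpy as jnp
import kfac_jax


def classification_loss(logits, labels):
  # Register the softmax cross-entropy loss so that the K-FAC curvature
  # estimator recognises it, then return its mean value.
  kfac_jax.register_softmax_cross_entropy_loss(logits, labels)
  log_probs = jax.nn.log_softmax(logits)
  one_hot = jax.nn.one_hot(labels, logits.shape[-1])
  return -jnp.mean(jnp.sum(one_hot * log_probs, axis=-1))
```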

## Resnet50 on ImageNet

This example uses the K-FAC optimizer to perform stochastic training (with
fixed batch size) of a Resnet50 network for ImageNet classification.
The default configuration uses the automatic damping adaptation technique from
the original K-FAC paper.
The momentum is fixed at `0.9` and the learning rate follows an ad-hoc schedule.
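
As an illustration of what such an ad-hoc schedule looks like as a function of
the global step, here is a hypothetical stepwise decay; the boundaries and
values below are made up for the sketch and are not the ones used by the
example:

```python
import jax.numpy as jnp


def stepwise_learning_rate(global_step):
  # Hypothetical stepwise decay: drop the learning rate by 10x at each
  # boundary. Boundaries and values are illustrative only.
  boundaries = jnp.array([30_000, 60_000, 80_000])
  values = jnp.array([1e-1, 1e-2, 1e-3, 1e-4])
  return values[jnp.sum(global_step >= boundaries)]
```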


## Resnet101 with TAT on ImageNet

This example uses the K-FAC optimizer to perform stochastic training (with
fixed batch size) of a Resnet101 network for ImageNet classification,
with no residual connections or normalization layers, following the
[TAT paper].
The default configuration uses a fixed damping of `0.001`.
The momentum is fixed at `0.9` and the learning rate follows a cosine decay
schedule.

[TAT paper]: https://arxiv.org/abs/2203.08120
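
A cosine decay schedule with warmup can be expressed, for example, with
`optax`. The peak value below matches the `initial_learning_rate=3e-4` from
the config diff in this pull request, while the step counts are assumptions
for illustration:

```python
import optax

steps_per_epoch = 10_000  # assumption for illustration

schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=3e-4,                   # initial_learning_rate in the config
    warmup_steps=5 * steps_per_epoch,  # warmup_epochs=5 in the config
    decay_steps=90 * steps_per_epoch,  # total training steps (assumed)
)

# schedule(step) returns the learning rate for a given global step.
```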
1 change: 1 addition & 0 deletions examples/autoencoder_mnist/pipeline.py
@@ -38,6 +38,7 @@ def get_config() -> config_dict.ConfigDict:
config.checkpoint_dir = "/tmp/kfac_jax_jaxline/"
config.train_checkpoint_all_hosts = False

# Experiment config.
config.experiment_kwargs = config_dict.ConfigDict(
dict(
config=dict(
2 changes: 1 addition & 1 deletion examples/lrelunet101_imagenet/pipeline.py
@@ -77,7 +77,7 @@ def get_config() -> config_dict.ConfigDict:
use_adaptive_momentum=False,
use_adaptive_damping=False,
learning_rate_schedule=dict(
- initial_learning_rate=0.1,
+ initial_learning_rate=3e-4,
warmup_epochs=5,
name="cosine",
),
2 changes: 1 addition & 1 deletion examples/resnet50_imagenet/pipeline.py
@@ -42,7 +42,7 @@ def get_config() -> config_dict.ConfigDict:
config.experiment_kwargs = config_dict.ConfigDict(
dict(
config=dict(
- l2_reg=0.0,
+ l2_reg=1e-5,
training=dict(
steps=200_000,
epochs=None,