|
6 | 6 | Release Notes
|
7 | 7 | ###############
|
8 | 8 |
|
| 9 | +************** |
| 10 | + Version 0.38 |
| 11 | +************** |
| 12 | + |
| 13 | +Version 0.38.0 |
| 14 | +============== |
| 15 | + |
| 16 | +**Release Date:** November 22, 2024 |
| 17 | + |
| 18 | +**Breaking Changes** |
| 19 | + |
| 20 | +- ASHA: All experiments using ASHA hyperparameter search must now configure ``max_time`` and |
| 21 | + ``time_metric`` in the experiment config, instead of ``max_length``. Additionally, training code |
| 22 | + must report the configured ``time_metric`` in validation metrics. As a convenience, Determined |
| 23 | + training loops now automatically report ``batches`` and ``epochs`` with metrics, which you can |
| 24 | + use as your ``time_metric``. ASHA experiments without this modification will no longer run. |
| 25 | + |
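As a rough illustration of the ASHA change above, the sketch below models a searcher config as a plain Python dict and flags the fields that need to change. The helper `check_asha_searcher` is hypothetical and not part of Determined; the searcher name, metric name, and numeric values are assumptions chosen for illustration. Only `max_time`, `time_metric`, and `max_length` come from the note itself.

```python
from typing import List


def check_asha_searcher(searcher: dict) -> List[str]:
    """Hypothetical helper: list what an ASHA searcher config still needs for 0.38.0."""
    problems = []
    if "max_length" in searcher:
        problems.append("replace max_length with max_time and time_metric")
    for field in ("max_time", "time_metric"):
        if field not in searcher:
            problems.append(f"missing required field: {field}")
    return problems


# Pre-0.38 style (illustrative values): training length given via max_length.
old_style = {
    "name": "adaptive_asha",
    "metric": "validation_loss",
    "max_length": {"batches": 1000},
    "max_trials": 16,
}

# 0.38 style (illustrative values): time_metric names a reported metric,
# and max_time bounds training in units of that metric.
new_style = {
    "name": "adaptive_asha",
    "metric": "validation_loss",
    "time_metric": "batches",
    "max_time": 1000,
    "max_trials": 16,
}

print(check_asha_searcher(old_style))
print(check_asha_searcher(new_style))
```

Here `batches` is used as the `time_metric` because, per the note, Determined training loops report it automatically. A custom metric would also have to be reported by the training code in its validation metrics.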
| 26 | +- Custom Searchers: All custom searchers including DeepSpeed Autotune were deprecated in ``0.36.0`` |
| 27 | + and are now being removed. Users are encouraged to use a preset searcher, which can be easily |
| 28 | + :ref:`configured <experiment-configuration_searcher>` for any experiment. |
| 29 | + |
| 30 | +- API: Custom Searcher (including DeepSpeed AutoTune) was deprecated in 0.36.0 and is now removed. |
| 31 | + We will maintain first-class support for a variety of preset searchers, which can be easily |
| 32 | + configured for any experiment. Visit :ref:`search-methods` for details. |
| 33 | + |
| 34 | +**New Features** |
| 35 | + |
| 36 | +- API/CLI: Add support for access tokens. Add the ability to create and administer access tokens |
| 37 | + for users to authenticate in automated workflows. Users can define the lifespan of these tokens, |
| 38 | + making it easier to securely authenticate and run processes. Users can set global defaults and |
| 39 | + limits for the validity of access tokens by configuring ``default_lifespan_days`` and |
| 40 | + ``max_lifespan_days`` in the master configuration. Setting ``max_lifespan_days`` to ``-1`` |
| 41 | + indicates an **infinite** lifespan for the access token. This feature enhances automation while |
| 42 | + maintaining strong security protocols by allowing tighter control over token usage and |
| 43 | + expiration. This feature requires Determined Enterprise Edition. |
| 44 | + |
| 45 | + - CLI: |
| 46 | + |
| 47 | + - ``det token create``: Create a new access token. |
| 48 | + - ``det token login``: Sign in with an access token. |
| 49 | + - ``det token edit``: Update an access token's description. |
| 50 | + - ``det token list``: List all active access tokens, with options for displaying revoked |
| 51 | + tokens. |
| 52 | + - ``det token describe``: Show details of specific access tokens. |
| 53 | + - ``det token revoke``: Revoke an access token. |
| 54 | + |
| 55 | + - API: |
| 56 | + |
| 57 | + - ``POST /api/v1/tokens``: Create a new access token. |
| 58 | + - ``GET /api/v1/tokens``: Retrieve a list of access tokens. |
| 59 | + - ``PATCH /api/v1/tokens/{token_id}``: Edit an existing access token. |
| 60 | + |
| 61 | +- API: Introduce ``keras.DeterminedCallback``, a new high-level training API for TF Keras that |
| 62 | + integrates Keras training code with Determined through a single :ref:`Keras Callback |
| 63 | + <api-keras-ug>`. |
| 64 | + |
| 65 | +- API: Introduce ``deepspeed.Trainer``, a new high-level training API for DeepSpeedTrial that |
| 66 | + allows for Python-side training loop configurations and includes support for local training. |
| 67 | + |
| 68 | +- Cluster: In the enterprise edition of Determined, add :ref:`config policies <config-policies>` to |
| 69 | + enable administrators to set limits on how users can define workloads (e.g., experiments, |
| 70 | + notebooks, TensorBoards, shells, and commands). Administrators can define two types of |
| 71 | + configurations: |
| 72 | + |
| 73 | + - **Invariant Configs for Experiments**: Settings applied to all experiments within a specific |
| 74 | + scope (global or workspace). Invariant configs for other tasks (e.g. notebooks, TensorBoards, |
| 75 | + shells, and commands) are not yet supported. |
| 76 | + |
| 77 | + - **Constraints**: Restrictions that prevent users from exceeding resource limits within a |
| 78 | + scope. Constraints can be set independently for experiments and tasks. |
| 79 | + |
| 80 | +- Helm: Support configuring ``determined_master_host``, ``determined_master_port``, and |
| 81 | + ``determined_master_scheme``. These control how tasks address the Determined API server and are |
| 82 | + useful when installations span multiple Kubernetes clusters or there are proxies in between tasks |
| 83 | + and the master. Also, ``determined_master_host`` now defaults to the service host, |
| 84 | + ``<det_service_name>.<det_namespace>.svc.cluster.local``, instead of the service IP. |
| 85 | + |
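To make the role of the three Helm values concrete, the sketch below shows how a task might combine them into the address it uses to reach the master. The function `master_address` and the example values are hypothetical; the chart's actual templating is not shown in these notes.

```python
def master_address(scheme: str, host: str, port: int) -> str:
    """Hypothetical helper: combine scheme, host, and port into a base URL."""
    return f"{scheme}://{host}:{port}"


# Illustrative values only, e.g. a proxy that sits between tasks and the master.
address = master_address("https", "determined.example.internal", 8443)
print(address)
```

Overriding the host this way is what lets tasks in a different Kubernetes cluster reach the master through an externally routable name rather than an in-cluster service address.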
| 86 | +- Helm: Add support for capturing and restoring snapshots of the database persistent volume. Visit |
| 87 | + :ref:`helm-config-reference` for more details. |
| 88 | + |
| 89 | +- New RBAC role: In the enterprise edition of Determined, add a ``TokenCreator`` RBAC role, which |
| 90 | + allows users to create, view, and revoke their own :ref:`access tokens <access-tokens>`. This |
| 91 | + role can only be assigned globally. |
| 92 | + |
| 93 | +- Experiments: Add a ``name`` field to ``log_policies``. When a log policy matches, its name shows |
| 94 | + as a label in the WebUI, making it easy to spot specific issues during a run. Labels appear in |
| 95 | + both the run table and run detail views. |
| 96 | + |
| 97 | + In addition, there is a new format: ``name`` is required, and ``action`` is now a plain string. |
| 98 | + For more details, refer to :ref:`log_policies <config-log-policies>`. |
| 99 | + |
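The new ``log_policies`` shape can be sketched with plain Python dicts. The checker `check_log_policy` is hypothetical, and the ``pattern`` key, the ``exclude_node`` action value, and the nested legacy ``action`` shape are assumptions for illustration. The note itself only states that ``name`` is required and ``action`` is a plain string.

```python
from typing import List


def check_log_policy(policy: dict) -> List[str]:
    """Hypothetical helper: check one log policy against the new format."""
    problems = []
    if not policy.get("name"):
        problems.append("name is required")
    if "action" in policy and not isinstance(policy["action"], str):
        problems.append("action must be a plain string")
    return problems


# Assumed legacy shape: no name, action given as a nested map.
legacy = {
    "pattern": ".*CUDA out of memory.*",
    "action": {"type": "exclude_node"},
}

# New shape: a name (shown as a label in the WebUI) and a string action.
current = {
    "name": "CUDA OOM",
    "pattern": ".*CUDA out of memory.*",
    "action": "exclude_node",
}

print(check_log_policy(legacy))
print(check_log_policy(current))
```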
| 100 | +**Improvements** |
| 101 | + |
| 102 | +- Master Configuration: Add support for configuring the cryptosystem used for SSH connections. |
| 103 | + ``security.key_type`` now accepts ``RSA``, ``ECDSA``, or ``ED25519``. The default key type has |
| 104 | + changed from ``1024-bit RSA`` to ``ED25519``, since ``ED25519`` keys are faster and more secure |
| 105 | + than the old default, and ``ED25519`` is also the default key type for ``ssh-keygen``. |
| 106 | + |
| 107 | +**Removed Features** |
| 108 | + |
| 109 | +- WebUI: "Continue Training" no longer supports a configurable number of batches and simply |
| 110 | + resumes the trial from the last checkpoint. |
| 111 | + |
| 112 | +**Known Issues** |
| 113 | + |
| 114 | +- PyTorch has `deprecated |
| 115 | + <https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#use-tensorboard-to-view-results-and-analyze-model-performance>`_ |
| 116 | + their Profiler TensorBoard Plugin (``tb_plugin``), so some features may not be compatible with |
| 117 | + PyTorch 2.0 and above. Our current default environment image comes with PyTorch 2.3. If users are |
| 118 | + experiencing issues with this plugin, we suggest using an image with a PyTorch version earlier |
| 119 | + than 2.0. |
| 120 | + |
| 121 | +**Bug Fixes** |
| 122 | + |
| 123 | +- Fix an issue where, during a grid search, a hyperparameter containing an empty nested |
| 124 | + hyperparameter (that is, just an empty map) would not appear in the hparams passed to the |
| 125 | + trial. |
| 126 | + |
| 127 | +**Deprecations** |
| 128 | + |
| 129 | +- Experiment Config: The ``max_length`` field of the searcher configuration section has been |
| 130 | + deprecated for all experiments and searchers. Users are expected to configure the desired |
| 131 | + training length directly in training code. |
| 132 | + |
| 133 | +- Experiment Config: The ``optimizations`` config has been deprecated. Please see :ref:`Training |
| 134 | + APIs <apis-howto-overview>` to configure supported optimizations through training code directly. |
| 135 | + |
| 136 | +- Experiment Config: The ``scheduling_unit``, ``min_checkpoint_period``, and |
| 137 | + ``min_validation_period`` config fields have been deprecated. Instead, these configuration |
| 138 | + options should be specified in training code. |
| 139 | + |
| 140 | +- Experiment Config: The ``entrypoint`` field no longer accepts ``model_def:TrialClass`` as trial |
| 141 | + definitions. Please invoke your training script directly (``python3 train.py``). |
| 142 | + |
| 143 | +- Core API: The ``SearcherContext`` (``core.searcher``) has been deprecated. Training code no |
| 144 | + longer requires ``core.searcher.operations`` to run, and progress should be reported through |
| 145 | + ``core.train.report_progress``. |
| 146 | + |
| 147 | +- DeepSpeed: The ``num_micro_batches_per_slot`` and ``train_micro_batch_size_per_gpu`` attributes |
| 148 | + on ``DeepSpeedContext`` have been replaced with ``get_num_micro_batches_per_slot()`` and |
| 149 | + ``get_train_micro_batch_size_per_gpu()``, respectively. |
| 150 | + |
| 151 | +- Horovod: The Horovod distributed training backend has been deprecated. Users are encouraged to |
| 152 | + migrate to the native distributed backend of their training framework (``torch.distributed`` or |
| 153 | + ``tf.distribute``). |
| 154 | + |
| 155 | +- Trial APIs: ``TFKerasTrial`` has been deprecated. Users are encouraged to migrate to the new |
| 156 | + :ref:`Keras Callback <api-keras-ug>`. |
| 157 | + |
| 158 | +- Launchers: The ``--trial`` argument in Determined launchers has been deprecated. Please invoke |
| 159 | + your training script directly. |
| 160 | + |
| 161 | +- ASHA: The ``stop_once`` field of the ``searcher`` config for ASHA searchers has been deprecated. |
| 162 | + All ASHA searches are now early-stopping based (``stop_once: true``) instead of promotion based. |
| 163 | + |
| 164 | +- CLI: The ``--test`` and ``--local`` flags for ``det experiment create`` have been deprecated. All |
| 165 | + training APIs now support local execution (``python3 train.py``). Please see :ref:`Training APIs |
| 166 | + <apis-howto-overview>` for details specific to your framework. |
| 167 | + |
| 168 | +- Web UI: Previously, trials that reported an ``epoch`` metric enabled an epoch X-axis in the Web |
| 169 | + UI metrics tab. This metric name has been changed to ``epochs``, with ``epoch`` as a fallback |
| 170 | + option. |
| 171 | + |
| 172 | +- Database: After Amazon Aurora V1 reaches End of Life, support for Amazon Aurora V1 in ``det |
| 173 | + deploy aws`` will be removed. Future deployments will default to the ``simple-rds`` type, which |
| 174 | + uses Amazon RDS for PostgreSQL. We recommend that users migrate to Amazon RDS for PostgreSQL. For |
| 175 | + more information, visit the `migration instructions |
| 176 | + <https://gist.github.com/maxrussell/c67f4f7d586d55c4eb2658cc2dd1c290>`_. |
| 177 | + |
| 178 | +- Database: As a follow-up to the earlier notice, PostgreSQL 12 will reach End of Life on November |
| 179 | + 14, 2024. Instances still using PostgreSQL 12 or earlier should upgrade to PostgreSQL 13 or later |
| 180 | + to maintain compatibility. The application will log a warning if it detects a connection to any |
| 181 | + PostgreSQL version older than 12, and this warning will be updated to include PostgreSQL 12 once |
| 182 | + it is End of Life. |
| 183 | + |
9 | 184 | **************
|
10 | 185 | Version 0.37
|
11 | 186 | **************
|