This repository contains a simple implementation of MaxVol algorithms for optimal design, using the D-optimality criterion. The classical reference for this method is the book "Theory of Optimal Experiments" by V. V. Fedorov (1972, Academic Press). For a short overview of this class of methods, see the Wikipedia article on Optimal Design of Experiments. The derivation below is based on the author's understanding of the algorithms, but can probably be found in many other sources as well.
The goal of this repository is to provide a minimal self-contained derivation and implementation, which is not intended to be the most efficient, but rather to illustrate the basic principles of MaxVol algorithms.
Copyright 2025, Toon Verstraelen.
The tutorials in the Jupyter notebooks and the Markdown files are licensed under the
Creative Commons CC BY-SA 4.0 license.
The source code in the optdesign
Python package is licensed under the
LGPL-3.0 license.
You can install the optdesign
package from this repository
using the following command:

```bash
pip install optdesign
```
The Jupyter notebooks make use of the following Python packages:

- [NumPy](https://numpy.org/)
- [Matplotlib](https://matplotlib.org/)
These are readily available when opening the notebooks in Google Colab. (See link below.)
This section contains a quick recap of the theory of multivariate linear regression and the D-optimality criterion.
Consider the following linear set of equations:

$$\mathbf{y} = \mathbf{X}\,\mathbf{b} + \boldsymbol{\epsilon}$$

where $\mathbf{y}$ is a vector of $m$ measured values, $\mathbf{X}$ is the $m \times n$ design matrix, $\mathbf{b}$ is the vector of $n$ unknown parameters, and $\boldsymbol{\epsilon}$ is a vector of independent measurement errors with zero mean and standard deviation $\sigma$.

The least-squares solution of this system of equations is given by:

$$\hat{\mathbf{b}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

The expected value of the parameter vector is the exact solution:

$$\langle\hat{\mathbf{b}}\rangle = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\langle\mathbf{y}\rangle = \mathbf{b}$$

where we use the notation $\langle\,\cdot\,\rangle$ for the expected value over the distribution of the measurement errors.

The covariance matrix of the parameter vector is:

$$\mathrm{Cov}[\hat{\mathbf{b}}] = \langle(\hat{\mathbf{b}} - \mathbf{b})(\hat{\mathbf{b}} - \mathbf{b})^T\rangle = \sigma^2\,(\mathbf{X}^T\mathbf{X})^{-1}$$

The covariance matrix is thus known before the experiment is performed,
i.e. before the measurements $\mathbf{y}$ are available: it depends only on the design matrix $\mathbf{X}$ and the noise level $\sigma$.
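The recap above is easy to verify numerically. The following NumPy sketch simulates a small regression problem; the variable names are chosen for illustration only and are not part of the optdesign API:

```python
import numpy as np

rng = np.random.default_rng(42)

# A design matrix with 10 experiments and 3 parameters.
dmat = rng.normal(size=(10, 3))
pars_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1  # standard deviation of the measurement noise

# Simulated measurements with additive Gaussian noise.
meas = dmat @ pars_true + rng.normal(scale=sigma, size=10)

# Least-squares solution: (X^T X)^{-1} X^T y
pars_est = np.linalg.solve(dmat.T @ dmat, dmat.T @ meas)

# Covariance of the estimated parameters: sigma^2 (X^T X)^{-1},
# which can be computed before any measurement is performed.
covar = sigma**2 * np.linalg.inv(dmat.T @ dmat)
print("estimate:", pars_est)
print("std devs:", np.sqrt(np.diag(covar)))
```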
The goal of the D-optimality criterion is to select the rows of the design matrix $\mathbf{X}$ (i.e. the experiments to be performed) such that the determinant of the
information matrix $\mathbf{X}^T\mathbf{X}$ is maximized.
This can be seen as maximizing a measure of the "volume" of the information matrix, or equivalently, minimizing the volume of the confidence ellipsoid of the estimated parameters.
To facilitate manipulations of the design matrix $\mathbf{X}$, we introduce its singular value decomposition (SVD):

$$\mathbf{X} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^T$$

where $\mathbf{U}$ and $\mathbf{V}$ have orthonormal columns and $\boldsymbol{\Sigma}$ is the diagonal matrix of singular values $\sigma_1, \ldots, \sigma_n$. The determinant of the information matrix can then be expressed as:

$$\det(\mathbf{X}^T\mathbf{X}) = \det(\mathbf{V}\,\boldsymbol{\Sigma}^2\,\mathbf{V}^T) = \prod_{k=1}^{n} \sigma_k^2$$

where the last step uses the orthogonality of $\mathbf{V}$.
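This identity can be checked with a few lines of NumPy (an illustrative snippet, not part of the package):

```python
import numpy as np

rng = np.random.default_rng(0)
dmat = rng.normal(size=(8, 4))

# Singular values of the design matrix X.
sigmas = np.linalg.svd(dmat, compute_uv=False)

# det(X^T X) equals the product of the squared singular values.
det_info = np.linalg.det(dmat.T @ dmat)
print(det_info, np.prod(sigmas**2))
```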
Consider an initial non-singular design matrix $\mathbf{X}$ with $m \geq n$ rows.
Consider the following addition of a row vector $\mathbf{z}^T$ to the design matrix:

$$\mathbf{X}' = \begin{bmatrix} \mathbf{X} \\ \mathbf{z}^T \end{bmatrix}
\quad\Rightarrow\quad
\mathbf{X}'^T\mathbf{X}' = \mathbf{X}^T\mathbf{X} + \mathbf{z}\,\mathbf{z}^T$$

We work out the determinant after row addition
by making use of the SVD of the initial design matrix:

$$\det(\mathbf{X}'^T\mathbf{X}') = \det(\mathbf{V}\,\boldsymbol{\Sigma}^2\,\mathbf{V}^T + \mathbf{z}\,\mathbf{z}^T)$$

The second term is a rank-1 matrix. We now make use of the following properties of matrix determinants:

- The determinant is invariant under orthogonal transformations of the rows and columns.
- The determinant of a product of matrices is the product of the determinants.

Left-multiplying with $\mathbf{V}^T$ and right-multiplying with $\mathbf{V}$ yields:

$$\det(\mathbf{X}'^T\mathbf{X}') = \det(\boldsymbol{\Sigma}^2 + \mathbf{V}^T\mathbf{z}\,\mathbf{z}^T\mathbf{V})$$

Then we factor out the singular values:

$$\det(\mathbf{X}'^T\mathbf{X}') = \det(\boldsymbol{\Sigma}^2)\,\det(\mathbf{I} + \mathbf{w}\,\mathbf{w}^T) = \det(\mathbf{X}^T\mathbf{X})\,(1 + |\mathbf{w}|^2)$$

For the last step, we introduce $\mathbf{w} = \boldsymbol{\Sigma}^{-1}\mathbf{V}^T\mathbf{z}$.
The matrix in the second determinant is diagonal in any orthonormal basis
where $\mathbf{w}/|\mathbf{w}|$ is the first basis vector: its eigenvalues are $1 + |\mathbf{w}|^2$ (once) and $1$ ($n-1$ times), so its determinant is $1 + |\mathbf{w}|^2$.
The second factor expresses the change in determinant
after the addition of the new row vector $\mathbf{z}^T$ to the design matrix.
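The rank-1 update factor can be verified numerically with NumPy (again, an illustrative sketch only):

```python
import numpy as np

rng = np.random.default_rng(1)
dmat = rng.normal(size=(6, 3))
row_new = rng.normal(size=3)

# SVD of the initial design matrix: X = U Sigma V^T
_, sigmas, vtmat = np.linalg.svd(dmat, full_matrices=False)

# w = Sigma^{-1} V^T z
wvec = (vtmat @ row_new) / sigmas

# Determinant after adding the row, computed directly...
dmat_new = np.vstack([dmat, row_new])
det_new = np.linalg.det(dmat_new.T @ dmat_new)

# ...and with the rank-1 update formula det(X^T X) (1 + |w|^2).
det_old = np.linalg.det(dmat.T @ dmat)
det_update = det_old * (1 + wvec @ wvec)
print(det_new, det_update)
```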
When testing a large set of candidate vectors, the factor $1 + |\mathbf{w}|^2$ can be evaluated for all candidates at once with a few matrix products, without recomputing any determinant. The function opt_dmetric()
in the optdesign
package uses this approach
to construct an optimal design matrix in a "greedy" fashion,
i.e. by always adding and removing the optimal row at each step.
Note that such a greedy algorithm performs a local search
and does not guarantee a globally optimal design matrix.
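For illustration, a greedy row-addition loop based on this update factor could look as follows. This is an independent sketch, not the actual implementation of opt_dmetric():

```python
import numpy as np

def greedy_add(dmat, candidates, nadd):
    """Greedily add nadd candidate rows that maximize det(X^T X).

    Illustrative sketch only; the optdesign package has its own API.
    """
    dmat = np.array(dmat)
    for _ in range(nadd):
        # Score each candidate with the rank-1 update factor 1 + |w|^2.
        _, sigmas, vtmat = np.linalg.svd(dmat, full_matrices=False)
        wvecs = (candidates @ vtmat.T) / sigmas
        scores = 1 + np.einsum("ij,ij->i", wvecs, wvecs)
        best = np.argmax(scores)
        dmat = np.vstack([dmat, candidates[best]])
        candidates = np.delete(candidates, best, axis=0)
    return dmat

rng = np.random.default_rng(2)
dmat0 = rng.normal(size=(4, 3))
cands = rng.normal(size=(20, 3))
dmat1 = greedy_add(dmat0, cands, 3)
```

Because the update factor is always at least 1, each addition can only increase the determinant of the information matrix.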
Consider a square, non-singular matrix $\mathbf{A}$ whose rows are taken from a larger set of candidate row vectors.
The derivation of this algorithm makes use of Cramer's rule, which can be used to express the inverse of a matrix as:

$$(\mathbf{A}^{-1})_{ji} = \frac{C_{ij}}{\det\mathbf{A}}$$

where $C_{ij}$ is the cofactor of the matrix element $A_{ij}$.

We will derive an expression for the determinant of a matrix $\mathbf{A}'$, obtained by replacing row $i$ of $\mathbf{A}$ with a new row vector $\mathbf{z}^T$. Consider the matrix product $\mathbf{A}'\,\mathbf{A}^{-1}$, written out with Cramer's rule.
Of this matrix identity, we need only the diagonal element with indexes $(i, i)$:

$$(\mathbf{A}'\,\mathbf{A}^{-1})_{ii} = \sum_j z_j\,(\mathbf{A}^{-1})_{ji} = \frac{1}{\det\mathbf{A}} \sum_j z_j\,C_{ij}$$

Because all rows of $\mathbf{A}'$ other than row $i$ coincide with those of $\mathbf{A}$, the last sum is exactly the cofactor expansion of $\det\mathbf{A}'$ along row $i$.

In conclusion, the change in determinant after replacing a row is:

$$\frac{\det\mathbf{A}'}{\det\mathbf{A}} = (\mathbf{A}'\,\mathbf{A}^{-1})_{ii} = (\mathbf{z}^T\mathbf{A}^{-1})_i$$

If one has a matrix $\mathbf{B}$ of candidate row vectors, element $(k, i)$ of the product $\mathbf{B}\,\mathbf{A}^{-1}$ gives the factor by which the determinant changes when row $i$ of $\mathbf{A}$ is replaced by candidate row $k$. The swap with the largest absolute value of this factor maximizes the growth of $|\det\mathbf{A}|$.
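The determinant ratio for a single row replacement can again be checked numerically (illustrative NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
amat = rng.normal(size=(4, 4))
row_new = rng.normal(size=4)
irow = 2  # index of the row to be replaced

# Direct computation of the determinant ratio.
amat_new = amat.copy()
amat_new[irow] = row_new
ratio_direct = np.linalg.det(amat_new) / np.linalg.det(amat)

# Same ratio from a single element of z^T A^{-1}.
ratio_cramer = (row_new @ np.linalg.inv(amat))[irow]
print(ratio_direct, ratio_cramer)
```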
The function opt_maxvol()
in the optdesign
package uses this approach to construct
a square optimal design matrix in a "greedy" fashion,
i.e. by always replacing the optimal row at each step.
Again, this corresponds to a local search and does not guarantee a globally optimal design matrix.
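To make the procedure concrete, here is a minimal MaxVol-style loop based on the determinant ratio derived above. It is an independent sketch, not the opt_maxvol() implementation:

```python
import numpy as np

def maxvol_sketch(candidates, nrow, niter=1000):
    """Minimal MaxVol sketch: start from the first nrow candidate rows and
    repeatedly swap in the candidate row that most increases |det A|.

    Illustrative only; the optdesign package has its own implementation.
    """
    sel = np.arange(nrow)
    for _ in range(niter):
        amat = candidates[sel]
        # Element (k, i): determinant ratio when row i of A is replaced
        # by candidate row k.
        ratios = candidates @ np.linalg.inv(amat)
        icand, irow = np.unravel_index(np.argmax(abs(ratios)), ratios.shape)
        if abs(ratios[icand, irow]) <= 1 + 1e-10:
            break  # no swap increases the absolute determinant
        sel[irow] = icand
    return sel

rng = np.random.default_rng(4)
cands = rng.normal(size=(30, 3))
sel = maxvol_sketch(cands, 3)
```

Note that rows already present in the selection never trigger a swap: for such a row, the corresponding row of the ratio matrix is a unit vector, so its largest absolute entry is exactly 1.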
The optdesign
package also contains a few other utilities to set up a design matrix.
The function setup_greedy()
will start building a square matrix from a set of candidate vectors,
to initiate the greedy algorithm.
It does this by selecting each row to be added so as to maximize
the non-zero singular values of the design matrix being built.
The function setup_random()
will randomly select a set of rows from the candidate vectors.
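The following sketch illustrates one possible interpretation of such a greedy initialization, where "maximizing the non-zero singular values" is taken to mean maximizing their product. This interpretation is an assumption made for illustration; the actual setup_greedy() may use a different criterion:

```python
import numpy as np

def setup_greedy_sketch(candidates, nrow):
    """Greedily build an initial nrow-row matrix from candidate rows.

    At each step, the row that maximizes the product of the non-zero
    singular values of the partial matrix is added. Illustrative sketch
    only; not the actual setup_greedy() implementation.
    """
    rows = []
    remaining = list(range(len(candidates)))
    for _ in range(nrow):
        best_idx, best_score = None, -np.inf
        for idx in remaining:
            trial = np.array(rows + [candidates[idx]])
            sigmas = np.linalg.svd(trial, compute_uv=False)
            score = np.prod(sigmas[sigmas > 1e-10])
            if score > best_score:
                best_idx, best_score = idx, score
        rows.append(candidates[best_idx])
        remaining.remove(best_idx)
    return np.array(rows)

rng = np.random.default_rng(5)
cands2 = rng.normal(size=(15, 3))
mat0 = setup_greedy_sketch(cands2, 3)
```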