The resultant operator has shape $(nm \times nm)$ and, as such, can act on vectors of length $nm$. The Kronecker sum of $\mathbf{A}$ and $\mathbf{B}$, denoted $\mathbf{A} \oplus \mathbf{B}$, can be defined in terms of the Kronecker product as

$$\mathbf{A} \oplus \mathbf{B} = \mathbf{A} \otimes \mathbf{I}_m + \mathbf{I}_n \otimes \mathbf{B},$$
where $\mathbf{I}_d$ is the $d$-dimensional identity matrix, resulting in an operator of the same size as $\mathbf{A} \otimes \mathbf{B}$. By applying these definitions recursively, the Kronecker product or sum of more than two matrices can also be defined. In general, the Kronecker product/sum of $k$ square matrices $\{ \mathbf{A}^{(i)} \}_{i=1}^k$, with shapes $\{n_i \times n_i\}_{i=1}^k$, can be written respectively as

$$\bigotimes_{i=1}^k \mathbf{A}^{(i)} = \mathbf{A}^{(1)} \otimes \mathbf{A}^{(2)} \otimes \dots \otimes \mathbf{A}^{(k)}, \qquad \bigoplus_{i=1}^k \mathbf{A}^{(i)} = \sum_{i=1}^k \mathbf{I}_{n_1} \otimes \dots \otimes \mathbf{I}_{n_{i-1}} \otimes \mathbf{A}^{(i)} \otimes \mathbf{I}_{n_{i+1}} \otimes \dots \otimes \mathbf{I}_{n_k}.$$
The resultant operators can act either on vectors of length $N = \prod_{i=1}^k n_i$ or, equivalently, on tensors of shape $(n_1, n_2, \dots, n_k)$.
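This vector/tensor equivalence is what makes Kronecker systems tractable in practice: a product such as $(\mathbf{A} \otimes \mathbf{B})\,\mathrm{vec}(\mathbf{X})$ can be evaluated as ordinary matrix products on the reshaped operand, without ever materialising the $nm \times nm$ operator. The following minimal NumPy sketch (illustrative variable names only, not part of PyKronecker's API) verifies both identities:

```python
import numpy as np

n, m = 4, 5
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
B = rng.normal(size=(m, m))
X = rng.normal(size=(n, m))   # tensor operand
x = X.reshape(-1)             # the same data as a length n*m vector (row-major)

# Dense reference: materialise the full (nm x nm) operators.
KP = np.kron(A, B)
KS = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)   # Kronecker sum

# Structured equivalents: (A kron B) vec(X) = vec(A X B^T) under row-major vec,
# and (A kronsum B) vec(X) = vec(A X + X B^T).
assert np.allclose(KP @ x, (A @ X @ B.T).reshape(-1))
assert np.allclose(KS @ x, (A @ X + X @ B.T).reshape(-1))
```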
In PyKronecker, expressions are written in terms of a high-level operator abstraction.
b) *To execute matrix-vector multiplications in a way that is maximally efficient and runs on parallel GPU/TPU hardware.*
Significant effort has gone into optimising the execution of matrix-vector and matrix-tensor multiplications. In particular, this comprises the kronx algorithm, Just-In-Time (JIT) compilation, and parallel processing on GPU/TPU hardware. As a result, PyKronecker achieves very fast execution times compared to alternative implementations (see Table 1).
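The core idea behind kronx-style evaluation is to apply each factor along its own tensor axis, so a $k$-factor product costs $O(N \sum_i n_i)$ operations rather than the $O(N^2)$ of a dense multiply. The sketch below is a plain-NumPy illustration of this shuffle-style scheme, not PyKronecker's actual implementation:

```python
import numpy as np

def kron_matvec(mats, X):
    """Compute (A1 kron ... kron Ak) vec(X) without forming the full operator.

    mats: list of square matrices with shapes (n_i, n_i).
    X:    tensor operand of shape (n_1, ..., n_k).
    """
    for A in mats:
        # Contract the current leading axis with its factor, then rotate the
        # axes so that after k steps the original axis order is restored.
        X = np.tensordot(A, X, axes=([1], [0]))
        X = np.moveaxis(X, 0, -1)
    return X

# Check against the dense operator built with np.kron.
rng = np.random.default_rng(0)
mats = [rng.normal(size=(n, n)) for n in (3, 4, 5)]
X = rng.normal(size=(3, 4, 5))
dense = np.kron(np.kron(mats[0], mats[1]), mats[2])
assert np.allclose(dense @ X.reshape(-1), kron_matvec(mats, X).reshape(-1))
```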
c) *To allow automatic differentiation for complex loss functions involving Kronecker products.*
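Because the structured multiply is composed of ordinary differentiable operations, gradients of losses involving Kronecker-structured operators can be obtained with JAX's reverse-mode autodiff. The sketch below is a generic JAX illustration of this capability; it does not use PyKronecker's own operator classes:

```python
import jax
import jax.numpy as jnp

ka, kb, kx, ky = jax.random.split(jax.random.PRNGKey(0), 4)
A = jax.random.normal(ka, (4, 4))
B = jax.random.normal(kb, (5, 5))
Y = jax.random.normal(ky, (4, 5))

def loss(X):
    # (A kron B) vec(X) evaluated as A X B^T; the full operator is never built.
    return jnp.sum((A @ X @ B.T - Y) ** 2)

X0 = jax.random.normal(kx, (4, 5))
grad = jax.grad(loss)(X0)   # same shape as X0; flows through the structure
print(grad.shape)
```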
One potential alternative in Python is the PyLops library, which provides an interface for matrix-free linear operators.
Another alternative is the library Kronecker.jl [@Stock2020], implemented in the Julia programming language [@bezanson2017]. Kronecker.jl has many of the same aims as PyKronecker and has a clean interface, making use of Julia's support for Unicode and infix functions to create Kronecker products with a custom $\otimes$ operator. However, at this time, the library does not support GPU acceleration or automatic differentiation, although the former is in development.
Table 1 shows a feature comparison of these libraries, along with the kronx algorithm implemented in "vanilla" NumPy (i.e., running on the CPU without JIT compilation). The table also shows the time taken to multiply a Kronecker product by a vector in two scenarios. In the first scenario, the Kronecker product is constructed from two matrices of size $(400 \times 400)$ and $(500 \times 500)$; in the second, it is constructed from three matrices of size $(100 \times 100)$, $(150 \times 150)$ and $(200 \times 200)$. Experiments were performed with an Intel Core i7-7700HQ 2.80GHz CPU and an Nvidia 1050Ti GPU. In both cases, PyKronecker on the GPU is the fastest by a significant margin.
| Implementation | Python | Auto-diff | GPU support | Compute time (400, 500) | Compute time (100, 150, 200) |
|----------------|--------|-----------|-------------|--------------------------|------------------------------|
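The absolute numbers depend on hardware, but the flavour of this comparison can be reproduced with a small harness along the following lines (sizes reduced from those above so that the dense baseline fits in memory; this is not the authors' benchmark script):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 80, 100                       # small enough that the dense operator
A = rng.normal(size=(n1, n1))          # (8000 x 8000, ~0.5 GB) is feasible
B = rng.normal(size=(n2, n2))
X = rng.normal(size=(n1, n2))
x = X.reshape(-1)

dense = np.kron(A, B)
t_dense = timeit.timeit(lambda: dense @ x, number=20) / 20
t_struct = timeit.timeit(lambda: (A @ X @ B.T).reshape(-1), number=20) / 20
print(f"dense matvec:      {t_dense:.5f} s")
print(f"structured matvec: {t_struct:.5f} s")
```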
# Outlook and Future Work
There are several features that we are developing to expand the functionality of PyKronecker. The first is to provide support for non-square operators. In a typical problem, the Kronecker operators encountered represent simple linear transformations which preserve dimensionality; however, there is a significant minority of contexts where this is not the case, and the inclusion of this feature would increase the range of possible applications. Secondly, we would like to add support for sparse matrices. This would enable computation with larger matrices and faster execution times where applicable. However, this would require integration with Jax's sparse module, which is currently under development. Finally, for convenience, it may be useful to add some commonly used algorithms such as the conjugate gradient method for solving linear systems [@shewchuk1994], least squares, and various matrix decompositions such as eigenvalue, Cholesky and LU.