Dual Numbers#
The rateslib.dual
module creates dual number datatypes that provide this library
with the ability to perform automatic
differentiation (AD). The implementation style here is known as forward mode, as
opposed to reverse mode (otherwise called automatic adjoint differentiation or
back propagation).
Summary#
Classes#
|
Dual number data type to perform first derivative automatic differentiation. |
|
Dual number data type to perform second derivative automatic differentiation. |
Methods#
|
Return derivatives of a dual number. |
Calculate the exponential value of a regular int or float or a dual number. |
|
|
Calculate the logarithm of a regular int or float or a dual number. |
|
Example#
First Derivatives#
In [1]: from rateslib.dual import *
In [2]: def func(x, y, z):
...: return x**6 + dual_exp(x/y) + dual_log(z)
...:
In [3]: func(2, 1, 2)
Out[3]: 72.0822032794906
For extracting only first derivatives it is more efficient
to use Dual
:
In [4]: x, y, z = Dual(2, ["x"], []), Dual(1, ["y"], []), Dual(2, ["z"], [])
In [5]: func(x, y, z)
Out[5]: <Dual: 72.082203, (x, y, z), [199.4, -14.8, 0.5]>
In [6]: gradient(func(x, y, z), ["x", "y", "z"])
Out[6]: array([199.3890561, -14.7781122, 0.5 ])
Second Derivatives#
For extracting second derivatives we must use Dual2
:
In [7]: x, y, z = Dual2(2, ["x"], [], []), Dual2(1, ["y"], [], []), Dual2(2, ["z"], [], [])
In [8]: func(x, y, z)
Out[8]: <Dual2: 72.082203, (x, y, z), [199.4, -14.8, 0.5], [[...]]>
In [9]: gradient(func(x, y, z), ["x", "y", "z"])
Out[9]: array([199.3890561, -14.7781122, 0.5 ])
In [10]: gradient(func(x, y, z), ["x", "y"], order=2)
Out[10]:
array([[487.3890561 , -22.1671683 ],
[-22.1671683 , 59.11244879]])
The keep_manifold
argument is also exclusively available
for Dual2
. When
extracting a first order gradient from a Dual2
this is
will use information about
second order and transfer it to first order thus representing a linear manifold
of the gradient. This is useful for allowing composited automatic calculation of
second order gradients. For example
consider the following functions, \(g(x)=x^2\) and \(h(y)=y^2\), evaluated at
the points \(x=2\) and \(y=4\). This creates the quadratic manifolds centered
at those points expressed in the following Dual2
numbers:
In [11]: g = Dual2(4, ["x"], [4], [1]) # g(x=2)
In [12]: h = Dual2(16, ["y"], [8], [1]) # h(y=4)
If we wish to multiply these two functions and evaluate the second order derivatives at (2, 4) we can simply do,
In [13]: gradient(g*h, order=2)
Out[13]:
array([[32., 32.],
[32., 8.]])
And observe that, say, \(\frac{\partial (gh)}{\partial x \partial y} = 4xy|_{(2, 4)} = 32\), as shown in the above array.
But, we can also use the product rule of differentiation to assert that,
which we express in our dual language as,
In [14]: gradient(g, ["x", "y"], keep_manifold=True) * h + g * gradient(h, ["x", "y"], keep_manifold=True)
Out[14]:
array([<Dual2: 64.000000, (x, y), [36.0, 36.0], [[...]]>,
<Dual2: 32.000000, (x, y), [48.0, 24.0], [[...]]>], dtype=object)
If the manifold is not maintained the product rule fails because information that is required to ultimately determine that desired second derivative is discarded.
In [15]: gradient(g, ["x", "y"]) * h + g * gradient(h, ["x", "y"])
Out[15]:
array([<Dual2: 64.000000, (y, x), [32.0, 0.0], [[...]]>,
<Dual2: 32.000000, (y, x), [0.0, 32.0], [[...]]>], dtype=object)
More specifically,
In [16]: gradient(g, ["x", "y"], keep_manifold=True)
Out[16]:
array([<Dual2: 4.000000, (x, y), [2.0, 0.0], [[...]]>,
<Dual2: 0.000000, (x, y), [1.0, 1.0], [[...]]>], dtype=object)
while,
In [17]: gradient(g, ["x", "y"])
Out[17]: array([4., 0.])
Implementation#
Forward mode AD is implemented using operating overloading and custom compatible functions. The operations implemented are;
addition (+),
subtraction and negation (-),
multiplication (*),
division and inversion (/) (**-1),
n’th power where n is an integer or a float (**n),
exponential and logarithms (which require the specific methods below),
equality of dual numbers with integers and floats and with each other.
Compatability with NumPy#
To enable this library to perform its calculations in a vectorised way we need to
leverage NumPy’s array calculations. NumPy arrays containing dual numbers are
forced to have an object
dtype configuration. This is imposed by NumPy and means
that certain functions may not be compatible, for example np.einsum
(although,
support for object
dtypes was added to np.einsum
as of version 1.25.0).
However, many functions are compatible.
Broadcasting#
Operations of Dual
and Dual2
with int
and float
dtypes permit the NumPy versions; np.int8, np.int16,
np.int32, np.int64, np.float16, np.float32, np.float64, and np.float128.
Broadcasting of arrays has been implemented so that the following
operations work as expected.
In [18]: np_arr = np.array([1, 2])
In [19]: Dual(3, ["x"], []) * np_arr
Out[19]:
array([<Dual: 3.000000, (x), [1.0]>, <Dual: 6.000000, (x), [2.0]>],
dtype=object)
In [20]: np_arr / Dual(4, ["y"], [])
Out[20]:
array([<Dual: 0.250000, (y), [-0.1]>, <Dual: 0.500000, (y), [-0.1]>],
dtype=object)
In [21]: Dual(4, ["x"], []) ** np_arr
Out[21]:
array([<Dual: 4.000000, (x), [1.0]>, <Dual: 16.000000, (x), [8.0]>],
dtype=object)
Elementwise Operations#
Simple operations on tensors also work as expected.
In [22]: x = np.array([Dual(1, ["x"], []), Dual(2, ["y"], [])])
In [23]: y = np.array([Dual(3, ["x"], []), Dual(4, ["y"], [])])
In [24]: x + y
Out[24]:
array([<Dual: 4.000000, (x), [2.0]>, <Dual: 6.000000, (y), [2.0]>],
dtype=object)
In [25]: x * y
Out[25]:
array([<Dual: 3.000000, (x), [4.0]>, <Dual: 8.000000, (y), [6.0]>],
dtype=object)
In [26]: x / y
Out[26]:
array([<Dual: 0.333333, (x), [0.2]>, <Dual: 0.500000, (y), [0.1]>],
dtype=object)
Linear Algebra#
Common linear algebraic operations are also available, such as:
np.matmul
np.inner
np.dot
np.tensordot
In [27]: np.dot(x, y)
Out[27]: <Dual: 11.000000, (x, y), [4.0, 6.0]>
In [28]: np.inner(x, y)
Out[28]: <Dual: 11.000000, (x, y), [4.0, 6.0]>
In [29]: np.matmul(x[:, np.newaxis], y[np.newaxis, :])
Out[29]:
array([[<Dual: 3.000000, (x), [4.0]>,
<Dual: 4.000000, (x, y), [4.0, 1.0]>],
[<Dual: 6.000000, (y, x), [3.0, 2.0]>,
<Dual: 8.000000, (y), [6.0]>]], dtype=object)
In [30]: np.tensordot(x[np.newaxis, :, np.newaxis], y[np.newaxis, :], (1, 1))
Out[30]: array([[[<Dual: 11.000000, (x, y), [4.0, 6.0]>]]], dtype=object)
Solving the linear system, \(Ax=b\), is not not directly possible from NumPy,
thus a custom solver, dual_solve()
, has been implemented
using the Doolittle algorithm with partial pivoting.
In [31]: A = np.array([
....: [1, 0],
....: [Dual(2, ["z"], []), 1]
....: ], dtype="object")
....:
In [32]: b = np.array([Dual(2, ["y"], []), Dual(5, ["x", "y"], [])])[:, np.newaxis]
In [33]: x = dual_solve(A, b)
In [34]: x
Out[34]:
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
[<Dual: 1.000000, (z, x, y), [-2.0, 1.0, -1.0]>]], dtype=object)
In [35]: np.matmul(A, x)
Out[35]:
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
[<Dual: 5.000000, (z, x, y), [0.0, 1.0, 1.0]>]], dtype=object)