Dual Numbers#

The rateslib.dual module creates dual number datatypes that provide this library with the ability to perform automatic differentiation (AD). The implementation style here is known as forward mode, as opposed to reverse mode (otherwise called automatic adjoint differentiation or back propagation).

Summary#

Classes#

`rateslib.dual.Dual`(real, vars, dual)	Dual number data type to perform first derivative automatic differentiation.
`rateslib.dual.Dual2`(real, vars, dual, dual2)	Dual number data type to perform second derivative automatic differentiation.

Methods#

`rateslib.dual.gradient`(dual[, vars, order, ...])	Return derivatives of a dual number.
`rateslib.dual.dual_exp`(x)	Calculate the exponential value of a regular int or float or a dual number.
`rateslib.dual.dual_log`(x[, base])	Calculate the logarithm of a regular int or float or a dual number.
`rateslib.dual.dual_solve`(A, b[, allow_lsq, ...])

Example#

First Derivatives#

\[\begin{split}f(x, y, z) = x^6 + e^{\frac{x}{y}} + \ln {z} \\ f(2, 1, 2) = 72.0822...\end{split}\]

In [1]: from rateslib.dual import *

In [2]: def func(x, y, z):
   ...:     return x**6 + dual_exp(x/y) + dual_log(z)
   ...: 

In [3]: func(2, 1, 2)
Out[3]: 72.0822032794906

For extracting only first derivatives it is more efficient to use Dual:

\[\begin{split}\frac{\partial f}{\partial x} &= \left . 6 x^5 + \frac{1}{y} e^{\frac{x}{y}} \right |_{(2,1,2)} = 199.3890... \\ \frac{\partial f}{\partial y} &= \left . -\frac{x}{y^2} e^{\frac{x}{y}} \right |_{(2,1,2)} = -14.7781... \\ \frac{\partial f}{\partial z} &= \left . \frac{1}{z} \right |_{(2,1,2)} = 0.50 \\\end{split}\]

In [4]: x, y, z = Dual(2, ["x"], []), Dual(1, ["y"], []), Dual(2, ["z"], [])

In [5]: func(x, y, z)
Out[5]: <Dual: 72.082203, (x, y, z), [199.4, -14.8, 0.5]>

In [6]: gradient(func(x, y, z), ["x", "y", "z"])
Out[6]: array([199.3890561, -14.7781122,   0.5      ])

Second Derivatives#

For extracting second derivatives we must use Dual2:

In [7]: x, y, z = Dual2(2, ["x"], [], []), Dual2(1, ["y"], [], []), Dual2(2, ["z"], [], [])

In [8]: func(x, y, z)
Out[8]: <Dual2: 72.082203, (x, y, z), [199.4, -14.8, 0.5], [[...]]>

In [9]: gradient(func(x, y, z), ["x", "y", "z"])
Out[9]: array([199.3890561, -14.7781122,   0.5      ])

In [10]: gradient(func(x, y, z), ["x", "y"], order=2)
Out[10]: 
array([[487.3890561 , -22.1671683 ],
       [-22.1671683 ,  59.11244879]])

The keep_manifold argument is also exclusively available for Dual2. When extracting a first order gradient from a Dual2 this is will use information about second order and transfer it to first order thus representing a linear manifold of the gradient. This is useful for allowing composited automatic calculation of second order gradients. For example consider the following functions, \(g(x)=x^2\) and \(h(y)=y^2\), evaluated at the points \(x=2\) and \(y=4\). This creates the quadratic manifolds centered at those points expressed in the following Dual2 numbers:

In [11]: g = Dual2(4, ["x"], [4], [1])  # g(x=2)

In [12]: h = Dual2(16, ["y"], [8], [1])  # h(y=4)

If we wish to multiply these two functions and evaluate the second order derivatives at (2, 4) we can simply do,

In [13]: gradient(g*h, order=2)
Out[13]: 
array([[32., 32.],
       [32.,  8.]])

And observe that, say, \(\frac{\partial (gh)}{\partial x \partial y} = 4xy|_{(2, 4)} = 32\), as shown in the above array.

But, we can also use the product rule of differentiation to assert that,

\[\begin{split}d_{x\zeta}^2(gh) = d_x \left ( d_\zeta(g)h + gd_\zeta(h) \right ) \\\\ d_{y\zeta}^2(gh) = d_y \left ( d_\zeta(g)h + gd_\zeta(h) \right ) \\\\\end{split}\]

which we express in our dual language as,

In [14]: gradient(g, ["x", "y"], keep_manifold=True) * h + g * gradient(h, ["x", "y"], keep_manifold=True)
Out[14]: 
array([<Dual2: 64.000000, (x, y), [36.0, 36.0], [[...]]>,
       <Dual2: 32.000000, (x, y), [48.0, 24.0], [[...]]>], dtype=object)

If the manifold is not maintained the product rule fails because information that is required to ultimately determine that desired second derivative is discarded.

In [15]: gradient(g, ["x", "y"]) * h + g * gradient(h, ["x", "y"])
Out[15]: 
array([<Dual2: 64.000000, (y, x), [32.0, 0.0], [[...]]>,
       <Dual2: 32.000000, (y, x), [0.0, 32.0], [[...]]>], dtype=object)

More specifically,

In [16]: gradient(g, ["x", "y"], keep_manifold=True)
Out[16]: 
array([<Dual2: 4.000000, (x, y), [2.0, 0.0], [[...]]>,
       <Dual2: 0.000000, (x, y), [1.0, 1.0], [[...]]>], dtype=object)

while,

In [17]: gradient(g, ["x", "y"])
Out[17]: array([4., 0.])

Implementation#

Forward mode AD is implemented using operating overloading and custom compatible functions. The operations implemented are;

addition (+),

subtraction and negation (-),

multiplication (*),

division and inversion (/) (**-1),

n’th power where n is an integer or a float (**n),

exponential and logarithms (which require the specific methods below),

equality of dual numbers with integers and floats and with each other.

Warning

Dual and Dual2 are not designed to operate with each other. The purpose for this is to avoid miscalculation of second derivatives. Dual should always be replaced by Dual2 in this instance. TypeErrors will be raised otherwise.

Compatability with NumPy#

To enable this library to perform its calculations in a vectorised way we need to leverage NumPy’s array calculations. NumPy arrays containing dual numbers are forced to have an object dtype configuration. This is imposed by NumPy and means that certain functions may not be compatible, for example np.einsum (although, support for object dtypes was added to np.einsum as of version 1.25.0). However, many functions are compatible.

Broadcasting#

Operations of Dual and Dual2 with int and float dtypes permit the NumPy versions; np.int8, np.int16, np.int32, np.int64, np.float16, np.float32, np.float64, and np.float128. Broadcasting of arrays has been implemented so that the following operations work as expected.

In [18]: np_arr = np.array([1, 2])

In [19]: Dual(3, ["x"], []) * np_arr
Out[19]: 
array([<Dual: 3.000000, (x), [1.0]>, <Dual: 6.000000, (x), [2.0]>],
      dtype=object)

In [20]: np_arr / Dual(4, ["y"], [])
Out[20]: 
array([<Dual: 0.250000, (y), [-0.1]>, <Dual: 0.500000, (y), [-0.1]>],
      dtype=object)

In [21]: Dual(4, ["x"], []) ** np_arr
Out[21]: 
array([<Dual: 4.000000, (x), [1.0]>, <Dual: 16.000000, (x), [8.0]>],
      dtype=object)

Elementwise Operations#

Simple operations on tensors also work as expected.

In [22]: x = np.array([Dual(1, ["x"], []), Dual(2, ["y"], [])])

In [23]: y = np.array([Dual(3, ["x"], []), Dual(4, ["y"], [])])

In [24]: x + y
Out[24]: 
array([<Dual: 4.000000, (x), [2.0]>, <Dual: 6.000000, (y), [2.0]>],
      dtype=object)

In [25]: x * y
Out[25]: 
array([<Dual: 3.000000, (x), [4.0]>, <Dual: 8.000000, (y), [6.0]>],
      dtype=object)

In [26]: x / y
Out[26]: 
array([<Dual: 0.333333, (x), [0.2]>, <Dual: 0.500000, (y), [0.1]>],
      dtype=object)

Linear Algebra#

Common linear algebraic operations are also available, such as:

np.matmul

np.inner

np.dot

np.tensordot

In [27]: np.dot(x, y)
Out[27]: <Dual: 11.000000, (x, y), [4.0, 6.0]>

In [28]: np.inner(x, y)
Out[28]: <Dual: 11.000000, (x, y), [4.0, 6.0]>

In [29]: np.matmul(x[:, np.newaxis], y[np.newaxis, :])
Out[29]: 
array([[<Dual: 3.000000, (x), [4.0]>,
        <Dual: 4.000000, (x, y), [4.0, 1.0]>],
       [<Dual: 6.000000, (y, x), [3.0, 2.0]>,
        <Dual: 8.000000, (y), [6.0]>]], dtype=object)

In [30]: np.tensordot(x[np.newaxis, :, np.newaxis], y[np.newaxis, :], (1, 1))
Out[30]: array([[[<Dual: 11.000000, (x, y), [4.0, 6.0]>]]], dtype=object)

Solving the linear system, \(Ax=b\), is not not directly possible from NumPy, thus a custom solver, dual_solve(), has been implemented using the Doolittle algorithm with partial pivoting.

In [31]: A = np.array([
   ....:     [1, 0],
   ....:     [Dual(2, ["z"], []), 1]
   ....: ], dtype="object")
   ....: 

In [32]: b = np.array([Dual(2, ["y"], []), Dual(5, ["x", "y"], [])])[:, np.newaxis]

In [33]: x = dual_solve(A, b)

In [34]: x
Out[34]: 
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
       [<Dual: 1.000000, (z, x, y), [-2.0, 1.0, -1.0]>]], dtype=object)

In [35]: np.matmul(A, x)
Out[35]: 
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
       [<Dual: 5.000000, (z, x, y), [0.0, 1.0, 1.0]>]], dtype=object)