
Itô's Lemma

Attention Conservation Notice

The 3rd post in a series of 5 in which the Black-Scholes-Merton Model for pricing European put and call options is derived.

  1. Brownian Motion as a Symmetric Random Walk Limit
  2. Stochastic Differential Equations & Geometric Brownian Motion

This follows Stephen Blyth's excellent book An Introduction to Quantitative Finance closely, with embellishment using Python and some additional notes. This is probably:

  • not very helpful if you pragmatically want to price an option
  • overwhelming if you don't like math
  • missing some of the context you'd want if you have a lot of mathematical maturity

Now what?

In the previous post, we learned about stochastic differential equations and we derived geometric Brownian motion, a possible process for the evolution of a stock over time that has some reasonable assumptions:

\(dS_t = \mu S_t dt + \sigma S_t d W_t\)

Now that we have a form for \(S_t\), a good next step is to figure out the distribution of \(S_t\) under the assumptions of geometric Brownian motion. To do this, we will need some additional tools, namely Itô's lemma.

Some Motivation

Kiyosi Itô in 1978

In the last post, we initially looked at the stochastic differential equation \(dS_t = \mu(S_t, t)dt + \sigma(S_t, t) d W_t\), where \(W_t\) is a Brownian motion (or Wiener process), \(dW_t\) is its increment, and \(\mu, \sigma\) are deterministic functions of time \(t\) and the stock price at time \(t\), \(S_t\). From this we derived geometric Brownian motion, but let's consider the more general form for a bit.

To start, let's make this a simpler function. Suppose our stock price at time \(t\) is given by the stochastic differential equation:

\(dS_t = \mu(t) dt + \sigma(t)dW_t\)

Now \(\mu, \sigma\) are deterministic functions of time alone (rather than of both time and \(S_t\)), so, taking \(S_0 = 0\) for simplicity, we can write an integral solution like so:

\(S_t = \int_0^t \mu(s) ds + \int_0^t \sigma(s)dW_s\)

Nice! We have the price as a weighted sum of integrals. We can't easily solve it in terms of \(W_s\), but we can get the mean and variance.

Since every increment \(dW_s\) has a mean of 0 (\(dW_s \sim \mathcal{N}(0, ds)\)), \(\mathbb{E}[S_t] = \int_0^t \mu(s) ds\), the integral of the drift function.

Similarly, since the increments \(dW_s\) are independent and each has variance \(ds\), \(\mathrm{Var}[S_t] = \int_0^t \sigma^2(s) ds\), the integral of the variance of each step in the continuous random walk.
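The mean and variance formulas can be sanity-checked by simulation. The sketch below is only an illustration: the drift \(\mu(s) = 0.5 s\), the volatility \(\sigma(s) = 0.2(1 + s)\), the horizon \(t = 1\), and the function name are all made-up assumptions, and the integrals are approximated by left-point sums. For these choices the formulas give a mean of \(0.25\) and a variance of \(0.04 \cdot 7/3 \approx 0.093\).

```python
import numpy as np


def simulate_terminal_values(mu, sigma, t_end=1.0, n_steps=200, n_paths=20_000, seed=0):
    """Approximate S_t = int mu(s) ds + int sigma(s) dW_s with left-point sums (S_0 = 0)."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    s = np.arange(n_steps) * dt  # left endpoint of each sub-interval
    # Brownian increments: independent N(0, dt) draws, one row per path
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.sum(mu(s) * dt + sigma(s) * dW, axis=1)


# Hypothetical drift and volatility functions, chosen only for illustration
mu = lambda s: 0.5 * s
sigma = lambda s: 0.2 * (1.0 + s)

terminal = simulate_terminal_values(mu, sigma)
print("sample mean    :", terminal.mean(), "| formula:", 0.25)
print("sample variance:", terminal.var(), "| formula:", 0.04 * 7 / 3)
```

The sample statistics should land close to the formula values, with small differences from Monte Carlo noise and the discretization.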

This is all well and good, but what do you do when you have a more complex process \(S_t\) that appears on both sides of the stochastic differential equation? Something like \(dS_t = \mu(S_t, t)dt + \sigma(S_t, t) d W_t\) ?

We can't follow the same steps above. Instead, we want to rewrite \(S_t\) as a function of a simpler process \(Y_t\) where \(Y_t\) takes the simpler form above. That is, \(S_t = f(t, Y_t)\), and \(dY_t = \mu(t)dt + \sigma(t) dW_t\).

Itô's lemma is used to find this transformation, and once we have that transformation we can find the mean and higher moments of the process (variance, skewness, kurtosis, etc). We will also see in a later blog post that this is very helpful for changing the underlying numeraire, which is a very useful technique.

Handy Dandy Taylor Expansion

Consider a function \(f(S_t, t)\) of \(S_t\) and suppose that \(S_t\) is deterministic and \(f\) is twice differentiable.

Then taking the Taylor series, we get

\(\Delta f = \frac{\partial f}{\partial t}\Delta t + \frac{\partial f}{\partial S_t}\Delta S_t + \frac{1}{2} \frac{\partial^2 f}{\partial S_t^2}(\Delta S_t)^2 + \frac{\partial^2 f}{\partial S_t \partial t}\Delta S_t \Delta t + \frac{1}{2} \frac{\partial^2 f}{\partial t^2}(\Delta t)^2 + \dots\)

Now suppose that

\(\Delta S_t = \mu \Delta t + \sigma \sqrt{\Delta t} W\), \(W \sim \mathcal{N}(0, 1)\)

Squaring gives

\((\Delta S_t)^2 = \sigma^2 \Delta t W^2 + O(\Delta t^{\frac{3}{2}})\)

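Since \(\mathbb{E} (W^2) = 1\) [1] and \(\mathrm{Var}(W^2) = 2\) [2], the leading term \(\sigma^2 \Delta t W^2\) has mean \(\sigma^2 \Delta t\) and variance \(2\sigma^4 (\Delta t)^2\). The variance is of higher order in \(\Delta t\), so the randomness becomes negligible and \((\Delta S_t)^2 \to \sigma^2 \Delta t\) as \(\Delta t \to 0\). Dropping the remaining terms of order higher than \(\Delta t\), we have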

\(\Delta f \approx \frac{\partial f}{\partial t}\Delta t + \frac{\partial f}{\partial S_t}\Delta S_t + \frac{1}{2} \frac{\partial^2 f}{\partial S_t^2} \sigma^2 \Delta t\)
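The claim that \((\Delta S_t)^2\) behaves like the deterministic quantity \(\sigma^2 \Delta t\) can be checked numerically. In this rough sketch, the values \(\mu = 0.05\), \(\sigma = 0.2\), the horizon of 1, and the helper name are illustrative assumptions. It adds up the squared increments over a fixed horizon for finer and finer grids; the total should settle near \(\sigma^2 \cdot 1 = 0.04\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, t_end = 0.05, 0.2, 1.0  # illustrative values, not taken from the post


def sum_of_squared_increments(n_steps):
    """Sum (Delta S)^2 over [0, t_end] using Delta S = mu*dt + sigma*sqrt(dt)*W."""
    dt = t_end / n_steps
    w = rng.standard_normal(n_steps)  # one W ~ N(0, 1) per step
    delta_s = mu * dt + sigma * np.sqrt(dt) * w
    return float(np.sum(delta_s ** 2))


for n in (10, 1_000, 100_000):
    print(n, sum_of_squared_increments(n), "| target:", sigma ** 2 * t_end)
```

With only 10 steps the sum is visibly random; with 100,000 steps it is almost exactly \(\sigma^2\) times the horizon, which is why the \(W^2\) factor can be replaced by its mean in the limit.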

Taking the limit \(\Delta t \to 0\), we obtain Itô's lemma.

Itô's Lemma

With limits, we get

\(df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial S_t}dS_t + \frac{1}{2}\frac{\partial^2 f}{\partial S_t^2} \sigma^2 dt\)

If \(dS_t = \mu(S_t, t) dt + \sigma(S_t, t) dW_t\), then we can substitute and get

\(df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial S_t} (\mu(S_t, t) dt + \sigma(S_t, t) dW_t) + \frac{1}{2}\frac{\partial^2 f}{\partial S_t^2} \sigma^2(S_t, t) dt\)

Grouping the \(dt\) terms gives us the most well-known form of the lemma:

\(df = \left(\frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t} \mu(S_t, t) + \frac{1}{2}\frac{\partial^2 f}{\partial S_t^2} \sigma^2(S_t, t)\right)dt + \frac{\partial f}{\partial S_t} \sigma(S_t, t) dW_t\)

Itô's Lemma is easiest to remember in the following form: If \(dx = \mu dt + \sigma d W_t\) then

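\(df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial x}dx + \frac{1}{2}\frac{\partial^2 f}{\partial x^2} (dx)^2\)

where \((dx)^2\) is evaluated using the identities \((dt)^2 = 0\), \(dt\,dW_t = 0\), \((dW_t)^2 = dt\), so that \((dx)^2 = \sigma^2 dt\).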
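To see the extra second-order term at work, take \(f(x) = x^2\) with \(x = W_t\) (so \(\mu = 0\), \(\sigma = 1\)). The memorable form gives \(d(W_t^2) = 2 W_t dW_t + dt\), so \(W_T^2 = 2\int_0^T W_s dW_s + T\), whereas the ordinary chain rule would drop the \(+T\). The sketch below is a rough numerical check on a single simulated path; the grid size, seed, and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t_end, n_steps = 1.0, 200_000
dt = t_end / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # Brownian increments
W = np.concatenate(([0.0], np.cumsum(dW)))       # path with W_0 = 0

# Left-point (Ito) approximation of the integral of 2 * W_s dW_s
ito_integral = float(np.sum(2.0 * W[:-1] * dW))
lhs = float(W[-1] ** 2)                          # f(W_T) - f(W_0)

print("W_T^2                       :", lhs)
print("Ito lemma: 2*int W dW + T   :", ito_integral + t_end)
print("ordinary chain rule (no + T):", ito_integral)
```

The first two numbers should agree up to discretization noise, while the third is off by roughly \(T\): that gap is the \((dW_t)^2 = dt\) identity made visible.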

Plainly put, the differential of a function of a stochastic process and time is a term about time, a term about the process, and a term reflecting the quadratic variation of the underlying Brownian motion (Wiener process). This final (memorable) form makes it much clearer that Itô's lemma is the stochastic calculus counterpart to the chain rule. [3]

Applied to Geometric Brownian Motion

Applying Itô's lemma to \(\log S_t\) where \(S_t\) follows geometric Brownian motion:

Let \(f(S_t) = \log S_t\). Then

\(\frac{\partial f}{\partial t} = 0\)

\(\frac{\partial f}{\partial S_t} = \frac{1}{S_t}\)

\(\frac{\partial^2f}{\partial S_t^2} = -\frac{1}{S_t^2}\)

Applying Itô's lemma gives us

\(d(\log S_t) = (\frac{1}{S_t} S_t \mu - \frac{1}{2} \sigma^2 S_t^2 \frac{1}{S_t^2}) dt + \sigma S_t \frac{1}{S_t}d W_t = (\mu - \frac{1}{2}\sigma^2)dt + \sigma dW_t\)

Thus, \(\log S_t\) follows a Brownian motion with constant drift \(\mu - \frac{1}{2}\sigma^2\) and volatility \(\sigma\), so it is normally distributed. Specifically,

\(\log S_T | S_t \sim \mathcal{N}(\log S_t + (\mu - \frac{1}{2}\sigma^2)(T - t), \sigma^2(T - t))\)

showing that under geometric Brownian motion \(\log S_T | S_t\) is normally distributed, i.e. the distribution of \(S_T | S_t\) is log-normal.
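One way to build confidence in this result is to simulate the original equation \(dS_t = \mu S_t dt + \sigma S_t dW_t\) directly and compare the simulated \(\log S_T\) with the normal distribution above. The sketch below uses a plain Euler-Maruyama discretization; the parameter values (\(S_t = 100\), \(\mu = 0.05\), \(\sigma = 0.2\), \(T - t = 1\)) and the grid sizes are illustrative assumptions, not values from the derivation.

```python
import numpy as np

rng = np.random.default_rng(3)
s0, mu, sigma = 100.0, 0.05, 0.2           # illustrative parameters
horizon, n_steps, n_paths = 1.0, 250, 20_000
dt = horizon / n_steps

# Euler-Maruyama: step every path forward using the SDE itself, not the closed form
s = np.full(n_paths, s0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    s = s + mu * s * dt + sigma * s * dW

log_s = np.log(s)
theory_mean = np.log(s0) + (mu - 0.5 * sigma ** 2) * horizon
theory_var = sigma ** 2 * horizon

print("mean of log S_T:", log_s.mean(), "| Ito prediction:", theory_mean)
print("var of log S_T :", log_s.var(), "| Ito prediction:", theory_var)
```

Note that the simulated mean of \(\log S_T\) grows at \(\mu - \frac{1}{2}\sigma^2\) rather than \(\mu\), matching the Itô correction.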

We can use this information to temper our expectations about the next step in a stock.

In the plot [4] above, the red line is a geometric Brownian motion process with \(\mu = 0\) and \(\sigma = 0.2\). The green points are drawn from \(\log S_T | S_t \sim \mathcal{N}(\log S_t + (\mu - \frac{1}{2}\sigma^2)(T - t), \sigma^2(T - t))\), showing a distribution of where the next point is likely to fall. We can see in this plot that almost every realized value fell within the predicted distribution. It also kind of looks like a Christmas wreath.

Extra Notes

Note [1]

\(\mathbb{E}(X^2) = \mathrm{Var}(X) + (\mathbb{E}[X])^2 = 1 + 0 = 1\), or see Note [2].

Note [2]

If \(X\) is distributed as a standard normal, \(X^2\) is distributed as a chi-squared r.v. with \(k = 1\) degree of freedom (\(X^2 \sim \chi^2(1)\)), which has a mean of \(k = 1\) and a variance of \(2k = 2 \cdot 1 = 2\).

Note [3]

Itô's lemma is a smaller part of the overall Itô Calculus, which extends calculus to stochastic processes. The central concept is the Itô stochastic integral \(Y_t = \int_0^t H_s dX_s\) where \(H\) is an integrable process adapted to the filtration generated by the stochastic process \(X\), which is generally a Brownian motion (or a semimartingale). Completing this integration gives us another stochastic process. There are also analogs for integration by parts and differentiation.

We can conceptualize the stochastic integral in the vocabulary of finance as follows:

For \(Y_t = \int_0^t H_s dX_s\):

  • \(Y_t\) is how much money we have in total at a given moment \(t\)
  • \(H_s\) is how much of a security/stock we hold
  • \(dX_s\) is the movement of the prices of the security

Overall, this equation represents a continuous-time trading strategy that consists of holding \(H_t\) of a stock at time \(t\).
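In discrete time this integral becomes a sum of holdings times price changes, \(\sum_i H_{t_i}(X_{t_{i+1}} - X_{t_i})\), where each holding is chosen using only information available at the start of the step. The sketch below is a toy illustration: the helper name `trading_gains`, the Brownian price path starting at 100, and the two strategies are all hypothetical.

```python
import numpy as np


def trading_gains(holdings, prices):
    """Discrete analog of int H dX: sum of H_i * (X_{i+1} - X_i)."""
    holdings = np.asarray(holdings, dtype=float)
    prices = np.asarray(prices, dtype=float)
    return float(np.sum(holdings * np.diff(prices)))


rng = np.random.default_rng(4)
n_steps = 1_000
dt = 1.0 / n_steps

# Hypothetical price path: a Brownian motion started at 100
increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
prices = 100.0 + np.concatenate(([0.0], np.cumsum(increments)))

# Holdings are decided at the start of each step (adapted: no peeking ahead)
buy_and_hold = np.ones(n_steps)
buy_below_start = (prices[:-1] < prices[0]).astype(float)

print("buy and hold     :", trading_gains(buy_and_hold, prices), "=", prices[-1] - prices[0])
print("hold only if down:", trading_gains(buy_below_start, prices))
```

Holding one unit throughout simply recovers the total price change, while any other adapted rule reweights the same price increments.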

Note [4]

Make your own predictions with:

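from cytoolz import compose_left
import numpy as np


def predict_next_step(s0: float, mu: float = 0., sigma: float = 0.2) -> float:
    """Draw S_T | S_t = s0 one unit time step ahead (T - t = 1)."""

    def ito_after(st: float) -> float:
        # log S_T | S_t ~ N(log S_t + (mu - sigma^2 / 2), sigma^2) for T - t = 1.
        # np.random.normal expects the standard deviation, so pass sigma, not sigma**2.
        return np.random.normal(np.log(st) + (mu - 0.5 * sigma**2), sigma)

    # Sample the log price, then map it back to a price with exp
    return compose_left(ito_after, np.exp)(s0)


print([predict_next_step(100) for _ in range(20)])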
