What About Those Pesky Integrals?

So what about those pesky definite integrals? I mean the ones computed over a high number of dimensions. Many ML problems, especially in Bayesian statistics, involve computing probabilities or expectations over high-dimensional spaces. For this reason, we're going to need a clever way to compute these integrals.

Enter Monte Carlo Integration! This technique isn't just a workaround; it's a powerful tool that harnesses randomness to estimate these integrals with surprising efficiency and accuracy. Imagine trading a laborious, exhaustive search for clever, strategic sampling that yields equally valuable insights. That's Monte Carlo Integration for you.

Monte Carlo Integration's elegance lies in its stochastic approach, which sets it apart from deterministic methods like Riemann sums and the trapezoidal rule. Instead of evaluating the function at strategically chosen points, Monte Carlo samples points from a uniform distribution across the integration domain. Each random sample contributes to the estimate, relying on the principle that the average function value at these points, scaled by the size of the domain, approximates the integral. The law of large numbers guarantees that as the number of samples grows, our approximation converges towards the true integral value.

In your calculus courses, you likely encountered the concept of determining the average value of a function f(x) over the interval [a, b]. This is characterized by the following calculation:

\displaystyle f_{avg}(a, b) = \frac{1}{b-a}\int\limits_{a}^{b}f(x) \ dx

We can then express the integral over the domain as:

\displaystyle F(a, b) = \int\limits_{a}^{b}f(x) \ dx = (b - a) \, f_{avg}(a, b)

At first glance, this might seem challenging to calculate analytically. However, we can cleverly use randomness to our advantage here. By selecting random points x_1, x_2, ..., x_k \sim U(a, b) from the continuous uniform distribution over [a, b], we can approximate f_{avg}(a,b). We do this through the formula

\displaystyle f_{avg}^{k}(a , b) = \frac{\sum\limits_{i = 1}^{k} f(x_i)}{k}

which is the average of the function values at these randomly selected points. To put this into a more tangible perspective, consider this: as the number of points k increases indefinitely, our approximation f_{avg}^{k}(a , b) gradually converges to the true average value f_{avg}(a, b). In mathematical terms, this is represented as

\displaystyle \lim\limits_{k\rightarrow \infty }f_{avg}^{k}(a , b) = f_{avg}(a, b)

Now, when we multiply both sides of this equation by the interval length b - a, it becomes

\displaystyle \lim\limits_{k\rightarrow \infty }(b - a) f_{avg}^{k}(a , b) = (b - a ) f_{avg}(a, b) = F(a, b)

This final equation is the central point of Monte Carlo Integration. It shows how, through the process of increasing k, the approximation we get becomes an accurate representation of our original integral F(a, b).

Let's dive into a straightforward yet illustrative example using Python. This exercise will demonstrate the power of Monte Carlo Integration in action. To get started, ensure you have the necessary Python libraries installed. These libraries not only simplify our coding process but also provide powerful tools to execute it efficiently.

Importing Libraries
# Importing Libraries
import numpy as np
import matplotlib
import matplotlib.pyplot as plt


Now, let’s define the function we intend to integrate. For our example, we’ll focus on a straightforward function:

\displaystyle f(x) = x^{2}

In Python, we can concisely represent this function using a lambda expression. Here's how we implement our function:

Creating our function to integrate
# Creating our function to integrate
f = lambda x: x**2


We've selected this function for a specific reason: it's simple enough to integrate analytically, which allows us to validate our results. Let's first compute the integral analytically over the domain [2, 3] for comparison purposes:

\displaystyle F(2, 3) = \int\limits_{2}^{3} x^{2} \ dx = \frac{3^{3}}{3} - \frac{2^{3}}{3} = \frac{19}{3}

With the analytical solution in mind, we can establish a reference value in our Python code. This will allow us to later compare the outputs of the Monte Carlo method with this exact value, ensuring the accuracy of our stochastic approach.

Creating a Python variable for F(2, 3)
# Value of F(2, 3)
integral = 19/3.0
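
If you'd rather double-check the hand calculation numerically, SciPy's adaptive quadrature offers a quick cross-check. This is an optional sketch, assuming SciPy is installed:

Optional SciPy sanity check
# Optional: verify 19/3 numerically (assumes SciPy is installed)
from scipy.integrate import quad
value, abs_err = quad(f, 2, 3)
print(value)  # ~6.3333..., i.e. 19/3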


Let’s now move on to the computational side of Monte Carlo Integration. We define F^{k}(a, b) = (b - a) f_{avg}^{k}(a, b). As we’ve described earlier, theoretically,

\displaystyle \lim\limits_{k \rightarrow \infty} F^{k}(a, b) = F(a, b)

This limit tells us that as the number of samples k increases, our Monte Carlo estimate gets closer to the actual integral value. To simulate this in Python, we need a function for F^{k}(a, b). This requires randomly sampling k points from a uniform distribution over the interval [a, b] and then computing the average value of f(x) at these points. Here's how we can code this function in Python:

Creating F^{k}(a, b)
# Creating F^{k}(a, b)
def F(a, b, f, k):
    # Draw k points uniformly from [a, b]
    x = np.random.uniform(a, b, k)
    # Average of f over the sampled points
    f_avg = f(x).sum() / float(k)
    # Scale by the interval length
    return (b - a) * f_avg
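
As a quick sanity check, a single call with a large k should land near 19/3 ≈ 6.333; the exact output varies from run to run because the samples are random:

Testing F^{k}(2, 3)
# One Monte Carlo estimate using k = 100,000 samples
print(F(2, 3, f, 100000))  # e.g. ~6.33; exact value is 19/3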


Now let's put this integration function to the test. We aim to observe how the approximation of the integral F^{k}(2, 3) improves as we increase the number of samples. To do this, we'll compute estimates for a range of k values, starting from 100 and going up to 100,000 in increments of 10. This will give us a clear picture of how the accuracy of our method evolves with increasing sample size.

Calculating Values
# Creating a list of sample sizes (k)
domain_k = list(range(100, 100000, 10))

# Creating a list of F^{k}(2, 3) values, one per sample size
approx_integ = [F(2, 3, f, i) for i in domain_k]


The list approx_integ stores the integral estimate for each k in our specified range, while domain_k tracks the sample sizes themselves, providing a handy reference for plotting and analyzing the results.
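
Before plotting, a simple illustrative check (not part of the original pipeline) is to look at the absolute error of each estimate against the exact value:

Tracking absolute error
# Absolute error of each estimate relative to the exact value 19/3
errors = [abs(est - integral) for est in approx_integ]
print(errors[0], errors[-1])  # error at k = 100 vs. k = 99,990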

Finally, we can plot the values we obtained. We will use matplotlib, a popular Python library, for this purpose. The following code snippet sets up our plot with an appropriate style, legends, axis labels, and title, ensuring our visualization is informative and aesthetically pleasing.

Python Plotting Code
plt.figure()
matplotlib.style.use('ggplot')

# Setting the bounds of the y-axis on our plot
plt.ylim(integral - .5, integral + .5)

# Creating plots
plt.plot(domain_k, approx_integ, color="lightblue")  # Plotting F^{k}(2, 3)
plt.axhline(integral, color='blue', linestyle='--')  # Plotting F(2, 3)

# Legends, title, and labels
plt.legend(["MCI Estimate $F^{k}(2, 3)$", "Integral Value $F(2, 3)$"], loc="lower center")
plt.title(r"Monte Carlo Integration Convergence of $\int_{2}^{3} x^{2}dx$", color='blue')
plt.xlabel("Number of Samples (k)", color='blue')
plt.ylabel("$F$ & $F^{k}$", color='blue')

# Saving the figure (optional)
plt.savefig("mci_convergence.jpg")


In this plot, the blue dashed line represents the analytically calculated value of F(2, 3), while the light blue line shows the Monte Carlo estimates for different values of k. As k increases, we expect the light blue line to approach and align more closely with the dashed line.


However, the true elegance of this method is how beautifully it translates to integration over any number of dimensions. Traditional numerical integration methods often become increasingly complex and computationally intensive with each added dimension (a grid-based rule using n points per axis requires n^d evaluations in d dimensions), while Monte Carlo Integration maintains the same approach regardless of the number of dimensions involved.

Consider the task of computing the integral represented as

\displaystyle B = \int_{G}f(\textbf{x}) d\textbf{x}

where B is the integral of the function f(\textbf{x}) over the multi-dimensional domain G. The procedure remains conceptually similar to the one-dimensional case, uniformly sampling k points within G, denoted as \textbf{x}_1, \textbf{x}_2, ..., \textbf{x}_k.

Our next step is to calculate the average value of the function over these points. The average, denoted as f_{avg}^{k}(G), is given by the formula:

\displaystyle f_{avg}^{k}(G) = \frac{1}{k} \sum\limits_{i = 1}^{k} f(\textbf{x}_i)

It's important to note that each f(\textbf{x}_i) in the sum represents the function evaluated at the i^{th} sampled point in the multi-dimensional space. As we increase the number of points k, the law of large numbers tells us this average converges to the expected value of the function over the domain G. In mathematical terms, this is represented as:

\displaystyle \lim\limits_{k \rightarrow \infty} V \, f_{avg}^{k}(G) = \lim\limits_{k \rightarrow \infty} B^{k}(G) = B

Here, V represents the volume of our domain of integration, and B^{k}(G) = V f_{avg}^{k}(G) is our k-sample estimate. Multiplying the volume by the average value of the function over the domain gives us an approximation of the integral B, providing an effective method for calculating multi-dimensional integrals.

Let's illustrate this with a specific example. Imagine we want to calculate the integral of a function of two variables, x and y:

\displaystyle \int\limits_{1}^{2} \int\limits_{3}^{4} x^{2} y \ dx \ dy

This integral accumulates the function x^{2} y over the rectangular region [3, 4] \times [1, 2], where x ranges over [3, 4] and y over [1, 2]. Analytically, computing this integral gives us a value of 18.5, which will allow us to gauge the accuracy of our Monte Carlo estimates.
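
Since the integrand factors into a function of x times a function of y, the value is quick to verify by splitting the double integral into two one-dimensional integrals:

\displaystyle \int\limits_{1}^{2} \int\limits_{3}^{4} x^{2} y \ dx \ dy = \left(\int\limits_{3}^{4} x^{2} \ dx\right)\left(\int\limits_{1}^{2} y \ dy\right) = \frac{37}{3} \cdot \frac{3}{2} = \frac{37}{2} = 18.5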

The Python code provided below is designed to be versatile, capable of estimating integrals over spaces having dimensions greater than one. All it requires is the specification of the integration domain, the number of samples, and a function defined using Python’s lambda notation.

Compute V
# Computing the volume V of the hyperrectangular domain G
def compute_volume(G):
    vol = 1
    for edge in G:
        # Multiply the side lengths of the domain together
        vol *= (edge[1] - edge[0])
    return vol

Sample G
# Uniformly sampling k points from the domain G
def sample_domain(G, k_samples):
    d = len(G)
    # Sample the first coordinate of every point
    edge_one = G[0]
    component_samples = np.random.uniform(edge_one[0], edge_one[1], k_samples)
    # Sample the remaining coordinates and stack them row by row
    for i in range(1, d):
        new_edge = G[i]
        samples = np.random.uniform(new_edge[0], new_edge[1], k_samples)
        component_samples = np.vstack((component_samples, samples))
    # Transpose so each row is one d-dimensional sample point
    return component_samples.T

Compute B^{k}(G)
# Computing the Monte Carlo estimate B^{k}(G) = V * f_avg^{k}(G)
def compute_integral(f, G, k_samples):
    volume = compute_volume(G)
    sample_points = sample_domain(G, k_samples)
    f_values = []
    for point in sample_points:
        # Evaluate f at each sampled point
        f_values.append(f(*point))
    return volume * np.mean(f_values)


By sequentially running the code for various sample sizes, we can confirm these estimates converge towards the analytical value of 18.5:
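
For instance, a loop along the following lines shows the estimates tightening around 18.5 as k grows. The function name g and the sample sizes are illustrative choices, and the exact outputs will vary from run to run:

Testing B^{k}(G)
# Domain G = [3, 4] x [1, 2] for (x, y), matching the integral above
G = [(3, 4), (1, 2)]
g = lambda x, y: (x**2) * y

for k in [100, 10000, 1000000]:
    print(k, compute_integral(g, G, k))  # approaches 18.5 as k grows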


Monte Carlo Integration is a reminder of how randomness and probability can be harnessed to uncover precise answers to intricate questions. As computational capabilities advance, the potential applications of this method are bound only by our imaginations. So let's immerse ourselves, explore creatively, and allow chance to lead us towards fresh understanding and innovative answers.
