Monte Carlo Simulation for Portfolio Risk in Python
Monte Carlo simulation is one of the most versatile tools in quantitative finance. By generating thousands of random scenarios, you can estimate the distribution of possible portfolio outcomes — including Value at Risk, Expected Shortfall, probability of loss, and maximum drawdown. This tutorial builds a complete simulator from scratch in Python.
1. What Is Monte Carlo Simulation?
Monte Carlo simulation is a computational technique that uses repeated random sampling to estimate the probability distribution of an uncertain outcome. The method was developed by Stanislaw Ulam and John von Neumann during their work on nuclear weapons at Los Alamos in the 1940s; the name, a nod to the Monte Carlo Casino in Monaco, was suggested by their colleague Nicholas Metropolis.
In the context of portfolio risk, the idea is simple: instead of trying to compute a single expected return or a single risk number, you simulate thousands (or tens of thousands) of possible future paths for your portfolio. Each path represents one scenario of how the market might evolve over the coming days, weeks, or months. By analyzing the distribution of terminal portfolio values across all paths, you can answer questions like:
- What is the expected return of my portfolio over the next year?
- What is the worst-case loss at the 5th percentile (Value at Risk)?
- What is the average loss in the worst 5% of scenarios (Expected Shortfall / CVaR)?
- What is the probability that my portfolio loses more than 20%?
- What does the distribution of maximum drawdowns look like?
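To make the idea concrete before touching portfolios, here is a minimal toy sketch (not part of the portfolio code that follows): estimating the probability that a standard normal variable exceeds 2 by repeated sampling. The exact answer is about 2.28%, so the sampling error is easy to judge.
import numpy as np
np.random.seed(0)
samples = np.random.standard_normal(1_000_000)
estimate = (samples > 2).mean()  # fraction of samples beyond 2
print(f"Monte Carlo estimate: {estimate:.4f} (exact: ~0.0228)")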
2. Step 1: Get Historical Returns
The foundation of any Monte Carlo portfolio simulation is historical return data. We use historical returns to estimate the statistical properties (mean, volatility, correlations) that govern how the portfolio’s assets behave. We will use yfinance to download adjusted close prices and compute daily log returns.
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
# Define portfolio
TICKERS = ["AAPL", "MSFT", "JNJ", "JPM", "XOM"]
WEIGHTS = np.array([0.25, 0.25, 0.20, 0.15, 0.15])
# Download 2 years of daily data
data = yf.download(TICKERS, period="2y", auto_adjust=True)["Close"]
data = data.dropna()
# Compute daily log returns
log_returns = np.log(data / data.shift(1)).dropna()
print(f"Assets: {list(log_returns.columns)}")
print(f"Trading days: {len(log_returns)}")
print(f"\nAnnualized returns:")
print((log_returns.mean() * 252).round(4))
print(f"\nAnnualized volatility:")
print((log_returns.std() * np.sqrt(252)).round(4))
We use log returns (also called continuously compounded returns) rather than simple returns because log returns are additive across time. This means the total log return over a period is the sum of the daily log returns, which makes the simulation math cleaner. The log return for day t is ln(P_t / P_{t-1}).
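As a quick sanity check on additivity, the sum of the daily log returns should equal the log of the total price ratio over the sample window, up to floating-point error:
# Additivity check: sum of daily log returns == log of the total price ratio
total_log_return = np.log(data.iloc[-1] / data.iloc[0])
sum_of_daily = log_returns.sum()
print((total_log_return - sum_of_daily).abs().max())  # effectively zero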
3. Step 2: Compute Mean Returns and Covariance Matrix
The two inputs to a parametric Monte Carlo simulation are the mean return vector (expected daily return for each asset) and the covariance matrix (which captures both the volatility of each asset and the correlations between them).
# Daily mean return vector
mu = log_returns.mean().values # shape: (n_assets,)
# Daily covariance matrix
cov = log_returns.cov().values # shape: (n_assets, n_assets)
n_assets = len(TICKERS)
print(f"\nDaily mean returns: {mu.round(6)}")
print(f"\nCovariance matrix shape: {cov.shape}")
print(f"\nCorrelation matrix:")
corr = log_returns.corr()
print(corr.round(3))
The covariance matrix is critical because it captures how assets move together. If two assets are positively correlated, they tend to rise and fall at the same time, reducing the diversification benefit. The simulation must preserve these correlations to produce realistic portfolio paths.
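Before simulating anything, you can also compute the portfolio's volatility analytically from these two inputs as sqrt(wᵀ Σ w). This gives a useful reference point for the simulated results later:
# Analytic portfolio volatility: sqrt(w' Σ w), annualized with 252 trading days
port_daily_var = WEIGHTS @ cov @ WEIGHTS
port_annual_vol = np.sqrt(port_daily_var * 252)
print(f"Analytic annualized portfolio volatility: {port_annual_vol:.2%}")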
4. Step 3: Generate Correlated Random Returns (Cholesky Decomposition)
The key mathematical technique for generating correlated random numbers is Cholesky decomposition. Given a positive-definite covariance matrix Σ, the Cholesky decomposition finds a lower triangular matrix L such that Σ = L Lᵀ. If you multiply L by a vector of independent standard normal random variables z, the result L z is a vector of correlated random variables with the desired covariance structure.
# Cholesky decomposition of the covariance matrix
L = np.linalg.cholesky(cov)
# Generate one set of correlated daily returns
z = np.random.standard_normal(n_assets) # independent standard normals
correlated_returns = mu + L @ z # correlated returns with correct mean/cov
print(f"Independent normals z: {z.round(4)}")
print(f"Correlated returns: {correlated_returns.round(6)}")
The Cholesky decomposition requires the covariance matrix to be positive definite. In practice, estimated covariance matrices from historical data are almost always positive definite, but numerical issues can arise with very short sample periods or highly correlated assets. If you encounter a non-positive-definite matrix, you can use the nearest positive-definite approximation (Higham, 2002) or add a small ridge to the diagonal: cov + 1e-8 * np.eye(n_assets).
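If you want a defensive version of this, a small illustrative helper can retry the decomposition with a growing ridge (safe_cholesky is our own name for this sketch, not a library function):
def safe_cholesky(cov_matrix, ridge=1e-8, max_tries=5):
    """Retry Cholesky with a growing diagonal ridge if cov is not positive definite."""
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(cov_matrix)
        except np.linalg.LinAlgError:
            cov_matrix = cov_matrix + ridge * np.eye(cov_matrix.shape[0])
            ridge *= 10
    raise ValueError("Covariance matrix could not be made positive definite")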
Cholesky decomposition is the standard approach because it is computationally efficient (O(n³/3) operations for an n×n matrix) and numerically stable. The alternative, eigenvalue decomposition, also works but is slower. For the typical portfolio sizes in practice (5-50 assets), Cholesky is nearly instantaneous.
5. Step 4: Simulate N Portfolio Paths
Now we scale up to simulate many paths (typically 10,000) over a multi-day horizon (252 trading days for one year). Each path represents one possible future for the portfolio.
N_SIMULATIONS = 10000
HORIZON = 252 # trading days (1 year)
INITIAL_VALUE = 100000 # $100,000 portfolio
np.random.seed(42) # for reproducibility
# Pre-allocate array: (n_simulations, horizon+1)
portfolio_paths = np.zeros((N_SIMULATIONS, HORIZON + 1))
portfolio_paths[:, 0] = INITIAL_VALUE
for t in range(1, HORIZON + 1):
    # Generate (n_simulations x n_assets) independent normals
    Z = np.random.standard_normal((N_SIMULATIONS, n_assets))
    # Transform to correlated returns: each row is one scenario
    daily_returns = mu + Z @ L.T  # (N_SIMULATIONS, n_assets)
    # Portfolio return = weighted sum of asset returns
    port_log_return = daily_returns @ WEIGHTS  # (N_SIMULATIONS,)
    # Update portfolio value
    portfolio_paths[:, t] = portfolio_paths[:, t-1] * np.exp(port_log_return)
print(f"Simulated {N_SIMULATIONS} paths over {HORIZON} days")
print(f"Terminal values: min=${portfolio_paths[:, -1].min():,.0f}, "
f"median=${np.median(portfolio_paths[:, -1]):,.0f}, "
f"max=${portfolio_paths[:, -1].max():,.0f}")
A few notes on the implementation. We multiply Z @ L.T (instead of L @ Z.T) because each row of Z is an independent scenario, and we want to apply the Cholesky transformation to each row. The np.exp(port_log_return) converts log returns to simple multiplicative factors, which correctly compounds the portfolio value over time.
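A quick sanity check (assuming the loop above has just run, so daily_returns still holds the last step's draws) is to verify that the simulated returns preserve the historical correlation structure:
# Correlations of the simulated asset returns vs. the historical matrix
sim_corr = np.corrcoef(daily_returns, rowvar=False)
print("Max abs deviation from historical correlations:",
      np.abs(sim_corr - corr.values).max().round(3))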
6. Step 5: Compute Risk Statistics
With 10,000 simulated terminal values, we can compute a rich set of risk statistics:
terminal_values = portfolio_paths[:, -1]
returns = (terminal_values - INITIAL_VALUE) / INITIAL_VALUE
# Expected return
expected_return = returns.mean()
# Value at Risk (VaR) at 95% confidence
# VaR is the loss at the 5th percentile
var_95 = np.percentile(returns, 5)
# Conditional VaR (CVaR) / Expected Shortfall
# Mean of all returns below the VaR threshold
cvar_95 = returns[returns <= var_95].mean()
# Probability of loss
prob_loss = (returns < 0).mean()
# Maximum drawdown for each path
def max_drawdown(path):
    """Compute maximum peak-to-trough drawdown for a single path."""
    peak = np.maximum.accumulate(path)
    drawdown = (path - peak) / peak
    return drawdown.min()
drawdowns = np.array([max_drawdown(portfolio_paths[i]) for i in range(N_SIMULATIONS)])
print(f"Expected annual return: {expected_return:+.2%}")
print(f"VaR (95%, 1-year): {var_95:+.2%}")
print(f"CVaR (95%, 1-year): {cvar_95:+.2%}")
print(f"Probability of loss: {prob_loss:.1%}")
print(f"Median max drawdown: {np.median(drawdowns):.2%}")
print(f"95th pctile max DD: {np.percentile(drawdowns, 5):.2%}")
Understanding VaR and CVaR
Value at Risk (VaR) at the 95% confidence level answers the question: “What is the maximum loss I should expect 95% of the time?” Equivalently, there is a 5% chance that the loss will exceed the VaR. If the 1-year 95% VaR is -18%, it means that in 95% of simulated scenarios, the portfolio lost less than 18%.
Conditional Value at Risk (CVaR), also called Expected Shortfall (ES), goes further. It asks: “Given that we are in the worst 5% of scenarios, what is the average loss?” CVaR is always worse (more negative) than VaR and provides information about the severity of tail losses. The Basel Committee on Banking Supervision adopted Expected Shortfall in place of VaR for market risk capital requirements as part of its Fundamental Review of the Trading Book (proposed in 2013 and finalized in 2016).
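These statistics are often easier to communicate in dollar terms. A small illustrative conversion using the variables defined above:
# Same statistics expressed in dollars (negated so losses print as positive)
var_95_dollars = -var_95 * INITIAL_VALUE
cvar_95_dollars = -cvar_95 * INITIAL_VALUE
print(f"95% VaR:  ${var_95_dollars:,.0f} loss over 1 year")
print(f"95% CVaR: ${cvar_95_dollars:,.0f} average loss in the worst 5% of scenarios")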
7. Step 6: Plot the Fan Chart
A fan chart (also called a confidence band chart) is the canonical visualization for Monte Carlo simulation results. It shows the median path with shaded bands representing different percentile ranges.
days = np.arange(HORIZON + 1)
# Compute percentiles at each time step
p5 = np.percentile(portfolio_paths, 5, axis=0)
p25 = np.percentile(portfolio_paths, 25, axis=0)
p50 = np.percentile(portfolio_paths, 50, axis=0)
p75 = np.percentile(portfolio_paths, 75, axis=0)
p95 = np.percentile(portfolio_paths, 95, axis=0)
fig, ax = plt.subplots(figsize=(12, 6))
# Confidence bands
ax.fill_between(days, p5, p95, alpha=0.15, color="#4facfe", label="5th-95th pctile")
ax.fill_between(days, p25, p75, alpha=0.3, color="#4facfe", label="25th-75th pctile")
# Median path
ax.plot(days, p50, color="#00d4aa", linewidth=2, label="Median")
# Initial value reference
ax.axhline(y=INITIAL_VALUE, color="#8b9099", linestyle="--", linewidth=0.8, alpha=0.5)
ax.set_xlabel("Trading Days")
ax.set_ylabel("Portfolio Value ($)")
ax.set_title("Monte Carlo Simulation: 10,000 Paths, 1-Year Horizon")
ax.legend(loc="upper left")
ax.set_xlim(0, HORIZON)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.savefig("monte_carlo_fan_chart.png", dpi=150, bbox_inches="tight")
plt.show()
The fan chart immediately conveys the range of uncertainty. A narrow fan indicates low uncertainty (either the portfolio is well-diversified or the simulation horizon is short). A wide fan, especially one that extends well below the initial value, highlights significant downside risk.
8. Enhancement: Fat Tails with Student-t Distribution
The basic simulation above assumes returns are normally distributed. In reality, financial returns exhibit fat tails (more extreme outcomes than the normal distribution predicts) and negative skewness (large losses are more common than large gains). Mandelbrot (1963) documented this phenomenon, and it has been confirmed by decades of subsequent research.
A simple improvement is to replace the normal distribution with a Student-t distribution, which has heavier tails controlled by a degrees-of-freedom parameter ν. Lower ν means fatter tails. The normal distribution is the limiting case as ν → ∞. Empirically, financial returns often correspond to ν between 3 and 8.
from scipy import stats
def estimate_dof(returns_series):
    """Estimate degrees of freedom by fitting a t-distribution."""
    params = stats.t.fit(returns_series)  # returns (df, loc, scale)
    return params[0]  # degrees of freedom
# Estimate dof from portfolio returns
port_returns = (log_returns @ WEIGHTS).values
nu = estimate_dof(port_returns)
print(f"Estimated degrees of freedom: {nu:.2f}")
# Simulate with t-distribution
portfolio_paths_t = np.zeros((N_SIMULATIONS, HORIZON + 1))
portfolio_paths_t[:, 0] = INITIAL_VALUE
for t in range(1, HORIZON + 1):
    # Generate t-distributed random variables
    # Scale so variance matches: t-dist has var = nu/(nu-2) for nu > 2
    scale_factor = np.sqrt((nu - 2) / nu) if nu > 2 else 1.0
    Z_t = stats.t.rvs(df=nu, size=(N_SIMULATIONS, n_assets)) * scale_factor
    daily_returns = mu + Z_t @ L.T
    port_log_return = daily_returns @ WEIGHTS
    portfolio_paths_t[:, t] = portfolio_paths_t[:, t-1] * np.exp(port_log_return)
# Compare VaR
var_normal = np.percentile((portfolio_paths[:, -1] / INITIAL_VALUE - 1), 5)
var_t = np.percentile((portfolio_paths_t[:, -1] / INITIAL_VALUE - 1), 5)
print(f"VaR (95%, normal): {var_normal:+.2%}")
print(f"VaR (95%, t-dist): {var_t:+.2%}")
The scaling factor sqrt((nu-2)/nu) ensures that the variance of the t-distributed random variables matches the variance of the standard normal. Without this adjustment, the t-distribution would inflate volatility beyond what the historical covariance matrix implies. We want the same average volatility but with fatter tails.
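A quick check of that claim (assuming the estimated ν is above 2, as it almost always is for equity returns): the scaled t draws should have a standard deviation close to 1.
# Scaled t draws should have approximately unit variance (requires nu > 2)
check = stats.t.rvs(df=nu, size=1_000_000) * np.sqrt((nu - 2) / nu)
print(f"Std of scaled t samples: {check.std():.3f} (target: 1.000)")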
9. Enhancement: Volatility Uncertainty
Another limitation of the basic approach is that it treats volatility as constant across all simulated paths. In reality, volatility is itself uncertain and time-varying (as described by GARCH models and the VIX index). A practical enhancement is to perturb the volatility across paths:
def simulate_with_vol_uncertainty(mu, cov, weights, n_sims, horizon,
                                  initial_value, vol_perturbation=0.20):
    """Monte Carlo with per-path volatility perturbation."""
    n_assets = len(mu)
    paths = np.zeros((n_sims, horizon + 1))
    paths[:, 0] = initial_value
    for sim in range(n_sims):
        # Perturb this path's covariance by a random factor drawn uniformly
        # from [1-p, 1+p]; volatilities scale by the square root of the factor
        vol_factor = 1.0 + vol_perturbation * (2 * np.random.random() - 1)
        perturbed_cov = cov * vol_factor
        # Cholesky of the perturbed covariance
        L_perturbed = np.linalg.cholesky(perturbed_cov)
        for t in range(1, horizon + 1):
            z = np.random.standard_normal(n_assets)
            daily_ret = mu + L_perturbed @ z
            port_ret = daily_ret @ weights
            paths[sim, t] = paths[sim, t-1] * np.exp(port_ret)
    return paths
With vol_perturbation=0.20, each path uses a covariance matrix scaled by a factor drawn uniformly between 0.80 and 1.20 (so volatilities are scaled by roughly 0.89 to 1.10). This means some paths simulate a low-volatility regime and others a high-volatility regime, producing a wider and more realistic distribution of outcomes. The tradeoff is increased computation time, because the Cholesky decomposition must be recomputed for each path.
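A brief usage sketch, using a smaller path count since the pure-Python double loop is slow:
paths_vol = simulate_with_vol_uncertainty(mu, cov, WEIGHTS, n_sims=2000,
                                          horizon=HORIZON,
                                          initial_value=INITIAL_VALUE)
var_vol = np.percentile(paths_vol[:, -1] / INITIAL_VALUE - 1, 5)
print(f"VaR (95%, vol uncertainty): {var_vol:+.2%}")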
10. Enhancement: Bootstrap (Non-Parametric) Simulation
The parametric approaches above assume a specific distribution (normal or Student-t) for returns. An alternative is bootstrap simulation, which makes no distributional assumptions at all. Instead, each simulated day’s returns are sampled (with replacement) from the actual historical daily return vectors.
def bootstrap_simulation(historical_returns, weights, n_sims, horizon, initial_value):
    """Non-parametric Monte Carlo using bootstrap resampling."""
    n_days = len(historical_returns)
    paths = np.zeros((n_sims, horizon + 1))
    paths[:, 0] = initial_value
    returns_matrix = historical_returns.values  # (n_days, n_assets)
    for t in range(1, horizon + 1):
        # Randomly sample n_sims rows from historical returns (with replacement)
        sample_indices = np.random.randint(0, n_days, size=n_sims)
        sampled_returns = returns_matrix[sample_indices]  # (n_sims, n_assets)
        # Portfolio return for each simulation
        port_returns = sampled_returns @ weights
        paths[:, t] = paths[:, t-1] * np.exp(port_returns)
    return paths
paths_bootstrap = bootstrap_simulation(log_returns, WEIGHTS, N_SIMULATIONS,
                                       HORIZON, INITIAL_VALUE)
The bootstrap approach has two key advantages: it automatically captures the true distribution of returns (including fat tails, skewness, and any other non-normal features), and it preserves the cross-sectional correlation structure because entire daily return vectors are sampled together. The main limitation is that it assumes the future will resemble the past — it cannot generate scenarios more extreme than anything observed in the historical sample.
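Comparing the bootstrap VaR against the parametric results from earlier is straightforward:
var_bootstrap = np.percentile(paths_bootstrap[:, -1] / INITIAL_VALUE - 1, 5)
print(f"VaR (95%, normal):    {var_normal:+.2%}")
print(f"VaR (95%, bootstrap): {var_bootstrap:+.2%}")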
11. Validation: Comparing Simulated and Historical Risk
How do you know if your simulation is producing reasonable results? The standard validation technique is to compare the simulated risk statistics with those computed directly from historical data.
# Historical portfolio returns
hist_port_returns = (log_returns @ WEIGHTS).values
# Historical annualized return and vol
hist_annual_return = hist_port_returns.mean() * 252
hist_annual_vol = hist_port_returns.std() * np.sqrt(252)
# Historical VaR (daily)
hist_var_daily = np.percentile(hist_port_returns, 5)
# Simulated daily returns (from first day across all paths)
sim_daily_returns = np.log(portfolio_paths[:, 1] / portfolio_paths[:, 0])
sim_var_daily = np.percentile(sim_daily_returns, 5)
print(f"Historical annual return: {hist_annual_return:+.2%}")
print(f"Historical annual vol: {hist_annual_vol:.2%}")
print(f"Historical daily VaR(5%): {hist_var_daily:+.4f}")
print(f"Simulated daily VaR(5%): {sim_var_daily:+.4f}")
The simulated daily VaR should be close to the historical daily VaR if your simulation is correctly calibrated. If they diverge significantly, check your inputs: is the mean return vector correct? Is the covariance matrix being computed from the right data? Is the Cholesky decomposition applied correctly?
For longer horizons, you can also validate by running the simulation over historical periods (using only data available at each point in time) and comparing the simulated confidence bands with the actual realized portfolio path. If the realized path frequently falls outside the 5th-95th percentile band, your simulation is underestimating risk.
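A rough, illustrative version of that out-of-sample check splits the available history in half, calibrates on the first half, simulates forward over the second half, and measures how often the realized path stays inside the simulated 5th-95th percentile band (with only two years of data this is a coarse sketch, not a rigorous backtest):
# Calibrate on the first half of the sample, simulate the second half
split = len(log_returns) // 2
calib, test = log_returns.iloc[:split], log_returns.iloc[split:]
mu_c = calib.mean().values
L_c = np.linalg.cholesky(calib.cov().values)
h = len(test)
sims = np.zeros((N_SIMULATIONS, h + 1))
sims[:, 0] = 1.0
for t in range(1, h + 1):
    Z = np.random.standard_normal((N_SIMULATIONS, n_assets))
    sims[:, t] = sims[:, t-1] * np.exp((mu_c + Z @ L_c.T) @ WEIGHTS)
# Realized cumulative growth of the portfolio over the test window
realized = np.exp((test @ WEIGHTS).cumsum()).values
inside = ((realized >= np.percentile(sims[:, 1:], 5, axis=0)) &
          (realized <= np.percentile(sims[:, 1:], 95, axis=0))).mean()
print(f"Share of test days inside the 5th-95th band: {inside:.1%}")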
12. Practical Considerations
How Many Simulations?
The standard error of a Monte Carlo estimate decreases proportionally to 1/sqrt(N), where N is the number of simulations. With 10,000 simulations, the standard error of the mean terminal value is roughly 1% of the standard deviation of terminal values. For most practical purposes, 10,000 simulations provide adequate precision. If you need higher precision for tail statistics (like 99th percentile VaR), increase to 50,000 or 100,000 simulations.
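A quick way to see this in the numbers, using terminal_values from Step 5:
# Standard error of the estimated mean terminal value: sigma / sqrt(N)
se_mean = terminal_values.std() / np.sqrt(N_SIMULATIONS)
print(f"Std of terminal values: ${terminal_values.std():,.0f}")
print(f"Std error of the mean:  ${se_mean:,.0f}")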
Stationarity Assumption
All of the methods above assume that the statistical properties of returns (mean, volatility, correlations) are stationary — that is, the same in the future as in the past. This is a strong assumption that is frequently violated in practice. Volatility clusters (GARCH effects), regime changes, and structural breaks all cause non-stationarity. More sophisticated approaches use regime-switching models, GARCH-based simulation, or time-varying copulas to address this, but they add significant complexity.
Transaction Costs and Rebalancing
The basic simulation assumes a static (buy-and-hold) portfolio. In practice, portfolios are rebalanced periodically, incurring transaction costs. If you are simulating a rebalanced portfolio, you need to add a rebalancing step at regular intervals (e.g., monthly) and deduct estimated transaction costs at each rebalancing event.
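A minimal sketch of what that might look like, assuming monthly rebalancing (every 21 trading days) and a flat cost in basis points applied to the traded notional. Both parameters, and the function name simulate_rebalanced, are illustrative assumptions rather than a prescribed implementation:
def simulate_rebalanced(mu, L_chol, weights, n_sims, horizon, initial_value,
                        rebalance_every=21, cost_bps=5):
    """Sketch: rebalance to target weights every `rebalance_every` days,
    deducting cost_bps of the traded notional at each rebalance."""
    n_assets = len(mu)
    # Asset-level values per simulation: (n_sims, n_assets)
    asset_values = np.tile(initial_value * weights, (n_sims, 1))
    paths = np.zeros((n_sims, horizon + 1))
    paths[:, 0] = initial_value
    for t in range(1, horizon + 1):
        Z = np.random.standard_normal((n_sims, n_assets))
        asset_values *= np.exp(mu + Z @ L_chol.T)  # grow each asset
        total = asset_values.sum(axis=1)
        if t % rebalance_every == 0:
            target = total[:, None] * weights
            turnover = np.abs(target - asset_values).sum(axis=1)
            total -= turnover * cost_bps / 10000  # transaction costs
            asset_values = total[:, None] * weights
        paths[:, t] = total
    return paths
# Example call, reusing mu, L, and WEIGHTS from earlier:
# paths_rebal = simulate_rebalanced(mu, L, WEIGHTS, N_SIMULATIONS, HORIZON, INITIAL_VALUE)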
Connection to Alpha Suite
Alpha Suite uses Monte Carlo simulation to generate confidence bands in its backtesting framework. When you run a backtest, the system simulates thousands of possible paths based on the historical volatility and correlation structure of the traded securities. The resulting confidence bands show whether the backtest’s realized performance falls within the range of statistically expected outcomes — helping you distinguish genuine alpha from luck.
Monte Carlo simulation does not predict the future. It maps the distribution of possible futures given your assumptions about return distributions, volatility, and correlations. The value lies not in any single simulated path but in the aggregate statistics: VaR, CVaR, probability of loss, drawdown distributions. These statistics enable informed risk management decisions that no single-point forecast can provide.