Cointegration Testing in Python: ADF and Johansen Tests

What Is Cointegration?

Cointegration is a statistical property of two or more time series that share a long-term equilibrium relationship, even though each individual series is non-stationary (meaning it has a unit root and does not revert to a fixed mean). The concept was formalized by Robert Engle and Clive Granger in their foundational 1987 paper "Co-Integration and Error Correction: Representation, Estimation, and Testing" published in Econometrica. Both shared the 2003 Nobel Memorial Prize in Economic Sciences, with Granger cited for his work on cointegration and Engle for his work on time-varying volatility (ARCH).

The intuition is best understood through an analogy. Consider two people walking their dogs in a park. Each person walks a somewhat random path (non-stationary), and each dog also wanders randomly. The person and their dog are connected by a leash. Individually, neither the person nor the dog follows a predictable path. But the distance between them is bounded -- the leash keeps them from drifting too far apart. If the dog wanders too far ahead, the leash pulls it back. If the person gets ahead, the same force applies. The distance between them is mean-reverting, even though their individual paths are not.

In financial terms: two stock prices might each follow a random walk (non-stationary), but if they are cointegrated, the spread between them (more precisely, a linear combination of them) is stationary and mean-reverting. When the spread widens beyond its historical norm, it tends to contract back. This mean-reverting property is what makes cointegrated pairs valuable for trading.

Stationarity vs. Cointegration

Understanding cointegration requires understanding stationarity. A stationary time series has a constant mean, constant variance, and autocovariance that depends only on the lag (not on time). Stock prices are generally not stationary -- they follow a random walk with drift, meaning they can wander arbitrarily far from any starting point.

Stock returns (the percentage change from one period to the next) are approximately stationary. But prices themselves are not. You cannot meaningfully say that Apple's stock price "reverts to a mean" -- there is no mean for it to revert to.
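A small simulation makes the contrast concrete. The sketch below uses synthetic Gaussian random walks in place of real prices (an assumption made purely for illustration) and compares how widely the levels and the one-step changes are dispersed at an early and a late date.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2,000 synthetic random-walk "price" paths with 1,000 steps each (illustrative only)
n_paths, n_steps = 2000, 1000
changes = rng.normal(loc=0.0, scale=1.0, size=(n_paths, n_steps))
levels = changes.cumsum(axis=1)

# Cross-sectional variance of the level at an early and a late date
var_level_early = levels[:, 99].var()
var_level_late = levels[:, 999].var()

# Cross-sectional variance of the one-step change at the same dates
var_change_early = changes[:, 99].var()
var_change_late = changes[:, 999].var()

print(f"Var(level)  t=100: {var_level_early:8.1f}   t=1000: {var_level_late:8.1f}")
print(f"Var(change) t=100: {var_change_early:8.2f}   t=1000: {var_change_late:8.2f}")
```

The variance of the levels grows roughly in proportion to elapsed time, while the variance of the changes stays near one. That is the sense in which a random walk has no mean to revert to.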

Cointegration is the key insight that resolves this. Two non-stationary price series can produce a stationary linear combination. If stock Y and stock X are cointegrated, then there exists a coefficient β such that:

Z = Y − βX

where Z is stationary (mean-reverting). The value β is called the cointegrating coefficient or hedge ratio. When Z is far above its mean, you short Y and go long X (scaled by β). When Z is far below its mean, you do the opposite. This is the basis of pairs trading.
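As a rough illustration of Z = Y − βX, the following sketch builds a synthetic cointegrated pair and recovers the hedge ratio by least squares. The value true_beta = 1.5, the AR(1) noise, and all variable names are assumptions chosen for the example, not properties of any real pair.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
true_beta = 1.5  # hypothetical hedge ratio used to build the synthetic data

# X is a pure random walk (non-stationary)
x = rng.normal(size=n).cumsum()

# Stationary AR(1) noise that plays the role of the spread
noise = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + shocks[t]

# Y inherits the random walk from X, so Y is non-stationary too
y = true_beta * x + noise

# Estimate beta by least squares (slope of Y on X, with an intercept)
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
z = y - beta_hat * x

print(f"Estimated beta: {beta_hat:.3f}")
print(f"Std of Y:       {y.std():.2f}")
print(f"Std of spread:  {z.std():.2f}")
```

Y wanders as far as X does, but the spread stays within a narrow band around its mean, which is what a trading rule on Z relies on.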

Key distinction: Correlation measures whether two series move together in the same direction over short periods. Cointegration measures whether two series maintain a stable long-term relationship. Two stocks can be highly correlated but not cointegrated, and vice versa. Correlation can break down during crises; cointegration, when genuine, implies a structural link.

The Augmented Dickey-Fuller (ADF) Test

The Augmented Dickey-Fuller test is the workhorse of stationarity testing. It tests the null hypothesis that a time series has a unit root (is non-stationary). If the test rejects the null hypothesis (p-value < 0.05), you conclude that the series is stationary.

The ADF test extends the original Dickey-Fuller test by including lagged difference terms to account for higher-order autocorrelation. The test regression is:

ΔY_t = α + γY_{t-1} + δ_1ΔY_{t-1} + ... + δ_pΔY_{t-p} + ε_t

The null hypothesis is γ = 0 (unit root, non-stationary). The alternative is γ < 0 (stationary). The test statistic is compared to critical values from the Dickey-Fuller distribution (not the standard t-distribution), which are more negative than standard critical values.

In Python, the ADF test is available in statsmodels:

from statsmodels.tsa.stattools import adfuller

result = adfuller(series, autolag='AIC')
adf_stat = result[0]
p_value  = result[1]
crit     = result[4]  # dict of critical values

print(f"ADF Statistic: {adf_stat:.4f}")
print(f"p-value:       {p_value:.4f}")
print(f"Critical values: {crit}")

If p_value < 0.05, reject the null hypothesis -- the series is stationary. For cointegration testing, you will apply this test to the residuals of a regression, as described in the next section.

Engle-Granger Two-Step Method

The Engle-Granger method tests for cointegration between two series in two steps. It is the most straightforward cointegration test and the one most commonly used in pairs trading research.

Step 1: OLS Regression

Run an ordinary least squares regression of Y on X:

Y_t = α + βX_t + ε_t

The coefficient β is the cointegrating coefficient (hedge ratio). The residuals ε_t represent the spread between the two series after accounting for their long-term relationship.

Step 2: ADF Test on Residuals

Apply the ADF test to the residuals ε_t from Step 1. If the residuals are stationary (ADF p-value < 0.05), then X and Y are cointegrated. The residuals represent the mean-reverting spread that you can trade.

Important caveat: The critical values for the ADF test on regression residuals are different from the standard ADF critical values, because the residuals are estimated (not observed). The statsmodels coint() function handles this automatically by using critical values and p-values based on MacKinnon (1994, 2010), the references given in the statsmodels documentation. Always use coint() rather than manually running ADF on OLS residuals with standard critical values.

In Python, statsmodels provides a convenient one-function implementation:

from statsmodels.tsa.stattools import coint

# Test cointegration between two price series
stat, p_value, crit_values = coint(y, x)

print(f"Cointegration test statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Critical values (1%, 5%, 10%): {crit_values}")

if p_value < 0.05:
    print("Reject null: series are cointegrated")
else:
    print("Fail to reject null: no cointegration found")

The coint() function returns the test statistic, the p-value, and critical values at the 1%, 5%, and 10% significance levels. The null hypothesis is that there is NO cointegration. A p-value below 0.05 means you reject the null and conclude the series are cointegrated.

The Johansen Test

The Johansen test, developed by Søren Johansen, is a multivariate extension of cointegration testing that can handle more than two variables simultaneously. While the Engle-Granger method tests whether a single pair of series is cointegrated, the Johansen test can identify the number of cointegrating relationships among multiple series.

The Johansen test is based on a Vector Autoregression (VAR) model and uses two test statistics: the trace statistic and the maximum eigenvalue statistic. Both test the number of cointegrating vectors (relationships) present among the series.

The null hypothesis for the trace test is that the number of cointegrating vectors is less than or equal to r, tested against the alternative that it is greater than r. The test proceeds sequentially: first test r = 0 (no cointegration), then r = 1, and so on.

In Python:

from statsmodels.tsa.vector_ar.vecm import coint_johansen
import numpy as np

# data: numpy array with columns for each series
data = np.column_stack([y, x])

# det_order: -1 (no deterministic terms),
#             0 (constant), 1 (constant + trend)
# k_ar_diff: number of lagged differences in the VAR
result = coint_johansen(data, det_order=0, k_ar_diff=1)

# Trace statistics and critical values
print("Trace statistics:", result.lr1)
print("Critical values (90%, 95%, 99%):")
print(result.cvt)

# Max eigenvalue statistics
print("Max eigenvalue statistics:", result.lr2)
print("Critical values (90%, 95%, 99%):")
print(result.cvm)

For each row in the output, compare the test statistic to the critical value at your chosen significance level. If the trace statistic exceeds the 95% critical value for r = 0, there is at least one cointegrating relationship. The Johansen test is preferred when you want to test more than two series simultaneously or when the choice of dependent variable in the Engle-Granger regression matters (since the Engle-Granger result can change depending on which variable you put on the left side).

Half-Life of Mean Reversion

Once you have confirmed that two series are cointegrated and computed the spread, the next question is: how quickly does the spread revert to its mean? This is measured by the half-life -- the expected time for the spread to move halfway back to its mean from any given deviation.

The half-life is estimated by fitting an Ornstein-Uhlenbeck (OU) process to the spread. In practice, this means running a simple OLS regression of the change in spread on the lagged spread level:

ΔZ_t = μ + λ · Z_{t-1} + ε_t
Half-life = −ln(2) / λ

If λ is negative (as it should be for a mean-reverting spread), the half-life is positive. A half-life of 15 days means that, on average, the spread will move halfway back to its mean within 15 trading days.

import numpy as np
from sklearn.linear_model import LinearRegression

# spread: the cointegrated spread (Z = Y - beta * X)
spread = np.array(spread_series)
lag    = spread[:-1].reshape(-1, 1)
delta  = np.diff(spread)

reg = LinearRegression(fit_intercept=True)
reg.fit(lag, delta)
lam = reg.coef_[0]

if lam < 0:
    half_life = -np.log(2) / lam
    print(f"Half-life: {half_life:.1f} periods")
else:
    print("Lambda is non-negative: spread is not mean-reverting")

The half-life is critical for determining the holding period of a pairs trade. If the half-life is 5 days, you expect the trade to work out within a few weeks. If it is 60 days, you need to be prepared to hold for months. Most practitioners look for half-lives between 5 and 60 trading days for practical pairs trading.
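One way to sanity-check the half-life estimate is to apply it to a synthetic spread whose mean-reversion speed is known. The sketch below simulates an AR(1) process with coefficient phi = 0.95 (an arbitrary choice for illustration), so the true λ is phi − 1 = −0.05. Because the regression implies Z_t = (1 + λ)Z_{t-1} plus noise, the exact discrete-time half-life is −ln(2) / ln(1 + λ), and −ln(2) / λ is a close approximation when λ is small.

```python
import numpy as np

rng = np.random.default_rng(6)
phi = 0.95                    # AR(1) coefficient of the synthetic spread
n = 20_000

z = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + shocks[t]

# Regress the change in the spread on its lagged level (with intercept)
lag = z[:-1]
delta = np.diff(z)
lam, intercept = np.polyfit(lag, delta, deg=1)

half_life_est = -np.log(2) / lam
half_life_exact = -np.log(2) / np.log(phi)   # exact for a discrete AR(1)

print(f"lambda estimate:       {lam:.4f} (true value {phi - 1:.4f})")
print(f"Estimated half-life:   {half_life_est:.1f} periods")
print(f"Exact AR(1) half-life: {half_life_exact:.1f} periods")
```

With real spreads the sample is far shorter than 20,000 observations, so the estimate of λ is noisy and the half-life should be read as a rough guide rather than a precise number.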

Complete Working Example

Here is a complete example that fetches two stocks, tests for cointegration, computes the spread z-score, and identifies entry and exit signals. This uses yfinance for market data and statsmodels for the cointegration test.

import yfinance as yf
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import coint
from sklearn.linear_model import LinearRegression

# 1. Fetch data
ticker_y = "KO"    # Coca-Cola
ticker_x = "PEP"   # PepsiCo
start    = "2023-01-01"
end      = "2024-12-31"

# Download both tickers in one call; "Close" is then a DataFrame
# with one column per ticker
prices = yf.download([ticker_y, ticker_x], start=start, end=end)["Close"]

# Align dates
df = pd.DataFrame({"Y": prices[ticker_y], "X": prices[ticker_x]}).dropna()

# 2. Test cointegration (Engle-Granger via statsmodels)
stat, p_value, crit = coint(df["Y"], df["X"])
print(f"Cointegration p-value: {p_value:.4f}")

# 3. Compute hedge ratio via OLS
X_arr = df["X"].values.reshape(-1, 1)
Y_arr = df["Y"].values
reg   = LinearRegression().fit(X_arr, Y_arr)
beta  = reg.coef_[0]
alpha = reg.intercept_
print(f"Hedge ratio (beta): {beta:.4f}")

# 4. Compute spread and z-score
df["spread"] = df["Y"] - beta * df["X"]
df["z_score"] = (
    (df["spread"] - df["spread"].rolling(60).mean())
    / df["spread"].rolling(60).std()
)

# 5. Half-life
spread_arr = df["spread"].dropna().values
lag_s  = spread_arr[:-1].reshape(-1, 1)
diff_s = np.diff(spread_arr)
hl_reg = LinearRegression().fit(lag_s, diff_s)
lam    = hl_reg.coef_[0]
if lam < 0:
    half_life = -np.log(2) / lam
    print(f"Half-life: {half_life:.1f} days")

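# 6. Generate signals (a position is held until the exit band is reached)
df["signal"] = np.nan
df.loc[df["z_score"] >  2.0, "signal"] = -1  # short spread
df.loc[df["z_score"] < -2.0, "signal"] =  1  # long spread
df.loc[df["z_score"].abs() < 0.5, "signal"] = 0  # exit
df["signal"] = df["signal"].ffill().fillna(0)  # carry the position forward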

print(df[["Y", "X", "spread", "z_score", "signal"]].tail(10))

Note on the example: KO (Coca-Cola) and PEP (PepsiCo) are a classic pairs trading example because they operate in the same industry with similar business models. Whether they are actually cointegrated in any given time period is an empirical question -- always test, never assume.

Why Cointegration Matters for Trading

The profitability of pairs trading was documented by Gatev, Goetzmann, and Rouwenhorst (2006) in their influential paper on the strategy. They matched pairs by the distance between normalized price paths rather than by a formal cointegration test, and demonstrated that this simple rule based on co-moving stocks generated consistent profits over their sample period, although returns have declined over time as the strategy became more widely known. Cointegration tests put the same idea on a formal statistical footing.

Cointegration-based pairs trading has several attractive properties for systematic traders:

Market neutrality: Because you are long one stock and short another in the same sector, the strategy has low beta to the overall market. Your P&L depends on the spread, not on market direction.
Statistical foundation: Unlike many trading strategies, cointegration has a rigorous mathematical framework. You can compute confidence intervals, half-lives, and expected holding periods -- not just vague "the chart looks good" reasoning.
Mean reversion: The spread between cointegrated stocks has a well-defined mean and tends to revert to it. This provides a natural entry signal (spread deviates) and exit signal (spread reverts).
Risk management: The z-score of the spread provides a direct measure of how far the trade has moved against you. Stop-loss levels can be set at specific z-score thresholds (e.g., exit if the absolute z-score exceeds 4).

Pitfalls and Limitations

Cointegration testing is powerful but has several important limitations that practitioners must understand.

Cointegration Can Break Down

A cointegrating relationship that held for the past 5 years can break down tomorrow. Structural changes in a company -- a merger, a major product launch, a shift in business model -- can permanently alter the relationship between two stocks. Cointegration is a statistical property of historical data; it does not guarantee future mean reversion.

Multiple Testing Problem

If you test 1,000 pairs of stocks for cointegration at the 5% significance level, you expect roughly 50 false positives even if none of the pairs is truly cointegrated -- pairs that appear cointegrated by chance. Always apply multiple testing corrections (such as Bonferroni correction or Benjamini-Hochberg FDR control) when scanning large numbers of pairs.
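Both corrections are available through multipletests in statsmodels. The sketch below applies them to eight hypothetical p-values, made up for illustration, to show how the number of surviving pairs shrinks.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from eight pair scans (made up for illustration)
p_values = np.array([0.0001, 0.004, 0.015, 0.03, 0.04, 0.2, 0.5, 0.8])

naive = p_values < 0.05
bonf_reject, bonf_adj, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
bh_reject, bh_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

print("Uncorrected:       ", int(naive.sum()), "pairs pass")
print("Bonferroni:        ", int(bonf_reject.sum()), "pairs pass")
print("Benjamini-Hochberg:", int(bh_reject.sum()), "pairs pass")
```

Five pairs pass the uncorrected 5% threshold, three survive Benjamini-Hochberg, and two survive Bonferroni. Bonferroni controls the chance of any false positive and is the stricter of the two. Benjamini-Hochberg controls the expected share of false positives among the pairs that pass.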

Sample Period Sensitivity

Cointegration test results can be highly sensitive to the sample period chosen. A pair that tests as cointegrated over 2020-2024 might not test as cointegrated over 2018-2024. Use rolling window cointegration tests to verify that the relationship is persistent, not just an artifact of a particular time frame.

Transaction Costs

Pairs trading involves entering and exiting two positions simultaneously, doubling the transaction costs compared to a single-stock strategy. The spread must move far enough to cover the round-trip costs on both legs. Short selling incurs additional costs (borrow fees) that can erode returns, particularly for hard-to-borrow stocks.

Critical warning: Never deploy a pairs trading strategy based on a single cointegration test. Use rolling windows (e.g., test cointegration over 12-month windows, rolling forward monthly) and require the relationship to be consistently significant across multiple periods. A single in-sample test is not sufficient evidence for live trading.

Engle-Granger vs. Johansen: When to Use Each

The Engle-Granger method is simpler and sufficient for testing a single pair of securities. Its main limitation is that the result can depend on which variable you designate as Y and which as X in the OLS regression. For two variables, this is typically not a major issue, but it is worth testing both orderings.

The Johansen test is preferred when:

Testing more than two series: If you want to test whether three or more stocks share a cointegrating relationship, the Johansen test handles this natively. The Engle-Granger regression can include additional regressors, but it estimates only a single cointegrating relationship.
Avoiding normalization asymmetry: The Johansen test treats all variables symmetrically -- there is no arbitrary choice of dependent variable.
Identifying the number of cointegrating vectors: Among N variables, there can be up to N-1 cointegrating vectors. The Johansen test identifies how many exist, which is useful for constructing multi-leg spread trades.

For most pairs trading applications, the Engle-Granger method (implemented via statsmodels coint()) is sufficient and more intuitive. Use the Johansen test when working with baskets of three or more securities.

Building a Pairs Trading Pipeline

A complete pairs trading system involves several stages beyond the cointegration test itself:

Universe selection: Start with stocks in the same sector or industry. Cointegration is more likely (and more stable) when the underlying businesses have a fundamental economic link. Testing random cross-sector pairs is a recipe for spurious results.
Cointegration screening: Test all candidate pairs for cointegration using rolling windows. Require significance at the 5% level across multiple non-overlapping windows.
Half-life filtering: Reject pairs with half-lives outside your trading horizon. If you cannot hold a position for 60 days, reject pairs with half-lives above 40-50 days.
Spread z-score monitoring: Compute the rolling z-score of the spread in real time. Common thresholds are |z| > 2.0 to enter and |z| < 0.5 to exit. Adjust based on backtesting.
Risk management: Set a maximum absolute z-score for stop-loss (e.g., |z| = 4.0). Re-test cointegration periodically (monthly or quarterly) and exit any pair where the relationship has broken down.
Position sizing: Size positions based on the volatility of the spread, not the individual stocks. Use the spread's ATR or standard deviation for risk normalization.
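The monitoring and stop-loss stages amount to a small state machine over the z-score path. The function below is a minimal sketch of that logic using the example thresholds from the list above. The name positions_from_zscore and its parameters are hypothetical, and the sketch ignores costs, sizing, and the periodic cointegration re-test.

```python
import numpy as np


def positions_from_zscore(z_scores, entry=2.0, exit_band=0.5, stop=4.0):
    """Turn a z-score path into held positions: +1 long spread, -1 short, 0 flat."""
    position = 0
    positions = []
    for z in z_scores:
        if np.isnan(z):
            position = 0                     # no signal without a valid z-score
        elif position == 0:
            if entry < abs(z) < stop:        # enter only between entry and stop
                position = -1 if z > 0 else 1
        else:
            adverse = -position * z          # how far z sits on the entry side
            if adverse < exit_band or adverse >= stop:
                position = 0                 # spread reverted, or stop-loss hit
        positions.append(position)
    return positions


z_path = [0.3, 1.1, 2.4, 1.6, 0.4, -2.2, -3.1, -4.3, -2.5, 0.1]
print(positions_from_zscore(z_path))
# -> [0, 0, -1, -1, 0, 1, 1, 0, 1, 0]
```

In this sketch a pair that has been stopped out can be re-entered as soon as the z-score falls back inside the stop level, as happens at -2.5 in the example path. A production system would more likely add a cooling-off period or require a fresh cointegration test first.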

Connection to Quantitative Signal Generation

Cointegration analysis provides a rigorous framework for identifying mean-reverting relationships that can be traded systematically. In a broader quantitative system, cointegrated pairs can serve as one signal source among many, contributing to a diversified portfolio of strategies.

The statistical testing framework -- hypothesis testing, p-values, half-life estimation -- exemplifies the kind of evidence-based approach that distinguishes quantitative trading from discretionary chart reading. Every claim (these two stocks mean-revert) is testable, and every parameter (the half-life, the hedge ratio) is estimated from data rather than assumed.
