Portfolio Optimization with Python: Hierarchical Risk Parity
In this post, we will delve into the Hierarchical Risk Parity (HRP) algorithm, and demonstrate how it can be applied to optimize an ETF-based portfolio. HRP is a relatively recent development, as compared to Markowitz’s mean-variance framework, in portfolio management research that leverages hierarchical clustering to allocate weights based on the correlation structure between the assets.
Tools From The Python Ecosystem
We will be using the following libraries and packages:
[project]
name = "hierachical-risk-parity"
requires-python = ">=3.11"
dependencies = [
"arch>=7.1.0",
"cplex>=22.1.1.2",
"matplotlib>=3.9.2",
"pandas[performance]>=2.2.3",
"plotly>=5.24.1",
"pyportfolioopt>=1.5.5",
"scikit-learn>=1.5.2",
"scipy>=1.14.1",
"yfinance>=0.2.44",
]
[tool.uv]
package = false
dev-dependencies = [
"ipykernel>=6.29.5",
"ipywidgets>=8.1.5",
"isort>=5.13.2",
"nbformat>=5.10.4",
"ruff>=0.7.0",
]
import warnings
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Union
import cvxpy as cp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import pypfopt as ppo
import yfinance as yf
from plotly.subplots import make_subplots
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
from sklearn.covariance import ledoit_wolf
All-ETF Portfolio
The underlying investment vehicle for this portfolio optimization is Exchange-Traded Funds (ETFs). ETFs are a popular investment vehicle offering non-institutional investors diversification, liquidity, and cost-effectiveness. For this analysis, we will focus on the following equity ETFs with an emphasis on technology, financial services, real estate, utilities, and consumer staples.
ETF Selection
Technology ETFs
iShares U.S. Technology ETF (IYW): This ETF provides exposure to U.S. companies involved in technology, including hardware, software, and IT services.
Vanguard Information Technology Index Fund ETF Shares (VGT): This ETF tracks the performance of the MSCI US Investable Market Information Technology Index, offering broad exposure to the technology sector.
Financial Services and Banking ETFs
- iShares U.S. Financials ETF (IYF): This ETF provides exposure to U.S. companies in the financial services sector, including banks, investment firms, and insurance companies.
Real Estate ETFs
- iShares U.S. Real Estate ETF (IYR): This ETF tracks the Dow Jones U.S. Real Estate Capped Index, which measures the performance of the real estate sector of the U.S. equity market, as defined by the index provider.
Utilities ETFs
- The Utilities Select Sector SPDR Fund (XLU): This ETF provides exposure to companies in the utilities sector, including electricity, gas, and water utilities.
Consumer Staples ETFs
- iShares U.S. Consumer Staples ETF (IYK): This ETF provides exposure to U.S. companies in the consumer staples sector, including food, beverage, and household products.
Other sectors and categories can be explored by utilizing resources like the ETF Database Categories on Yahoo Finance. One of the most crucial aspects of building a portfolio is aligning it with our personal convictions about the sectors and industries we are investing in; it’s about having a deep, long-term belief in the growth potential and sustainability of those industries.
For more insights into the pros and cons of an all-ETF portfolio, consider reading this article from Charles Schwab: 3 Ways to Build an All-ETF Portfolio.
Daily Price Time Series
The daily price time series can be downloaded from Yahoo Finance using the yfinance package:
tech = np.array(["IYW", "VGT"])
real_estate = np.array(["IYR"])
bank_finance = np.array(["IYF"])
utilities = np.array(["XLU"])
consumer_staples = np.array(["IYK"])
tickers = np.concatenate([tech, real_estate, bank_finance, utilities, consumer_staples], axis=0)
tickers.shape
(6,)
The sampling period for this task can be set to the last 5 years:
start_date = (datetime.now() - timedelta(days=365.25 * 5)).strftime("%Y-%m-%d")
daily_prices = yf.download(", ".join(tickers), start=start_date)["Adj Close"]
[ 0% ][**************** 33% ] 2 of 6 completed[**********************50% ] 3 of 6 completed[**********************67%******* ] 4 of 6 completed[**********************83%*************** ] 5 of 6 completed[*********************100%***********************] 6 of 6 completed
dates | IYF | IYK | IYR | IYW | VGT | XLU |
---|---|---|---|---|---|---|
2019-10-20 | 60.01811 | 37.86379 | 83.19051 | 50.69644 | 209.5335 | 54.65551 |
2019-10-21 | 59.63262 | 37.88772 | 82.92007 | 50.16434 | 206.6947 | 54.88584 |
2019-10-22 | 59.80701 | 37.98643 | 83.03347 | 50.28638 | 206.9057 | 55.10764 |
2019-10-23 | 59.89420 | 38.08514 | 82.92878 | 50.78919 | 210.0705 | 55.29531 |
2019-10-24 | 60.00435 | 38.04027 | 82.08258 | 51.40185 | 212.6024 | 54.73229 |
Quick inspection for potential data quality issues:
- Missing Data: Unequal time series or API issues can lead to gaps in the dataset. Run the following to check for any missing values in the data:
daily_prices.isna().sum(axis=0)
Ticker
IYF 0
IYK 0
IYR 0
IYW 0
VGT 0
XLU 0
dtype: int64
- Zero Values: Abnormal zero values can also indicate data problems, potentially due to API issues. You can identify columns with zero values using:
(daily_prices == 0).sum(axis=0)
Ticker
IYF 0
IYK 0
IYR 0
IYW 0
VGT 0
XLU 0
dtype: int64
Time Series Plots
fig = make_subplots(rows=3, cols=2, subplot_titles=tickers)
for i, ticker in enumerate(tickers):
row = i // 2 + 1
col = i % 2 + 1
fig.add_trace(
go.Scatter(x=daily_prices.index, y=daily_prices.loc[:, ticker], name=ticker),
row=row,
col=col,
);
fig.update_layout(height=1000, autosize=True, title_text="Daily Prices");
fig.show()
Daily Returns
To calculate the daily returns:
\[\begin{equation} \text{Daily Return}_{i, t} = r_{i, t} = \frac{\text{price}_{i, t} - \text{price}_{i, t-1}}{\text{price}_{i, t-1}} \end{equation}\]
where
- \(\text{price}_{i, t}\) is the price on day \(t\) for asset \(i\)
- \(\text{price}_{i, t-1}\) is the price on day \(t-1\) for asset \(i\)
The following time series will be used as inputs for the portfolio optimization algorithm:
daily_returns = daily_prices.pct_change().dropna(how="all")
dates | IYF | IYK | IYR | IYW | VGT | XLU |
---|---|---|---|---|---|---|
2019-10-21 | -0.0064229 | 0.0006320 | -0.0032508 | -0.0104957 | -0.0135482 | 0.0042142 |
2019-10-22 | 0.0029245 | 0.0026052 | 0.0013676 | 0.0024327 | 0.0010209 | 0.0040411 |
2019-10-23 | 0.0014578 | 0.0025985 | -0.0012608 | 0.0099990 | 0.0152961 | 0.0034054 |
2019-10-24 | 0.0018392 | -0.0011781 | -0.0102039 | 0.0120628 | 0.0120526 | -0.0101820 |
2019-10-27 | 0.0032885 | -0.0029879 | -0.0054204 | 0.0136285 | 0.0114581 | -0.0137160 |
Annualized Mean Returns
The annualized (geometric) mean returns for each asset \(i\) over the sampling period can be calculated as:
\[\begin{equation} \text{Annualized Mean Return}_{i} = \bar{r}_{i} = \left( \prod_{t=1}^{T_i} (1 + r_{i,t}) \right)^{\frac{\text{frequency}}{T_i}} - 1 \end{equation}\]
where:
- \(r_{i,t}\) is the daily return for asset \(i\) on day \(t\)
- \(T_i\) is the total number of trading days observed for asset \(i\) over the sampling period
- \(\text{frequency}\) is the number of time periods in a year (typically \(252\) for daily returns)
This formula computes the geometric mean of daily returns for each asset \(i\), and annualizes the returns by raising the product of returns to the power of the ratio \(\frac{\text{frequency}}{T_i}\), where \(T_i\) accounts for the (potentially) varying number of observed trading days for each asset.
annualized_mean_returns = (1 + daily_returns).prod() ** (
252 / daily_returns.count()
) - 1
annualized_mean_returns
Ticker
IYF 0.128584
IYK 0.131330
IYR 0.041589
IYW 0.251454
VGT 0.236268
XLU 0.084593
dtype: float64
conditional_colors = ["green" if value > 0 else "red" for value in annualized_mean_returns]
fig, ax = plt.subplots();
bars = ax.bar(
annualized_mean_returns.index,
annualized_mean_returns * 100, # Convert to percentage
color=conditional_colors
);
for bar in bars:
height = bar.get_height()
ax.text(
bar.get_x() + bar.get_width() / 2,
height + np.sign(height) * 0.1, # Add spacing to the value
f'{height:.2f}%',
ha='center', va='bottom' if height > 0 else 'top', # Adjust placement for negative values
fontsize=10
);
title_text = f"Annualized Mean Returns (%)\nData from {start_date} to {daily_prices.index[-1].strftime('%Y-%m-%d')}"
ax.set_title(title_text, fontsize=14);
plt.xticks(rotation=45, ha='right');
plt.tight_layout();
plt.show()
Shrinkage Covariance Matrix
An essential component of portfolio optimization is the estimator of the covariance matrix, which quantifies the correlations between the daily returns series. While the sample covariance matrix is a standard, unbiased estimator, it often contains significant estimation errors in practice.
To address this issue, Ledoit and Wolf (2004) propose a shrinkage approach, which combines the sample covariance matrix with a more structured and stable estimator, such as the constant correlation model. The shrinkage process adjusts extreme covariance values in the noisy data toward a central target, effectively reducing the estimation error.
Shrinkage Target
Let the returns for asset \(i\) on day \(t\) be denoted by \(X_{i, t}\), where \(1 \leq i \leq N\) represents the index of assets (i.e., number of assets) and \(1 \leq t \leq T\) represents the time index (i.e., number of trading days over the sample period). Let the following matrices be defined:
- \(\Sigma\) represents the true population covariance matrix, where the element \(\sigma_{i j}\) represents the covariance between asset \(i\) and asset \(j\).
- \(S\) represents the sample covariance matrix, where the element \(s_{i j}\) represents the sample covariance between asset \(i\) and asset \(j\).
Population and Sample Correlations
The population correlation \(\varrho_{i j}\) between asset \(i\) and asset \(j\) is given by:
\[\begin{align*} \varrho_{i j} = \frac{\sigma_{i j}}{\sqrt{\sigma_{i i} \sigma_{j j}}} \end{align*}\]
Similarly, the sample correlation \(r_{i j}\) is defined as:
\[\begin{align*} r_{i j} = \frac{s_{i j}}{\sqrt{s_{i i} s_{j j}}} \end{align*}\]
Average Population and Sample Correlations
From \(\varrho_{i j}\) and \(r_{i j}\), the average population correlation \(\bar{\varrho}\) and the average sample correlation \(\bar{r}\) are computed as follows:
\[\begin{align*} \bar{\varrho} = \frac{2}{(N-1) N} \sum_{i=1}^{N-1} \sum_{j=i+1}^N \varrho_{i j} \end{align*}\]
\[\begin{align*} \bar{r} = \frac{2}{(N-1) N} \sum_{i=1}^{N-1} \sum_{j=i+1}^N r_{i j} \end{align*}\]
Constant Correlation Matrix
The shrinkage target is the constant correlation matrix, constructed based on the average correlations \(\bar{\varrho}\) and \(\bar{r}\):
Population Constant Correlation Matrix \(\Phi\): This matrix is based on the average population correlation \(\bar{\varrho}\):
\[\begin{align*} \phi_{i i} = \sigma_{i i}, \quad \phi_{i j} = \bar{\varrho} \sqrt{\sigma_{i i} \sigma_{j j}} \end{align*}\]
Sample Constant Correlation Matrix \(F\): This matrix is based on the average sample correlation \(\bar{r}\):
\[\begin{align*} f_{i i} = s_{i i}, \quad f_{i j} = \bar{r} \sqrt{s_{i i} s_{j j}} \end{align*}\]
The sample constant correlation matrix \(F\) serves as a shrinkage target for the covariance matrix estimation. It represents a balance between structured stability (constant correlations) and data-driven flexibility (sample covariances), reducing estimation error and improving the robustness of any downstream portfolio optimization procedures.
Note: The reference paper highlights that the constant correlation model may be unsuitable when assets belong to different classes, such as stocks and bonds. However, in our case, since all the ETFs selected hav same underlying asset class, i.e., equities, the constant correlation model is a reasonable and appropriate choice for estimating the covariance matrix.
Computing the Shrinkage Covariance Matrix
The Ledoit-Wolf shrinkage estimator is implemented in the scikit-learn
library.
Formally, let \(\hat{\Sigma}\) represent the shrinkage covariance matrix, where the element \(\hat{\sigma}_{ij}\) is the covariance between assets \(i\) and \(j\). The matrix \(\hat{\Sigma}\) is of size \(N \times N\):
\[\begin{align*} \hat{\Sigma} = \underbrace{ \begin{bmatrix} \hat{\sigma}_{11} & \hat{\sigma}_{12} & \cdots & \hat{\sigma}_{1N} \\ \hat{\sigma}_{21} & \hat{\sigma}_{22} & \cdots & \hat{\sigma}_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\sigma}_{N1} & \hat{\sigma}_{N2} & \cdots & \hat{\sigma}_{NN} \end{bmatrix} }_{N \times N} \end{align*}\]
To convert this covariance matrix to the correlation matrix \(\hat{R}\), we normalize each element by dividing the covariance between assets \(i\) and \(j\) by the square root of the product of the variances of assets \(i\) and \(j\):
\[\begin{align*} \hat{r}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii} \hat{\sigma}_{jj}}} \end{align*}\]
To compute the values \(\hat{r}_{ij}\) efficiently, define \(D\) as a diagonal matrix that contains the inverse of the square roots of the diagonal elements of \(\hat{\Sigma}\) (i.e., the variances of the assets):
\[\begin{align*} D = \underbrace{ \begin{bmatrix} \frac{1}{\sqrt{\hat{\sigma}_{11}}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sqrt{\hat{\sigma}_{22}}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sqrt{\hat{\sigma}_{NN}}} \end{bmatrix} }_{N \times N} \end{align*}\]
The shrinkage correlation matrix \(\hat{R}\) is then computed as:
\[\begin{align*} \underbrace{\hat{R}}_{N \times N} = \underbrace{D}_{N \times N} \; \underbrace{\hat{\Sigma}}_{N \times N} \; \underbrace{D}_{N \times N} \end{align*}\]
This yields the following correlation matrix:
\[\begin{align*} \hat{R} = \underbrace{ \begin{bmatrix} 1 & \hat{r}_{12} & \cdots & \hat{r}_{1N} \\ \hat{r}_{21} & 1 & \cdots & \hat{r}_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}_{N1} & \hat{r}_{N2} & \cdots & 1 \end{bmatrix} }_{N \times N} \end{align*}\]
In this matrix, the diagonal elements are \(1\), i.e., the correlation of an asset with itself is always \(1\).
def to_dataframe(matrix: Union[np.ndarray, pd.DataFrame], columns: pd.Index) -> pd.DataFrame:
return pd.DataFrame(matrix, index=columns, columns=columns)
# Shrinkage covariance matrix
shrinkage_covariance_matrix, _ = ledoit_wolf(
X=daily_returns,
assume_centered=False,
)
# Shrinkage correlation matrix
dinv = np.diag(1 / np.sqrt(np.diag(shrinkage_covariance_matrix)))
shrinkage_correlation_matrix = np.dot(dinv, np.dot(shrinkage_covariance_matrix, dinv))
columns = daily_returns.columns
shrinkage_covariance_matrix = to_dataframe(shrinkage_covariance_matrix, columns)
shrinkage_correlation_matrix = to_dataframe(shrinkage_correlation_matrix, columns)
IYF | IYK | IYR | IYW | VGT | XLU | |
---|---|---|---|---|---|---|
IYF | 0.0002485 | 0.0001317 | 0.0001972 | 0.0001858 | 0.0001909 | 0.0001453 |
IYK | 0.0001317 | 0.0001344 | 0.0001297 | 0.0001217 | 0.0001242 | 0.0001198 |
IYR | 0.0001972 | 0.0001297 | 0.0002475 | 0.0001739 | 0.0001776 | 0.0001721 |
IYW | 0.0001858 | 0.0001217 | 0.0001739 | 0.0003215 | 0.0003057 | 0.0001184 |
VGT | 0.0001909 | 0.0001242 | 0.0001776 | 0.0003057 | 0.0003056 | 0.0001225 |
XLU | 0.0001453 | 0.0001198 | 0.0001721 | 0.0001184 | 0.0001225 | 0.0002136 |
This can be easily visualized using a heatmap:
fig, ax = plt.subplots();
cax = ax.matshow(shrinkage_correlation_matrix, cmap="coolwarm");
ax.xaxis.set_ticks_position('bottom');
ax.set_xticks(np.arange(len(shrinkage_correlation_matrix.columns)));
ax.set_yticks(np.arange(len(shrinkage_correlation_matrix.columns)));
ax.set_xticklabels(shrinkage_correlation_matrix.columns);
ax.set_yticklabels(shrinkage_correlation_matrix.columns);
for i in range(shrinkage_correlation_matrix.shape[0]):
for j in range(shrinkage_correlation_matrix.shape[1]):
# Ensure text can be easily seen
value = shrinkage_correlation_matrix.iloc[i, j]
color = "white" if value < 0.5 else "black"
ax.text(j, i, f"{value:.2f}", ha="center", va="center", color=color);
plt.title("Shrinkage Correlation Matrix");
plt.tight_layout();
plt.show()
Hierarchical Risk Parity (HRP)
The HRP algorithm consists of three key stages:
Clustering: Assets are grouped based on their correlations using hierarchical clustering methods like single, average, or complete linkage.
Quasi-Diagonalization: The covariance matrix is reordered based on the clustering structure, concentrating covariances around the diagonal.
Recursive Bisection: Weights are assigned recursively across clusters in a top-down manner.
The following sections will walk through a numerical example of the HRP algorithm using daily returns data from the previous section.
Clustering
Instead of treating assets independently, HRP groups similar assets based on their correlations. The goal is to cluster assets that are more closely related to each other.
Distance Matrix
The distance between assets is computed as a function of their correlations, with lower correlations indicating greater distances. First, convert the correlation matrix into a distance matrix:
\[\begin{align*} d: \left(X_i, X_j\right) \subset B \rightarrow \mathbb{R}, \quad d_{i,j} = d\left(X_i, X_j\right) = \sqrt{\frac{1}{2}\left(1 - \rho_{i,j}\right)}, \quad d_{i,j} \in [0,1] \end{align*}\]
where:
\(d_{i,j}\) represents the distance between asset \(X_i\) and asset \(X_j\)
\(\rho_{i,j}\) is the correlation coefficient between the returns of assets \(X_i\) and \(X_j\)
dist_matrix = np.sqrt((1.0 - shrinkage_correlation_matrix.round(8)) / 2.0)
dist_matrix
Ticker IYF IYK IYR IYW VGT XLU
Ticker
IYF 0.000000 0.373690 0.320059 0.413875 0.391884 0.429825
IYK 0.373690 0.000000 0.379982 0.455236 0.439896 0.382743
IYR 0.320059 0.379982 0.000000 0.437962 0.420914 0.354621
IYW 0.413875 0.455236 0.437962 0.000000 0.110865 0.523615
VGT 0.391884 0.439896 0.420914 0.110865 0.000000 0.510184
XLU 0.429825 0.382743 0.354621 0.523615 0.510184 0.000000
Transform the distance matrix into a condensed form. The condensed form contains only the upper triangular part of the distance matrix (excluding the diagonal), which is what hierarchical clustering requires:
condensed_dist_matrix = squareform(X=dist_matrix, checks=False)
condensed_dist_matrix
array([0.37369036, 0.32005876, 0.41387541, 0.39188418, 0.42982473,
0.37998172, 0.45523561, 0.43989607, 0.38274266, 0.43796154,
0.420914 , 0.35462111, 0.11086528, 0.52361505, 0.51018444])
Hierarchical Clustering
A hierarchical clustering algorithm is applied to the condensed distance matrix to group pairs of assets into clusters. The linkage function from scipy
is used to perform the clustering.
Note: An important hyperparameter in hierarchical clustering is the linkage method, which determines how the distances between newly formed clusters are calculated. The “appropriate” linkage method depends on the data characteristics, and evaluation mechanisms can help determine the choice. This will be done in the next section.
linkage_matrix = linkage(y=condensed_dist_matrix, method="ward")
linkage_matrix
array([[3. , 4. , 0.11086528, 2. ],
[0. , 2. , 0.32005876, 2. ],
[1. , 5. , 0.38274266, 2. ],
[7. , 8. , 0.41569608, 4. ],
[6. , 9. , 0.62789035, 6. ]])
Linkage Matrix Structure
The linkage_matrix
contains the steps of hierarchical clustering and is structured as follows:
Column | Description |
---|---|
1st | Index of the first cluster being merged. If the value is less than the number of original items, it refers to an original item (i.e., a ticker). Otherwise, it refers to a cluster formed in a previous step. |
2nd | Index of the second cluster being merged. Follows the same logic as the first column. |
3rd | Distance (or dissimilarity) between the two clusters being merged. |
4th | Number of original items in the newly formed cluster after this merge. |
The hierarchical clustering merges the original items step by step, forming clusters that are indexed starting from 6 (the first new cluster). The cluster IDs start after the number of original items (6th in zero-based indexing).
linkage_matrix
array([[3. , 4. , 0.11086528, 2. ],
[0. , 2. , 0.32005876, 2. ],
[1. , 5. , 0.38274266, 2. ],
[7. , 8. , 0.41569608, 4. ],
[6. , 9. , 0.62789035, 6. ]])
- Step 1: Tickers
3
and4
are merged to form Cluster 6 (2 items). - Step 2: Tickers
0
and2
are merged to form Cluster 7 (2 items). - Step 3: Tickers
1
and5
are merged to form Cluster 8 (2 items). - Step 4: Clusters
7
(containing tickers0, 2
) and8
(containing tickers1, 5
) are merged to form Cluster 9 (4 items). - Step 5: Cluster
6
(containing tickers3, 4
) and Cluster9
are merged to form Cluster 10 (6 items).
In this process, the distance column shows the dissimilarity between the merged clusters. The number of original items reflects how many tickers are included in the newly formed cluster. The output can be visualized via a dendrogram:
fig, ax = plt.subplots();
dendrogram(linkage_matrix, labels=shrinkage_correlation_matrix.columns, orientation='top');
plt.title("Hierarchical Clustering Dendrogram (Ward's Method)");
plt.ylabel("Distance");
plt.show()
Quasi-Diagonalization
After clustering, the next step in the HRP algorithm is quasi-diagonalization. The goal of this step is to reorder the rows and columns of the correlation matrix based on the hierarchical clustering results obtained in the previous step. This seriation step allows similar investments to be grouped together, and dissimilar ones to be placed farther apart.
The following implementation is taken from the original paper with annotated print statements for better understanding:
def get_quasi_diag(linkage_matrix: np.ndarray) -> List[int]:
"""
Sort clustered items by distance based on the linkage matrix.
Parameters
----------
linkage_matrix : np.ndarray
The linkage matrix from hierarchical clustering.
Returns
-------
List[int]
The sorted indices of the clustered items.
"""
linkage_matrix = linkage_matrix.astype(int)
sorted_index = pd.Series([linkage_matrix[-1, 0], linkage_matrix[-1, 1]])
print(f"Initial sorted index (last two merged clusters): {sorted_index.tolist()}")
num_items = linkage_matrix[-1, 3]
print(f"Number of original items: {num_items}")
# Recursively sort clusters by distance until no clusters remain
while sorted_index.max() >= num_items:
print("\n--- New iteration ---\n")
sorted_index.index = range(0, sorted_index.shape[0] * 2, 2)
print(f"Expanded sorted index to make space: {sorted_index.tolist()}")
cluster_indices = sorted_index[sorted_index >= num_items]
i = cluster_indices.index # Indices in sorted_index where clusters need to be expanded
j = cluster_indices.values - num_items # Row numbers in linkage_matrix to retrieve cluster members
print(f"Identified clusters to expand:")
print(f" Indices in sorted_index (i): {i.tolist()}")
print(f" Corresponding linkage matrix rows (j): {j.tolist()}")
# Replace clusters with their first constituent
for idx, row_num in zip(i, j):
row_dict = {
'merged_1': linkage_matrix[row_num, 0],
'merged_2': linkage_matrix[row_num, 1],
'dist': linkage_matrix[row_num, 2],
'number_of_original': linkage_matrix[row_num, 3]
}
original_cluster = sorted_index[idx]
first_constituent = linkage_matrix[row_num, 0]
print(f" Row {row_num} of the linkage matrix: {row_dict}")
print(f" Cluster {original_cluster} was formed by merging {linkage_matrix[row_num, 0]} and {linkage_matrix[row_num, 1]}")
print(f" Replacing cluster {original_cluster} with its first constituent: {first_constituent}")
sorted_index[idx] = first_constituent
print(f"After replacing clusters with their first constituents: {sorted_index.tolist()}")
# Append the second constituent of the clusters
second_constituents = pd.Series(linkage_matrix[j, 1], index=i + 1)
sorted_index = pd.concat([sorted_index, second_constituents])
for idx, second_constituent in second_constituents.items():
print(f" Appending second constituent {second_constituent}")
print(f"After appending the second constituents: {sorted_index.tolist()}")
# Re-sort the index and reset indexing to keep proper order
sorted_index = sorted_index.sort_index()
sorted_index.index = range(sorted_index.shape[0])
print(f"Final sorted index after re-sorting: {sorted_index.tolist()}")
return sorted_index.tolist()
Given the linkage matrix:
linkage_matrix
array([[3. , 4. , 0.11086528, 2. ],
[0. , 2. , 0.32005876, 2. ],
[1. , 5. , 0.38274266, 2. ],
[7. , 8. , 0.41569608, 4. ],
[6. , 9. , 0.62789035, 6. ]])
The quasi-diagonalization process is applied as follows:
sorted_index = get_quasi_diag(linkage_matrix)
Initial sorted index (last two merged clusters): [6, 9]
Number of original items: 6
--- New iteration ---
Expanded sorted index to make space: [6, 9]
Identified clusters to expand:
Indices in sorted_index (i): [0, 2]
Corresponding linkage matrix rows (j): [0, 3]
Row 0 of the linkage matrix: {'merged_1': 3, 'merged_2': 4, 'dist': 0, 'number_of_original': 2}
Cluster 6 was formed by merging 3 and 4
Replacing cluster 6 with its first constituent: 3
Row 3 of the linkage matrix: {'merged_1': 7, 'merged_2': 8, 'dist': 0, 'number_of_original': 4}
Cluster 9 was formed by merging 7 and 8
Replacing cluster 9 with its first constituent: 7
After replacing clusters with their first constituents: [3, 7]
Appending second constituent 4
Appending second constituent 8
After appending the second constituents: [3, 7, 4, 8]
Final sorted index after re-sorting: [3, 4, 7, 8]
--- New iteration ---
Expanded sorted index to make space: [3, 4, 7, 8]
Identified clusters to expand:
Indices in sorted_index (i): [4, 6]
Corresponding linkage matrix rows (j): [1, 2]
Row 1 of the linkage matrix: {'merged_1': 0, 'merged_2': 2, 'dist': 0, 'number_of_original': 2}
Cluster 7 was formed by merging 0 and 2
Replacing cluster 7 with its first constituent: 0
Row 2 of the linkage matrix: {'merged_1': 1, 'merged_2': 5, 'dist': 0, 'number_of_original': 2}
Cluster 8 was formed by merging 1 and 5
Replacing cluster 8 with its first constituent: 1
After replacing clusters with their first constituents: [3, 4, 0, 1]
Appending second constituent 2
Appending second constituent 5
After appending the second constituents: [3, 4, 0, 1, 2, 5]
Final sorted index after re-sorting: [3, 4, 0, 2, 1, 5]
sorted_index
[3, 4, 0, 2, 1, 5]
The sorted correlation matrix is obtained by rearranging the rows and columns based on the sorted index:
sorted_correlation_matrix = shrinkage_correlation_matrix.iloc[sorted_index, sorted_index]
The quasi-diagonalized correlation matrix shows a more structured pattern, with higher correlations concentrated around the main diagonal. Store the ordered tickers for future reference:
ordered_tickers = shrinkage_correlation_matrix.index[sorted_index]
ordered_tickers
Index(['IYW', 'VGT', 'IYF', 'IYR', 'IYK', 'XLU'], dtype='object', name='Ticker')
Recursive Bisection
In the final step, recursive bisection, the hierarchical tree (built from hierarchical clustering) is split into two branches repeatedly until each branch contains only one asset. It can be shown that splitting weights inversely proportional to the variance of assets in each branch achieves optimal allocation when the correlation matrix is diagonal (see Appendix A.2. of the original paper).
Initialization:
Set the list of items \(L = \{L_0\}\), where \(L_0 = \{n\}_{n=1, \ldots, N}\). This list represents the starting set that includes every asset to be allocated.
Assign a unit weight to all items: \(w_n = 1\), for all \(n = 1, \ldots, N\).
Stopping Condition: If \(\left| L_i \right| = 1\) for all \(L_i \in L\), stop the algorithm.
Bisection (for each \(L_i \in L\) such that \(\left| L_i \right| > 1\)):
Bisect the set: Divide \(L_i\) into two subsets \(L_i^{(1)} \cup L_i^{(2)} = L_i\), where \[\begin{align*} \left| L_i^{(1)} \right| = \operatorname{int}\left(\frac{1}{2} \left|L_i\right| \right) \end{align*}\] The order is preserved in the bisection.
Variance of subsets: For each subset \(L_i^{(j)}\) (where \(j = 1, 2\)), calculate the variance \(\widetilde{V}_i^{(j)}\) using the quadratic form: \[\begin{align*} \widetilde{V}_i^{(j)} \equiv \widetilde{w}_i^{(j)} V_i^{(j)} \widetilde{w}_i^{(j)} \end{align*}\] where \(V_i^{(j)}\) is the covariance matrix of the assets in \(L_i^{(j)}\), and \(\widetilde{w}_i^{(j)}\) is defined as: \[\begin{align*} \widetilde{w}_i^{(j)} = \operatorname{diag}\left( V_i^{(j)} \right)^{-1} \frac{1}{\operatorname{tr}\left( \operatorname{diag}\left( V_i^{(j)} \right)^{-1} \right)} \end{align*}\] Here, \(\operatorname{diag}[\cdot]\) denotes the diagonal matrix and \(\operatorname{tr}[\cdot]\) is the trace operator.
Split factor: Compute the split factor \(\alpha_i\) as: \[\begin{align*} \alpha_i = 1 - \frac{\widetilde{V}_i^{(1)}}{\widetilde{V}_i^{(1)} + \widetilde{V}_i^{(2)}} \end{align*}\] ensuring \(0 \leq \alpha_i \leq 1\).
Rescale allocations:
- Rescale allocations for all \(n \in L_i^{(1)}\) by a factor of \(\alpha_i\).
- Rescale allocations for all \(n \in L_i^{(2)}\) by a factor of \(1 - \alpha_i\).
Repeat: Loop back to step 2 until the stopping condition is met.
Inverse Variance Portflio
def get_ivp(cov: pd.DataFrame) -> np.ndarray:
"""
Compute the inverse variance portfolio (IVP) for a given covariance matrix.
Parameters
----------
cov : pd.DataFrame
Covariance matrix of the cluster.
Returns
-------
np.ndarray
Inverse variance portfolio weights for the cluster.
"""
ivp = 1.0 / np.diag(cov)
ivp /= ivp.sum() # Normalize weights
print(f"\n IVP weights for cluster: {ivp}")
return ivp
Cluster Variance
def get_cluster_var(cov: pd.DataFrame, cluster_items: List[str]) -> float:
"""
Compute the variance for a given cluster.
Parameters
----------
cov : pd.DataFrame
Covariance matrix for the assets.
cluster_items : List[str]
List of tickers in the cluster.
Returns
-------
float
The variance of the cluster.
"""
# Subset of the covariance matrix containing only the cluster assets
cov_slice = cov.loc[cluster_items, cluster_items]
# Define the left space (4 spaces in this case)
hspace = " "
formatted_cov_slice = str(cov_slice).replace("\n", "\n" + hspace)
print(f"\n Covariance matrix slice for cluster {cluster_items}:\n{hspace}{formatted_cov_slice}")
# Calculate inverse variance portfolio (IVP) weights for the cluster and reshape to column vector (n x 1)
weights = get_ivp(cov_slice).reshape(-1, 1)
# Compute cluster variance as w.T * cov_slice * w, which is [1 x n] * [n x n] * [n x 1] = [1 x 1]
cluster_var = np.dot(np.dot(weights.T, cov_slice), weights)[0, 0]
print(f"\n Cluster variance for {cluster_items}: {cluster_var}\n")
return cluster_var
Recursive Bisection
def recursive_bisection(cov: pd.DataFrame, ordered_tickers: Union[pd.Index, List[str]]) -> pd.Series:
"""
Compute the Hierarchical Risk Parity (HRP) portfolio allocation using recursive bisection.
Parameters
----------
cov : pd.DataFrame
Covariance matrix for the assets.
ordered_tickers : Union[pd.Index, List[str]]
List of tickers ordered by hierarchical clustering.
Returns
-------
pd.Series
Portfolio weights for each asset.
"""
# Initialize equal weights for all assets
weights = pd.Series(1.0, index=ordered_tickers)
hspace = " " # Define the left space (4 spaces in this case)
formatted_weights = str(weights).replace("\n", "\n" + hspace)
print(f"Initial weights:\n{hspace}{formatted_weights}\n")
# Initialize the cluster as the full set of ordered tickers
cluster_items = [ordered_tickers]
step = 0
while len(cluster_items) > 0:
step += 1
print(f"\n{'='*20} Step {step}: Recursive Bisection {'='*20}\n")
print(f"Cluster items at Step {step}: {[list(cluster) for cluster in cluster_items]}")
# Split each cluster into two halves only if the cluster contains more than one asset
cluster_items = [
i[j:k]
for i in cluster_items
for j, k in ((0, len(i) // 2), (len(i) // 2, len(i)))
if len(i) > 1
]
# For each pair of clusters, calculate variance and adjust weights
for i in range(0, len(cluster_items), 2):
first_cluster = cluster_items[i]
second_cluster = cluster_items[i + 1]
print(f"\n ---- Processing clusters ----")
print(f" First cluster: {first_cluster}")
print(f" Second cluster: {second_cluster}")
# Compute variance for each cluster
first_var = get_cluster_var(cov, list(first_cluster))
second_var = get_cluster_var(cov, list(second_cluster))
# Compute the weighting factor alpha
alpha = 1 - first_var / (first_var + second_var)
print(f" Weighting factor alpha for {list(first_cluster)}: {alpha}")
print(f" Weighting factor 1 - alpha for {list(second_cluster)}: {1 - alpha}")
# Adjust weights for each cluster
weights[first_cluster] *= alpha
weights[second_cluster] *= 1 - alpha
formatted_weights = str(weights).replace("\n", "\n" + hspace)
print(f" Updated weights after adjustment:\n{hspace}{formatted_weights}\n")
return weights
Portfolio Allocation
Applying the recursive bisection algorithm to the covariance matrix:
hrp_weights = recursive_bisection(cov=shrinkage_covariance_matrix, ordered_tickers=ordered_tickers)
Initial weights:
Ticker
IYW 1.0
VGT 1.0
IYF 1.0
IYR 1.0
IYK 1.0
XLU 1.0
dtype: float64
==================== Step 1: Recursive Bisection ====================
Cluster items at Step 1: [['IYW', 'VGT', 'IYF', 'IYR', 'IYK', 'XLU']]
---- Processing clusters ----
First cluster: Index(['IYW', 'VGT', 'IYF'], dtype='object', name='Ticker')
Second cluster: Index(['IYR', 'IYK', 'XLU'], dtype='object', name='Ticker')
Covariance matrix slice for cluster ['IYW', 'VGT', 'IYF']:
Ticker IYW VGT IYF
Ticker
IYW 0.000322 0.000306 0.000186
VGT 0.000306 0.000306 0.000191
IYF 0.000186 0.000191 0.000249
IVP weights for cluster: [0.29887164 0.31445765 0.38667071]
Cluster variance for ['IYW', 'VGT', 'IYF']: 0.00024294420088323586
Covariance matrix slice for cluster ['IYR', 'IYK', 'XLU']:
Ticker IYR IYK XLU
Ticker
IYR 0.000248 0.000130 0.000172
IYK 0.000130 0.000134 0.000120
XLU 0.000172 0.000120 0.000214
IVP weights for cluster: [0.24994149 0.46036675 0.28969176]
Cluster variance for ['IYR', 'IYK', 'XLU']: 0.00014859282764898994
Weighting factor alpha for ['IYW', 'VGT', 'IYF']: 0.3795115578366297
Weighting factor 1 - alpha for ['IYR', 'IYK', 'XLU']: 0.6204884421633703
Updated weights after adjustment:
Ticker
IYW 0.379512
VGT 0.379512
IYF 0.379512
IYR 0.620488
IYK 0.620488
XLU 0.620488
dtype: float64
==================== Step 2: Recursive Bisection ====================
Cluster items at Step 2: [['IYW', 'VGT', 'IYF'], ['IYR', 'IYK', 'XLU']]
---- Processing clusters ----
First cluster: Index(['IYW'], dtype='object', name='Ticker')
Second cluster: Index(['VGT', 'IYF'], dtype='object', name='Ticker')
Covariance matrix slice for cluster ['IYW']:
Ticker IYW
Ticker
IYW 0.000322
IVP weights for cluster: [1.]
Cluster variance for ['IYW']: 0.0003215181251290974
Covariance matrix slice for cluster ['VGT', 'IYF']:
Ticker VGT IYF
Ticker
VGT 0.000306 0.000191
IYF 0.000191 0.000249
IVP weights for cluster: [0.44850226 0.55149774]
Cluster variance for ['VGT', 'IYF']: 0.00023150783628519947
Weighting factor alpha for ['IYW']: 0.4186201958641258
Weighting factor 1 - alpha for ['VGT', 'IYF']: 0.5813798041358742
Updated weights after adjustment:
Ticker
IYW 0.158871
VGT 0.220640
IYF 0.220640
IYR 0.620488
IYK 0.620488
XLU 0.620488
dtype: float64
---- Processing clusters ----
First cluster: Index(['IYR'], dtype='object', name='Ticker')
Second cluster: Index(['IYK', 'XLU'], dtype='object', name='Ticker')
Covariance matrix slice for cluster ['IYR']:
Ticker IYR
Ticker
IYR 0.000248
IVP weights for cluster: [1.]
Cluster variance for ['IYR']: 0.0002475355614636847
Covariance matrix slice for cluster ['IYK', 'XLU']:
Ticker IYK XLU
Ticker
IYK 0.000134 0.000120
XLU 0.000120 0.000214
IVP weights for cluster: [0.61377445 0.38622555]
Cluster variance for ['IYK', 'XLU']: 0.0001392752130086069
Weighting factor alpha for ['IYR']: 0.3600603245827725
Weighting factor 1 - alpha for ['IYK', 'XLU']: 0.6399396754172275
Updated weights after adjustment:
Ticker
IYW 0.158871
VGT 0.220640
IYF 0.220640
IYR 0.223413
IYK 0.397075
XLU 0.397075
dtype: float64
==================== Step 3: Recursive Bisection ====================
Cluster items at Step 3: [['IYW'], ['VGT', 'IYF'], ['IYR'], ['IYK', 'XLU']]
---- Processing clusters ----
First cluster: Index(['VGT'], dtype='object', name='Ticker')
Second cluster: Index(['IYF'], dtype='object', name='Ticker')
Covariance matrix slice for cluster ['VGT']:
Ticker VGT
Ticker
VGT 0.000306
IVP weights for cluster: [1.]
Cluster variance for ['VGT']: 0.00030558216038170134
Covariance matrix slice for cluster ['IYF']:
Ticker IYF
Ticker
IYF 0.000249
IVP weights for cluster: [1.]
Cluster variance for ['IYF']: 0.0002485128719824892
Weighting factor alpha for ['VGT']: 0.4485022558714241
Weighting factor 1 - alpha for ['IYF']: 0.5514977441285759
Updated weights after adjustment:
Ticker
IYW 0.158871
VGT 0.098958
IYF 0.121683
IYR 0.223413
IYK 0.397075
XLU 0.397075
dtype: float64
---- Processing clusters ----
First cluster: Index(['IYK'], dtype='object', name='Ticker')
Second cluster: Index(['XLU'], dtype='object', name='Ticker')
Covariance matrix slice for cluster ['IYK']:
Ticker IYK
Ticker
IYK 0.000134
IVP weights for cluster: [1.]
Cluster variance for ['IYK']: 0.00013439156229630292
Covariance matrix slice for cluster ['XLU']:
Ticker XLU
Ticker
XLU 0.000214
IVP weights for cluster: [1.]
Cluster variance for ['XLU']: 0.0002135697881330683
Weighting factor alpha for ['IYK']: 0.6137744547477219
Weighting factor 1 - alpha for ['XLU']: 0.38622554525227815
Updated weights after adjustment:
Ticker
IYW 0.158871
VGT 0.098958
IYF 0.121683
IYR 0.223413
IYK 0.243715
XLU 0.153361
dtype: float64
==================== Step 4: Recursive Bisection ====================
Cluster items at Step 4: [['VGT'], ['IYF'], ['IYK'], ['XLU']]
hrp_weights.sort_index()
Ticker
IYF 0.121683
IYK 0.243715
IYR 0.223413
IYW 0.158871
VGT 0.098958
XLU 0.153361
dtype: float64
Verify Results with PyportfolioOpt
The PyportfolioOpt package provides a convenient implementation of the HRP algorithm. We can compare the results obtained from our manual implementation with the package’s implementation to verify the correctness of our approach.
hrp = ppo.hierarchical_portfolio.HRPOpt(
cov_matrix=shrinkage_covariance_matrix,
)
hrp.optimize(linkage_method="ward")
OrderedDict([('IYF', 0.12168265813576948), ('IYK', 0.2437145973588905), ('IYR', 0.22341326988520196), ('IYW', 0.15887120267426943), ('VGT', 0.0989576970265908), ('XLU', 0.15336057491927785)])
Evaluation Strategy
To assess the out-of-sample performance of the HRP optimal allocation strategy, a backtest can be employed. Backtesting is a key technique for evaluating an investment strategy’s performance using historical data. It allows for an assessment of how reliably the strategy could be replicated in real-world scenarios, while also providing valuable insights into its associated risk-return profile. To this end, Joubert et al., 2024 outlines three primary backtesting methods:
Backtesting Method | Description | Advantages | Limitations |
---|---|---|---|
Walk-Forward Backtest | Historical data is split into multiple segments. The strategy is trained on one segment and tested on the next. | Easy to implement. | Tests only a single path, risking overfitting. Past performance may not predict future results (e.g., Nvidia’s growth since COVID). |
Resampling | A statistical technique that creates multiple samples from historical data to assess performance. This approach yields a distribution of performance metrics, including: - Bootstrap: Draws samples with replacement from historical data, enabling performance estimation over many iterations. - Cross-Validation: Splits the data into multiple training and testing sets, providing multiple assessments of the strategy’s performance. | Provides more robust analysis by generating multiple scenarios. | Does not create new data, but resamples existing data, which may miss unique future events. |
Monte Carlo Simulation | Generates synthetic data with properties similar to historical data using an understanding of the data generation process. | Yields performance estimates indicative of future outcomes (assuming correct and stable data processes). | Requires accurate modeling of the data generation process, which is difficult for volatile financial markets. |
For evaluating the HRP portfolio allocation strategy and optimizing the hierarchical clustering linkage method, we will use time-series bootstraps tools from the arch package.
Time-Series Bootstrapping
The arch.bootstrap.StationaryBootstrap class implements the method developed by Politis and Romano (1994). This method addresses the limitations of traditional bootstrapping techniques when applied to time-series data, aiming to preserve crucial time-dependent structures of the data while generating resampled datasets.
Let \(X_1, X_2, \dots, X_T\) represent the original time series. To generate a resampled time series using block bootstrapping, define a block \(B_{i,b}\) as consisting of \(b\) consecutive observations starting at \(X_i\):
\[\begin{align*} B_{i,b} = \{X_i, X_{i+1}, \dots, X_{i+b-1}\} \end{align*}\]
Periodic Boundry Conditions
If \(i + b - 1 > T\) (i.e., the block extends beyond the end of the time series), periodic boundary conditions are applied. This means that the series wraps around, treating the first observations in the series as if they come after the last. In other words, for any index \(j > T\), the observation is taken as:
\[\begin{align*} X_j = X_{(j \mod T)} \quad \text{for} \quad j > T, \quad \text{where} \, X_0 = X_T. \end{align*}\]
The expression \(j \mod T\) calculates the remainder when \(j\) is divided by \(T\), effectively cycling the indices back to the beginning of the series. For example, if \(T = 10\), and \(X_{12}\) is requested, it would be equivalent to sampling \(X_2\).
Block Selection
The starting indices of the blocks are determined by a sequence of independent and identically distributed (iid) random variables \(I_1, I_2, \dots\), drawn uniformly from \(\{1, \dots, T\}\). The lengths of the blocks are governed by another sequence of iid random variables \(L_1, L_2, \dots\), where each \(L_i\) follows a geometric distribution with parameter \(p \in [0,1]\), i.e.,
\[\begin{align*} P(L_i = m) = (1 - p)^{m-1} p, \quad m = 1, 2, \dots. \end{align*}\]
Resampling
To construct the resampled series \(X_1^*, X_2^*, \dots, X_T^*\):
- Select the first block \(B_{I_1, L_1}\) and copy \(L_1\) consecutive observations starting at \(X_{I_1}\) into the resampled series.
- Select the next block \(B_{I_2, L_2}\) and append \(L_2\) consecutive observations starting at \(X_{I_2}\), continuing this process.
- Repeat the block selection and copying processes until a total of \(T\) observations have been generated.
It is possible for this procedure to result in overlapping blocks when consecutive starting indices \(I_1, I_2, \dots\) are close together, leading to the reuse of some observations. The process terminates once \(T\) observations are resampled, but it can be extended to generate longer series if desired.
Implementation
The following script implements the time-series bootstrapping:
import warnings
from argparse import ArgumentParser
from datetime import datetime, timedelta
from multiprocessing import Pool, cpu_count
from pathlib import Path
from typing import List
import numpy as np
import pandas as pd
import yfinance as yf
from arch.bootstrap import StationaryBootstrap, optimal_block_length
from pypfopt.hierarchical_portfolio import HRPOpt
from sklearn.covariance import ledoit_wolf
# Define the tickers for the assets in the portfolio
tech = np.array(["IYW", "VGT"])
real_estate = np.array(["IYR"])
bank_finance = np.array(["IYF"])
utilities = np.array(["XLU"])
consumer_staples = np.array(["IYK"])
tickers = np.concatenate(
[tech, real_estate, bank_finance, utilities, consumer_staples], axis=0
)
# Download the daily adjusted closing prices of the assets
start_date = (datetime.now() - timedelta(days=365 * 5)).strftime("%Y-%m-%d")
daily_prices = yf.download(", ".join(tickers), start=start_date)["Adj Close"]
# Calculate the daily returns of the assets
daily_returns = daily_prices.pct_change().dropna()
frequency = 252
# Typical savings account
annual_risk_free_rate = 0.02
daily_risk_free_rate = (1 + annual_risk_free_rate) ** (1 / frequency) - 1
pd.set_option("mode.copy_on_write", True)
def compute_optimal_block_length(returns: pd.DataFrame) -> float:
"""
Compute the optimal block length for the stationary bootstrap.
The optimal block length is computed for the squared returns since the
autocorrelation in the squares tends to be stronger than in the returns
themselves. This approach helps capture more of the dependence in the
returns for the bootstrap.
Parameters
----------
returns : pd.DataFrame
Daily returns of the assets.
Returns
-------
float
The average optimal block length for the stationary bootstrap.
"""
block_lengths = optimal_block_length(returns**2)
avg_block_length = block_lengths.loc[:, "stationary"].mean()
return avg_block_length
def compute_shrinkage_covariance(returns: pd.DataFrame) -> pd.DataFrame:
"""
Calculate the Ledoit-Wolf shrunk covariance matrix of the daily returns.
Parameters
----------
returns : pd.DataFrame
Daily returns of the assets.
Returns
-------
pd.DataFrame
The shrunk covariance matrix of the daily returns.
"""
shrinkage_cov = pd.DataFrame(
ledoit_wolf(returns, assume_centered=False)[0],
index=returns.columns,
columns=returns.columns,
)
return shrinkage_cov
def compute_performance_metrics(
returns: pd.DataFrame, linkage_method: str
) -> np.ndarray:
"""
Calculate the expected return, volatility, and Sharpe ratio of the portfolio.
Parameters
----------
returns : pd.DataFrame
Daily returns of the assets.
linkage_method : str
The method used for hierarchical clustering.
Possible values: 'centroid', 'weighted', 'complete', 'average', 'ward'.
Returns
-------
np.ndarray
An array containing the annualized expected return, volatility, and Sharpe ratio.
"""
shrinkage_cov = compute_shrinkage_covariance(returns)
hrp = HRPOpt(returns=returns, cov_matrix=shrinkage_cov)
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=FutureWarning)
hrp.optimize(linkage_method=linkage_method)
expected_return, volatility, sharpe_ratio = hrp.portfolio_performance(
risk_free_rate=daily_risk_free_rate,
frequency=frequency,
)
return np.array([expected_return, volatility, sharpe_ratio])
def bootstrap(seed_value: int, linkage_method: str) -> pd.DataFrame:
"""
Perform stationary bootstrapping to compute portfolio performance metrics.
Parameters
----------
seed_value : int
Seed for the random number generator to ensure reproducibility.
linkage_method : str
The method used for hierarchical clustering.
Possible values: 'centroid', 'weighted', 'complete', 'average', 'ward'.
Returns
-------
pd.DataFrame
A DataFrame containing the annualized expected return, volatility, Sharpe ratio,
and the linkage method used for clustering.
"""
rs = np.random.default_rng(seed_value)
avg_block_length = compute_optimal_block_length(daily_returns)
bs = StationaryBootstrap(
block_size=avg_block_length,
returns=daily_returns,
seed=rs,
)
results_array = bs.apply(
func=compute_performance_metrics,
reps=5000,
extra_kwargs={"linkage_method": linkage_method},
)
performance_metrics = pd.DataFrame(
results_array,
columns=[
"annualized_expected_return",
"annualized_volatility",
"annualized_sharpe_ratio",
],
)
performance_metrics["linkage_method"] = linkage_method
return performance_metrics
def main() -> int:
"""
Main function to run stationary bootstrapping on portfolio returns
using hierarchical clustering with various linkage methods.
This function parses command line arguments to specify the output directory,
performs bootstrapping using parallel processing, and saves the results.
Returns
-------
int
Returns 0 upon successful completion.
"""
parser = ArgumentParser()
parser.add_argument("--output_dir", type=str, default=Path.cwd())
args, _ = parser.parse_known_args()
linkage_methods = ["centroid", "weighted", "complete", "average", "ward"]
n_cores = min(cpu_count() - 1, len(linkage_methods))
# Generate unique seeds for each process
seeds = np.random.randint(0, 1e6, size=len(linkage_methods))
with Pool(n_cores) as pool:
# Run bootstrap for each linkage method with a unique seed
simulated_performance: List[pd.DataFrame] = pool.starmap(
bootstrap, zip(seeds, linkage_methods)
)
final_results = pd.concat(simulated_performance, ignore_index=True)
final_results.to_csv(
Path(args.output_dir) / "bootstrapped_performance_metrics.csv", index=False
)
return 0
if __name__ == "__main__":
main()
Data Collection:
- Downloads daily adjusted closing prices for the specified ETFs
- Computes the daily returns time series.
Risk-Free Rate: Converts a \(2\%\) annual risk-free rate to a daily rate to match daily returns frequency, which is typically \(252\) for trading days.
\[\begin{align*} \text{daily risk-free rate} = (1 + \text{annual risk-free rate})^{\frac{1}{\text{frequency}}} - 1 \end{align*}\]
Block Length Calculation: Computes optimal block length for StationaryBootstrap using squared returns to capture dependencies. See docstring for details.
Ledoit-Wolf Covariance: Estimates the covariance matrix using the Ledoit-Wolf shrinkage method.
Bootstrapping: Runs StationaryBootstrap with 5000 replications for each linkage method to assess portfolio performance.
Performance Metrics: Computes annualized expected return, volatility, and Sharpe ratio from bootstrapped samples.
The script uses parallel processing to run bootstrapping for each linkage method with a unique seed in an separate process. The results are collected and saved to a CSV file for further analysis.
$ python3 bootstrap_backtest.py --output_dir /path/to/output/bootstrapped_performance_metrics.csv
Analysis
The bootstrapped performance metrics are stored in long format, with the linkage_method
column indicating the hierarchical clustering method used for each set of bootstrapped samples. Each bootstrapped sample contains the annualized expected return, volatility, and Sharpe ratio, computed over a resampled set of all \(T\) observations from the original time series.
performance_metrics = pd.read_csv("bootstrapped_performance_metrics.csv")
annualized_expected_return | annualized_volatility | annualized_sharpe_ratio | linkage_method |
---|---|---|---|
0.1090801 | 0.1988651 | 0.5481177 | centroid |
0.2660785 | 0.2277011 | 1.1681979 | centroid |
0.0362587 | 0.1902725 | 0.1901487 | centroid |
0.2569973 | 0.1656867 | 1.5506298 | centroid |
0.1183425 | 0.2509082 | 0.4713436 | centroid |
Bootstrapped Distributions
def plot_performance_metrics(performance_metrics: pd.DataFrame, metric: str) -> None:
"""
Plot the performance metrics of the Hierarchical Risk Parity (HRP) portfolio.
Parameters
----------
performance_metrics : pd.DataFrame
Performance metrics of the HRP portfolio.
metric : str
The performance metric to plot.
"""
metric_title = metric.replace("_", " ").title()
median_values = performance_metrics.groupby("linkage_method")[metric].median().reset_index()
fig = px.violin(
performance_metrics,
x="linkage_method",
y=metric,
color="linkage_method",
title=f"Bootstrapped {metric_title}s of Hierarchical Risk Parity Portfolios",
labels={"metric": "Performance Metric", metric: "Value"},
);
for i, row in median_values.iterrows():
value = row[metric]
if metric in ["annualized_expected_return", "annualized_volatility"]:
value = value * 100
formated_value = f"{value:.4f}%"
else:
formated_value = f"{value:.4f}"
fig.add_annotation(
x=row['linkage_method'],
y=row[metric],
text=f"Median:<br>{formated_value}",
showarrow=True,
arrowhead=2,
ax=0,
ay=-40 # Offset to place the label above the plot
);
fig.update_layout(
xaxis_title="Linkage Method",
yaxis_title=metric_title,
legend_title="Linkage Method",
legend=dict(
orientation="h", # Horizontal legend
yanchor="bottom", # Align the legend to the bottom of its position
y=1.02, # Place it slightly above the plot
xanchor="center", # Center the legend
x=0.5 # Place it at the middle horizontally
),
autosize=True,
);
fig.show()
Annualized Expected Returns
plot_performance_metrics(performance_metrics, "annualized_expected_return")
Annualized Volatility
plot_performance_metrics(performance_metrics, "annualized_volatility")
Annualized Sharpe Ratios
plot_performance_metrics(performance_metrics, "annualized_sharpe_ratio", True)
Bootstrap Confidence Intervals
From the bootstrapped distributions, we can calculate the confidence intervals for the performance metrics using the quantiles directly from the bootstrapped samples.
def bootstrapped_confidence_intervals(performance_metrics: pd.DataFrame, metric: str, alpha: float = 0.05) -> pd.DataFrame:
"""
Calculate the (1 - alpha) confidence intervals for the performance metrics.
Parameters
----------
performance_metrics : pd.DataFrame
Performance metrics of the HRP portfolio.
metric : str
The performance metric for which to calculate the confidence intervals.
alpha : float, default 0.05
The significance level for the confidence interval.
Returns
-------
pd.DataFrame
A DataFrame containing the (1 - alpha) confidence intervals for each performance metric.
"""
lower_percentile = 100 * (alpha / 2)
upper_percentile = 100 * (1 - (alpha / 2))
def compute_ci(series: np.ndarray, is_lower: bool) -> float:
percentile = lower_percentile if is_lower else upper_percentile
bound = np.percentile(series, percentile)
if metric in ["annualized_expected_return", "annualized_volatility"]:
return bound * 100
return bound
confidence_intervals = performance_metrics.groupby("linkage_method")[metric].agg(
lower_bound=lambda x: compute_ci(x, True),
upper_bound=lambda x: compute_ci(x, False),
)
return confidence_intervals
Annualized Expected Returns
annualized_expected_return_ci = bootstrapped_confidence_intervals(performance_metrics, "annualized_expected_return", 0.05)
lower_bound | upper_bound | |
---|---|---|
average | 2.886640 | 29.25455 |
centroid | 2.974364 | 28.77622 |
complete | 2.457101 | 29.09965 |
ward | 1.834686 | 28.60933 |
weighted | 2.408667 | 29.00987 |
Annualized Volatility
annualized_volatility_ci = bootstrapped_confidence_intervals(performance_metrics, "annualized_volatility", 0.05)
lower_bound | upper_bound | |
---|---|---|
average | 13.00005 | 27.65951 |
centroid | 12.98310 | 27.83992 |
complete | 12.97197 | 27.72584 |
ward | 13.31101 | 28.30012 |
weighted | 13.05811 | 27.81955 |
Annualized Sharpe Ratios
annualized_sharpe_ratio_ci = bootstrapped_confidence_intervals(performance_metrics, "annualized_sharpe_ratio", 0.05)
lower_bound | upper_bound | |
---|---|---|
average | 0.1318767 | 1.849346 |
centroid | 0.1367354 | 1.804591 |
complete | 0.1084253 | 1.815125 |
ward | 0.0767051 | 1.746462 |
weighted | 0.1086801 | 1.814162 |
Projected Growth
def plot_investment_growth_with_compounding(performance_metrics: pd.DataFrame, initial_investment: float, years: int) -> None:
"""
Plot the projected investment growth over a holding period by linkage method, using compounding with median annualized expected returns.
Parameters
----------
performance_metrics : pd.DataFrame
Performance metrics of the HRP portfolio.
initial_investment : float
The initial amount of the investment.
years : int
The holding period in years.
"""
median_values = performance_metrics.groupby("linkage_method")["annualized_expected_return"].median().reset_index()
growth_data = pd.DataFrame({'year': range(1, years + 1)})
# Calculate the investment growth for each linkage method for each year, incorporating compounding
for _, row in median_values.iterrows():
growth_data[row['linkage_method']] = initial_investment * ((1 + row['annualized_expected_return']) ** growth_data['year'])
# Reshape from wide to long format for plotting
growth_data_melted = growth_data.melt(id_vars=['year'], var_name='linkage_method', value_name='investment_value')
fig = px.line(
growth_data_melted,
x='year',
y='investment_value',
color='linkage_method',
title=f'Projected Growth Over {years} Years (Initial Investment: ${initial_investment:,.2f})',
labels={'investment_value': 'Investment Value ($)', 'year': 'Year'}
);
fig.update_layout(
xaxis_title="Year",
yaxis_title="Investment Value ($)",
legend_title="Linkage Method",
autosize=True,
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="center",
x=0.5
)
);
fig.show()
Plot the projected investment growth over a \(5\)-year period with an initial investment of \(\$10,000\):
plot_investment_growth_with_compounding(performance_metrics, initial_investment=10000, years=5)
Risk-Return Profile
def plot_risk_return_profile(performance_metrics: pd.DataFrame) -> None:
"""
Plot the risk-return profile (annualized expected return vs. annualized volatility) for each linkage method.
Parameters
----------
performance_metrics : pd.DataFrame
Performance metrics of the HRP portfolio.
"""
linkage_methods = performance_metrics['linkage_method'].unique()
n_cols = 2 # Number of columns in the grid
n_rows = (len(linkage_methods) + 1) // n_cols
fig = make_subplots(
rows=n_rows, cols=n_cols,
subplot_titles=linkage_methods,
shared_xaxes=False, shared_yaxes=True,
x_title="Annualized Volatility (Risk)", y_title="Annualized Expected Return"
)
# Find the min and max values of the Sharpe ratio for the color scale
sharpe_min = performance_metrics['annualized_sharpe_ratio'].min()
sharpe_max = performance_metrics['annualized_sharpe_ratio'].max()
for i, method in enumerate(linkage_methods):
row = i // n_cols + 1
col = i % n_cols + 1
filtered_data = performance_metrics[performance_metrics['linkage_method'] == method]
# Add scatter plot trace with shared color scale
fig.add_trace(
go.Scatter(
x=filtered_data['annualized_volatility'],
y=filtered_data['annualized_expected_return'],
mode='markers',
name=method.capitalize(),
hovertext=filtered_data['annualized_sharpe_ratio'],
marker=dict(
size=8,
color=filtered_data['annualized_sharpe_ratio'], # Color based on Sharpe ratio
cmin=sharpe_min, # Set the same min for all subplots
cmax=sharpe_max, # Set the same max for all subplots
colorscale="Viridis",
showscale=(i == 0), # Only show the scale on the first subplot
colorbar=dict(title="Sharpe Ratio") if i == 0 else None, # Add color bar on the first subplot
),
),
row=row, col=col
)
fig.update_layout(
title="Risk-Return Profile by Linkage Method",
showlegend=False,
autosize=True,
height=1000,
)
fig.show()
plot_risk_return_profile(performance_metrics)
Optimal Linkage Method
We will select the linkage method that yields the highest median annualized Sharpe ratio from the bootstrapped distribution.
optimal_linkage_method = performance_metrics.groupby("linkage_method")["annualized_sharpe_ratio"].median().idxmax()
hrp = ppo.hierarchical_portfolio.HRPOpt(returns=daily_returns, cov_matrix=shrinkage_covariance_matrix)
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=FutureWarning)
hrp.optimize(linkage_method=optimal_linkage_method)
OrderedDict([('IYF', 0.10382192445572266), ('IYK', 0.3542205029126607), ('IYR', 0.10424052379931392), ('IYW', 0.1600118160237878), ('VGT', 0.11374570623238564), ('XLU', 0.1639595265761292)])
Allocation Strategy
The optimal portfolio weights derived from the HRP algorithm can now be used to construct the final asset allocation. The pypfopt.discrete_allocation.DiscreteAllocation class provides several methods to convert continuous weights into discrete share quantities. One such method treats this conversion as an integer programming problem formulated as follows:
\[\begin{align*} \underset{x \in \mathbb{Z}^n}{\operatorname{minimise}} & \quad r + \| w T - x \odot p \|_1 \\ \text{subject to} & \quad r + x \cdot p = T \end{align*}\]
Where:
- \(T \in \mathbb{R}\) is the total dollar amount to be allocated.
- \(p \in \mathbb{R}^n\) is the vector of the latest prices for each asset.
- \(w \in \mathbb{R}^n\) is the vector of target portfolio weights for each asset.
- \(x \in \mathbb{Z}^n\) is the integer allocation, representing the number of units of each asset to be purchased.
- \(r \in \mathbb{R}\) is the remaining unallocated dollar amount, defined as \(r = T - x \cdot p\).
- \(\odot\) denotes element-wise (Hadamard) multiplication.
- \(\| \cdot \|_1\) represents the \(L_1\) norm, i.e., the sum of the absolute values of the elements in the vector.
The goal is to minimize the remaining unallocated value \(r\) while also minimizing the difference between the target dollar allocation, \(wT\), and the actual allocation, \(x \odot p\).
total_portfolio_value = 10000
discrete_allocation = ppo.discrete_allocation.DiscreteAllocation(
weights=hrp.clean_weights(),
latest_prices=daily_prices.iloc[-1, :],
total_portfolio_value=total_portfolio_value,
)
allocations, fund_remaning = discrete_allocation.lp_portfolio(verbose=True, solver=cp.CPLEX)
Funds remaining: 21.34
IYF: allocated 0.099, desired 0.104
IYK: allocated 0.358, desired 0.354
IYR: allocated 0.102, desired 0.104
IYW: allocated 0.156, desired 0.160
VGT: allocated 0.121, desired 0.114
XLU: allocated 0.164, desired 0.164
Allocation has RMSE: 0.004
allocations = pd.DataFrame.from_dict(allocations, orient="index", columns=["Shares"])
allocations
Shares
IYF 9
IYK 51
IYR 10
IYW 10
VGT 2
XLU 20
To see a comprehensive list of available solvers, refer to the CVXPY solver documentation.
References
De Prado, M. L. (2016). Building diversified portfolios that outperform out-of-sample. SSRN Electronic Journal. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2708678
Ledoit, O., & Wolf, M. (2003). Honey, I shrunk the sample covariance matrix. UPF Economics and Business Working Paper No. 691. https://doi.org/10.2139/ssrn.4897573
Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428), 1303-1313. https://doi.org/10.1080/01621459.1994.10476870
Joubert, J., Sestovic, D., Barziy, I., Distaso, W., & Lopez de Prado, M. (2024). The three types of backtests. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4897573
De Prado, M. L. (2018). Advances in Financial Machine Learning. John Wiley & Sons. https://www.wiley.com/en-fr/Advances+in+Financial+Machine+Learning-p-9781119482086