Portfolio Optimization with Python: Hierarchical Risk Parity

In this post, we will delve into the Hierarchical Risk Parity (HRP) algorithm and demonstrate how it can be applied to optimize an ETF-based portfolio. Compared to Markowitz's mean-variance framework, HRP is a relatively recent development in portfolio management research; it leverages hierarchical clustering to allocate weights based on the correlation structure of the assets.

Tools From The Python Ecosystem

We will be using the following libraries and packages:

[project]
name = "hierachical-risk-parity"
requires-python = ">=3.11"
dependencies = [
    "arch>=7.1.0",
    "cplex>=22.1.1.2",
    "matplotlib>=3.9.2",
    "pandas[performance]>=2.2.3",
    "plotly>=5.24.1",
    "pyportfolioopt>=1.5.5",
    "scikit-learn>=1.5.2",
    "scipy>=1.14.1",
    "yfinance>=0.2.44",
]

[tool.uv]
package = false
dev-dependencies = [
    "ipykernel>=6.29.5",
    "ipywidgets>=8.1.5",
    "isort>=5.13.2",
    "nbformat>=5.10.4",
    "ruff>=0.7.0",
]
import warnings
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Union

import cvxpy as cp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import pypfopt as ppo
import yfinance as yf
from plotly.subplots import make_subplots
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
from sklearn.covariance import ledoit_wolf

All-ETF Portfolio

The underlying investment vehicles for this portfolio optimization are Exchange-Traded Funds (ETFs), a popular choice that offers non-institutional investors diversification, liquidity, and cost-effectiveness. For this analysis, we will focus on the following equity ETFs, with an emphasis on technology, financial services, real estate, utilities, and consumer staples.

ETF Selection

Technology ETFs

  • iShares U.S. Technology ETF (IYW): This ETF provides exposure to U.S. technology companies, including software, hardware, and semiconductor firms.

  • Vanguard Information Technology ETF (VGT): This ETF tracks a broad index of U.S. information technology stocks across large-, mid-, and small-cap segments.

Financial Services and Banking ETFs

  • iShares U.S. Financials ETF (IYF): This ETF provides exposure to U.S. companies in the financial services sector, including banks, investment firms, and insurance companies.

Real Estate ETFs

  • iShares U.S. Real Estate ETF (IYR): This ETF tracks the Dow Jones U.S. Real Estate Capped Index, which measures the performance of the real estate sector of the U.S. equity market, as defined by the index provider.

Utilities ETFs

  • Utilities Select Sector SPDR Fund (XLU): This ETF tracks the Utilities Select Sector Index, covering the utilities companies in the S&P 500, such as electric and gas utilities.

Consumer Staples ETFs

  • iShares U.S. Consumer Staples ETF (IYK): This ETF provides exposure to U.S. companies that produce everyday consumer goods, such as food, beverages, and household products.

Other sectors and categories can be explored by utilizing resources like the ETF Database Categories on Yahoo Finance. One of the most crucial aspects of building a portfolio is aligning it with our personal convictions about the sectors and industries we are investing in; it’s about having a deep, long-term belief in the growth potential and sustainability of those industries.

For more insights into the pros and cons of an all-ETF portfolio, consider reading this article from Charles Schwab: 3 Ways to Build an All-ETF Portfolio.

Daily Price Time Series

The daily price time series can be downloaded from Yahoo Finance using the yfinance package:

tech = np.array(["IYW", "VGT"])
real_estate = np.array(["IYR"])
bank_finance = np.array(["IYF"])
utilities = np.array(["XLU"])
consumer_staples = np.array(["IYK"])

tickers = np.concatenate([tech, real_estate, bank_finance, utilities, consumer_staples], axis=0)
tickers.shape
(6,)

The sampling period for this task can be set to the last 5 years:

start_date = (datetime.now() - timedelta(days=365.25 * 5)).strftime("%Y-%m-%d")

daily_prices = yf.download(", ".join(tickers), start=start_date)["Adj Close"]
[*********************100%***********************]  6 of 6 completed
dates             IYF        IYK        IYR        IYW        VGT        XLU
2019-10-20   60.01811   37.86379   83.19051   50.69644   209.5335   54.65551
2019-10-21   59.63262   37.88772   82.92007   50.16434   206.6947   54.88584
2019-10-22   59.80701   37.98643   83.03347   50.28638   206.9057   55.10764
2019-10-23   59.89420   38.08514   82.92878   50.78919   210.0705   55.29531
2019-10-24   60.00435   38.04027   82.08258   51.40185   212.6024   54.73229

Quick inspection for potential data quality issues:

  • Missing Data: Unequal time series lengths or API issues can lead to gaps in the dataset. Run the following to check for any missing values in the data:
daily_prices.isna().sum(axis=0)
Ticker
IYF    0
IYK    0
IYR    0
IYW    0
VGT    0
XLU    0
dtype: int64
  • Zero Values: Abnormal zero values can also indicate data problems, potentially due to API issues. You can identify columns with zero values using:
(daily_prices == 0).sum(axis=0)
Ticker
IYF    0
IYK    0
IYR    0
IYW    0
VGT    0
XLU    0
dtype: int64
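If either check flags problems, one possible remedy is to forward-fill short gaps and drop tickers that remain largely incomplete. The sketch below is illustrative only; the fill limit and threshold are arbitrary choices, not recommendations:

# Hypothetical cleanup: forward-fill gaps of up to 5 trading days,
# then drop any ticker whose series is still more than 5% missing.
cleaned_prices = daily_prices.ffill(limit=5)
missing_share = cleaned_prices.isna().mean(axis=0)
cleaned_prices = cleaned_prices.drop(columns=missing_share[missing_share > 0.05].index)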

Time Series Plots

fig = make_subplots(rows=3, cols=2, subplot_titles=tickers)

for i, ticker in enumerate(tickers):
    row = i // 2 + 1
    col = i % 2 + 1
    fig.add_trace(
        go.Scatter(x=daily_prices.index, y=daily_prices.loc[:, ticker], name=ticker),
        row=row,
        col=col,
    );

fig.update_layout(height=1000, autosize=True, title_text="Daily Prices");

fig.show()

Daily Returns

To calculate the daily returns:

\[\begin{equation} \text{Daily Return}_{i, t} = r_{i, t} = \frac{\text{price}_{i, t} - \text{price}_{i, t-1}}{\text{price}_{i, t-1}} \end{equation}\]

where

  • \(\text{price}_{i, t}\) is the price on day \(t\) for asset \(i\)
  • \(\text{price}_{i, t-1}\) is the price on day \(t-1\) for asset \(i\)

The following time series will be used as inputs for the portfolio optimization algorithm:

daily_returns = daily_prices.pct_change().dropna(how="all")
dates              IYF         IYK         IYR         IYW         VGT         XLU
2019-10-21  -0.0064229   0.0006320  -0.0032508  -0.0104957  -0.0135482   0.0042142
2019-10-22   0.0029245   0.0026052   0.0013676   0.0024327   0.0010209   0.0040411
2019-10-23   0.0014578   0.0025985  -0.0012608   0.0099990   0.0152961   0.0034054
2019-10-24   0.0018392  -0.0011781  -0.0102039   0.0120628   0.0120526  -0.0101820
2019-10-27   0.0032885  -0.0029879  -0.0054204   0.0136285   0.0114581  -0.0137160

Annualized Mean Returns

The annualized (geometric) mean returns for each asset \(i\) over the sampling period can be calculated as:

\[\begin{equation} \text{Annualized Mean Return}_{i} = \bar{r}_{i} = \left( \prod_{t=1}^{T_i} (1 + r_{i,t}) \right)^{\frac{\text{frequency}}{T_i}} - 1 \end{equation}\]

where:

  • \(r_{i,t}\) is the daily return for asset \(i\) on day \(t\)
  • \(T_i\) is the total number of trading days observed for asset \(i\) over the sampling period
  • \(\text{frequency}\) is the number of time periods in a year (typically \(252\) for daily returns)

This formula computes the geometric mean of daily returns for each asset \(i\), and annualizes the returns by raising the product of returns to the power of the ratio \(\frac{\text{frequency}}{T_i}\), where \(T_i\) accounts for the (potentially) varying number of observed trading days for each asset.

annualized_mean_returns = (1 + daily_returns).prod() ** (
    252 / daily_returns.count()
) - 1

annualized_mean_returns
Ticker
IYF    0.128584
IYK    0.131330
IYR    0.041589
IYW    0.251454
VGT    0.236268
XLU    0.084593
dtype: float64
conditional_colors = ["green" if value > 0 else "red" for value in annualized_mean_returns]

fig, ax = plt.subplots();

bars = ax.bar(
    annualized_mean_returns.index, 
    annualized_mean_returns * 100,  # Convert to percentage
    color=conditional_colors
);

for bar in bars:
    height = bar.get_height()
    ax.text(
        bar.get_x() + bar.get_width() / 2, 
        height + np.sign(height) * 0.1,  # Add spacing to the value
        f'{height:.2f}%', 
        ha='center', va='bottom' if height > 0 else 'top',  # Adjust placement for negative values
        fontsize=10
    );

title_text = f"Annualized Mean Returns (%)\nData from {start_date} to {daily_prices.index[-1].strftime('%Y-%m-%d')}"
ax.set_title(title_text, fontsize=14);
plt.xticks(rotation=45, ha='right');
plt.tight_layout();
plt.show()

Shrinkage Covariance Matrix

An essential component of portfolio optimization is the estimator of the covariance matrix, which quantifies the correlations between the daily returns series. While the sample covariance matrix is a standard, unbiased estimator, it often contains significant estimation errors in practice.

To address this issue, Ledoit and Wolf (2004) propose a shrinkage approach, which combines the sample covariance matrix with a more structured and stable estimator, such as the constant correlation model. The shrinkage process adjusts extreme covariance values in the noisy data toward a central target, effectively reducing the estimation error.
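Concretely, the Ledoit-Wolf estimator is a convex combination of a structured shrinkage target \(F\) (defined below) and the sample covariance matrix \(S\):

\[\begin{align*} \hat{\Sigma} = \delta F + (1 - \delta) S, \quad \delta \in [0, 1] \end{align*}\]

where the shrinkage intensity \(\delta\) is estimated from the data so as to minimize the expected loss between \(\hat{\Sigma}\) and the true covariance matrix \(\Sigma\).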

Shrinkage Target

Let the returns for asset \(i\) on day \(t\) be denoted by \(X_{i, t}\), where \(1 \leq i \leq N\) indexes the assets (\(N\) being the number of assets) and \(1 \leq t \leq T\) indexes the trading days (\(T\) being the number of trading days over the sample period). Let the following matrices be defined:

  • \(\Sigma\) represents the true population covariance matrix, where the element \(\sigma_{i j}\) represents the covariance between asset \(i\) and asset \(j\).
  • \(S\) represents the sample covariance matrix, where the element \(s_{i j}\) represents the sample covariance between asset \(i\) and asset \(j\).
Population and Sample Correlations

The population correlation \(\varrho_{i j}\) between asset \(i\) and asset \(j\) is given by:

\[\begin{align*} \varrho_{i j} = \frac{\sigma_{i j}}{\sqrt{\sigma_{i i} \sigma_{j j}}} \end{align*}\]

Similarly, the sample correlation \(r_{i j}\) is defined as:

\[\begin{align*} r_{i j} = \frac{s_{i j}}{\sqrt{s_{i i} s_{j j}}} \end{align*}\]

Average Population and Sample Correlations

From \(\varrho_{i j}\) and \(r_{i j}\), the average population correlation \(\bar{\varrho}\) and the average sample correlation \(\bar{r}\) are computed as follows:

\[\begin{align*} \bar{\varrho} = \frac{2}{(N-1) N} \sum_{i=1}^{N-1} \sum_{j=i+1}^N \varrho_{i j} \end{align*}\]

\[\begin{align*} \bar{r} = \frac{2}{(N-1) N} \sum_{i=1}^{N-1} \sum_{j=i+1}^N r_{i j} \end{align*}\]

Constant Correlation Matrix

The shrinkage target is the constant correlation matrix, constructed based on the average correlations \(\bar{\varrho}\) and \(\bar{r}\):

  • Population Constant Correlation Matrix \(\Phi\): This matrix is based on the average population correlation \(\bar{\varrho}\):

    \[\begin{align*} \phi_{i i} = \sigma_{i i}, \quad \phi_{i j} = \bar{\varrho} \sqrt{\sigma_{i i} \sigma_{j j}} \end{align*}\]

  • Sample Constant Correlation Matrix \(F\): This matrix is based on the average sample correlation \(\bar{r}\):

    \[\begin{align*} f_{i i} = s_{i i}, \quad f_{i j} = \bar{r} \sqrt{s_{i i} s_{j j}} \end{align*}\]

The sample constant correlation matrix \(F\) serves as a shrinkage target for the covariance matrix estimation. It represents a balance between structured stability (constant correlations) and data-driven flexibility (sample covariances), reducing estimation error and improving the robustness of any downstream portfolio optimization procedures.
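To make the construction of the shrinkage target concrete, the following sketch assembles \(F\) directly from a sample covariance matrix. The helper name constant_correlation_target is illustrative and not part of any library; scikit-learn's ledoit_wolf, used below, performs the shrinkage internally:

def constant_correlation_target(sample_cov: pd.DataFrame) -> pd.DataFrame:
    """Build the constant correlation shrinkage target F from a sample covariance matrix."""
    std = np.sqrt(np.diag(sample_cov))
    sample_corr = sample_cov / np.outer(std, std)
    n = sample_corr.shape[0]
    # Average off-diagonal sample correlation (r-bar)
    avg_corr = (sample_corr.values.sum() - n) / (n * (n - 1))
    # f_ij = r-bar * sqrt(s_ii * s_jj); keep the sample variances on the diagonal
    target = avg_corr * np.outer(std, std)
    np.fill_diagonal(target, np.diag(sample_cov))
    return pd.DataFrame(target, index=sample_cov.index, columns=sample_cov.columns)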

Note: The reference paper highlights that the constant correlation model may be unsuitable when assets belong to different classes, such as stocks and bonds. However, since all the ETFs selected here share the same underlying asset class, i.e., equities, the constant correlation model is a reasonable and appropriate choice for estimating the covariance matrix.

Computing the Shrinkage Covariance Matrix

The Ledoit-Wolf shrinkage estimator is implemented in the scikit-learn library.

Formally, let \(\hat{\Sigma}\) represent the shrinkage covariance matrix, where the element \(\hat{\sigma}_{ij}\) is the covariance between assets \(i\) and \(j\). The matrix \(\hat{\Sigma}\) is of size \(N \times N\):

\[\begin{align*} \hat{\Sigma} = \underbrace{ \begin{bmatrix} \hat{\sigma}_{11} & \hat{\sigma}_{12} & \cdots & \hat{\sigma}_{1N} \\ \hat{\sigma}_{21} & \hat{\sigma}_{22} & \cdots & \hat{\sigma}_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\sigma}_{N1} & \hat{\sigma}_{N2} & \cdots & \hat{\sigma}_{NN} \end{bmatrix} }_{N \times N} \end{align*}\]

To convert this covariance matrix to the correlation matrix \(\hat{R}\), we normalize each element by dividing the covariance between assets \(i\) and \(j\) by the square root of the product of the variances of assets \(i\) and \(j\):

\[\begin{align*} \hat{r}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii} \hat{\sigma}_{jj}}} \end{align*}\]

To compute the values \(\hat{r}_{ij}\) efficiently, define \(D\) as a diagonal matrix that contains the inverse of the square roots of the diagonal elements of \(\hat{\Sigma}\) (i.e., the variances of the assets):

\[\begin{align*} D = \underbrace{ \begin{bmatrix} \frac{1}{\sqrt{\hat{\sigma}_{11}}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sqrt{\hat{\sigma}_{22}}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sqrt{\hat{\sigma}_{NN}}} \end{bmatrix} }_{N \times N} \end{align*}\]

The shrinkage correlation matrix \(\hat{R}\) is then computed as:

\[\begin{align*} \underbrace{\hat{R}}_{N \times N} = \underbrace{D}_{N \times N} \; \underbrace{\hat{\Sigma}}_{N \times N} \; \underbrace{D}_{N \times N} \end{align*}\]

This yields the following correlation matrix:

\[\begin{align*} \hat{R} = \underbrace{ \begin{bmatrix} 1 & \hat{r}_{12} & \cdots & \hat{r}_{1N} \\ \hat{r}_{21} & 1 & \cdots & \hat{r}_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}_{N1} & \hat{r}_{N2} & \cdots & 1 \end{bmatrix} }_{N \times N} \end{align*}\]

In this matrix, the diagonal elements are \(1\), i.e., the correlation of an asset with itself is always \(1\).

def to_dataframe(matrix: Union[np.ndarray, pd.DataFrame], columns: pd.Index) -> pd.DataFrame:
    return pd.DataFrame(matrix, index=columns, columns=columns)
  
# Shrinkage covariance matrix
shrinkage_covariance_matrix, _ = ledoit_wolf(
    X=daily_returns,
    assume_centered=False,
)

# Shrinkage correlation matrix
dinv = np.diag(1 / np.sqrt(np.diag(shrinkage_covariance_matrix)))
shrinkage_correlation_matrix = np.dot(dinv, np.dot(shrinkage_covariance_matrix, dinv))

columns = daily_returns.columns
shrinkage_covariance_matrix = to_dataframe(shrinkage_covariance_matrix, columns)
shrinkage_correlation_matrix = to_dataframe(shrinkage_correlation_matrix, columns)
shrinkage_covariance_matrix
Ticker        IYF        IYK        IYR        IYW        VGT        XLU
IYF     0.0002485  0.0001317  0.0001972  0.0001858  0.0001909  0.0001453
IYK     0.0001317  0.0001344  0.0001297  0.0001217  0.0001242  0.0001198
IYR     0.0001972  0.0001297  0.0002475  0.0001739  0.0001776  0.0001721
IYW     0.0001858  0.0001217  0.0001739  0.0003215  0.0003057  0.0001184
VGT     0.0001909  0.0001242  0.0001776  0.0003057  0.0003056  0.0001225
XLU     0.0001453  0.0001198  0.0001721  0.0001184  0.0001225  0.0002136

This can be easily visualized using a heatmap:

fig, ax = plt.subplots();
cax = ax.matshow(shrinkage_correlation_matrix, cmap="coolwarm");
ax.xaxis.set_ticks_position('bottom');
ax.set_xticks(np.arange(len(shrinkage_correlation_matrix.columns)));
ax.set_yticks(np.arange(len(shrinkage_correlation_matrix.columns)));
ax.set_xticklabels(shrinkage_correlation_matrix.columns);
ax.set_yticklabels(shrinkage_correlation_matrix.columns);

for i in range(shrinkage_correlation_matrix.shape[0]):
    for j in range(shrinkage_correlation_matrix.shape[1]):
        # Ensure text can be easily seen
        value = shrinkage_correlation_matrix.iloc[i, j]
        color = "white" if value < 0.5 else "black"
        ax.text(j, i, f"{value:.2f}", ha="center", va="center", color=color);

plt.title("Shrinkage Correlation Matrix");
plt.tight_layout();
plt.show()


Hierarchical Risk Parity (HRP)

The HRP algorithm consists of three key stages:

  • Clustering: Assets are grouped based on their correlations using hierarchical clustering methods like single, average, or complete linkage.

  • Quasi-Diagonalization: The covariance matrix is reordered based on the clustering structure, concentrating covariances around the diagonal.

  • Recursive Bisection: Weights are assigned recursively across clusters in a top-down manner.

The following sections will walk through a numerical example of the HRP algorithm using daily returns data from the previous section.

Clustering

Instead of treating assets independently, HRP groups similar assets based on their correlations. The goal is to cluster assets that are more closely related to each other.

Distance Matrix

The distance between assets is computed as a function of their correlations, with lower correlations indicating greater distances. First, convert the correlation matrix into a distance matrix:

\[\begin{align*} d: \left(X_i, X_j\right) \subset B \rightarrow \mathbb{R}, \quad d_{i,j} = d\left(X_i, X_j\right) = \sqrt{\frac{1}{2}\left(1 - \rho_{i,j}\right)}, \quad d_{i,j} \in [0,1] \end{align*}\]

where:

  • \(d_{i,j}\) represents the distance between asset \(X_i\) and asset \(X_j\)

  • \(\rho_{i,j}\) is the correlation coefficient between the returns of assets \(X_i\) and \(X_j\)

dist_matrix = np.sqrt((1.0 - shrinkage_correlation_matrix.round(8)) / 2.0)
dist_matrix
Ticker       IYF       IYK       IYR       IYW       VGT       XLU
Ticker                                                            
IYF     0.000000  0.373690  0.320059  0.413875  0.391884  0.429825
IYK     0.373690  0.000000  0.379982  0.455236  0.439896  0.382743
IYR     0.320059  0.379982  0.000000  0.437962  0.420914  0.354621
IYW     0.413875  0.455236  0.437962  0.000000  0.110865  0.523615
VGT     0.391884  0.439896  0.420914  0.110865  0.000000  0.510184
XLU     0.429825  0.382743  0.354621  0.523615  0.510184  0.000000

Transform the distance matrix into a condensed form. The condensed form contains only the upper triangular part of the distance matrix (excluding the diagonal), which is what hierarchical clustering requires:

condensed_dist_matrix = squareform(X=dist_matrix, checks=False)
condensed_dist_matrix
array([0.37369036, 0.32005876, 0.41387541, 0.39188418, 0.42982473,
       0.37998172, 0.45523561, 0.43989607, 0.38274266, 0.43796154,
       0.420914  , 0.35462111, 0.11086528, 0.52361505, 0.51018444])

Hierarchical Clustering

A hierarchical clustering algorithm is applied to the condensed distance matrix to group pairs of assets into clusters. The linkage function from scipy is used to perform the clustering.

Note: An important hyperparameter in hierarchical clustering is the linkage method, which determines how the distances between newly formed clusters are calculated. The "appropriate" linkage method depends on the data characteristics, and evaluation mechanisms can help determine the choice; this is addressed in the Evaluation Strategy section later in the post.

linkage_matrix = linkage(y=condensed_dist_matrix, method="ward")

linkage_matrix
array([[3.        , 4.        , 0.11086528, 2.        ],
       [0.        , 2.        , 0.32005876, 2.        ],
       [1.        , 5.        , 0.38274266, 2.        ],
       [7.        , 8.        , 0.41569608, 4.        ],
       [6.        , 9.        , 0.62789035, 6.        ]])

Linkage Matrix Structure

The linkage_matrix contains the steps of hierarchical clustering and is structured as follows:

  • 1st column: Index of the first cluster being merged. If the value is less than the number of original items, it refers to an original item (i.e., a ticker). Otherwise, it refers to a cluster formed in a previous step.

  • 2nd column: Index of the second cluster being merged. Follows the same logic as the first column.

  • 3rd column: Distance (or dissimilarity) between the two clusters being merged.

  • 4th column: Number of original items in the newly formed cluster after this merge.

The hierarchical clustering merges the original items step by step. Newly formed clusters receive IDs starting at 6, i.e., immediately after the zero-based indices 0 through 5 of the six original items.

linkage_matrix
array([[3.        , 4.        , 0.11086528, 2.        ],
       [0.        , 2.        , 0.32005876, 2.        ],
       [1.        , 5.        , 0.38274266, 2.        ],
       [7.        , 8.        , 0.41569608, 4.        ],
       [6.        , 9.        , 0.62789035, 6.        ]])
  • Step 1: Tickers 3 and 4 are merged to form Cluster 6 (2 items).
  • Step 2: Tickers 0 and 2 are merged to form Cluster 7 (2 items).
  • Step 3: Tickers 1 and 5 are merged to form Cluster 8 (2 items).
  • Step 4: Clusters 7 (containing tickers 0, 2) and 8 (containing tickers 1, 5) are merged to form Cluster 9 (4 items).
  • Step 5: Cluster 6 (containing tickers 3, 4) and Cluster 9 are merged to form Cluster 10 (6 items).
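The same reading of the linkage matrix can be reproduced programmatically; this small loop (illustrative, not part of the original walkthrough) prints each merge step:

num_items = len(shrinkage_correlation_matrix)
for step, (left, right, dist, size) in enumerate(linkage_matrix, start=1):
    print(
        f"Step {step}: merge {int(left)} and {int(right)} into cluster "
        f"{num_items + step - 1} (distance={dist:.4f}, items={int(size)})"
    )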

In this process, the distance column shows the dissimilarity between the merged clusters. The number of original items reflects how many tickers are included in the newly formed cluster. The output can be visualized via a dendrogram:

fig, ax = plt.subplots();
dendrogram(linkage_matrix, labels=shrinkage_correlation_matrix.columns, orientation='top');
plt.title("Hierarchical Clustering Dendrogram (Ward's Method)");
plt.ylabel("Distance");
plt.show()

Quasi-Diagonalization

After clustering, the next step in the HRP algorithm is quasi-diagonalization. The goal of this step is to reorder the rows and columns of the correlation matrix based on the hierarchical clustering results obtained in the previous step. This seriation step allows similar investments to be grouped together, and dissimilar ones to be placed farther apart.

The following implementation is taken from the original paper with annotated print statements for better understanding:

def get_quasi_diag(linkage_matrix: np.ndarray) -> List[int]:
    """ 
    Sort clustered items by distance based on the linkage matrix.
    
    Parameters
    ----------
    linkage_matrix : np.ndarray
        The linkage matrix from hierarchical clustering.
        
    Returns
    -------
    List[int]
        The sorted indices of the clustered items.
    """
    linkage_matrix = linkage_matrix.astype(int)
    sorted_index = pd.Series([linkage_matrix[-1, 0], linkage_matrix[-1, 1]])
    print(f"Initial sorted index (last two merged clusters): {sorted_index.tolist()}")
    num_items = linkage_matrix[-1, 3]
    print(f"Number of original items: {num_items}")
    # Recursively sort clusters by distance until no clusters remain
    while sorted_index.max() >= num_items:
        print("\n--- New iteration ---\n")
        
        sorted_index.index = range(0, sorted_index.shape[0] * 2, 2)
        print(f"Expanded sorted index to make space: {sorted_index.tolist()}")
        
        cluster_indices = sorted_index[sorted_index >= num_items]
        i = cluster_indices.index  # Indices in sorted_index where clusters need to be expanded
        j = cluster_indices.values - num_items  # Row numbers in linkage_matrix to retrieve cluster members
        print(f"Identified clusters to expand:")
        print(f"    Indices in sorted_index (i): {i.tolist()}")
        print(f"    Corresponding linkage matrix rows (j): {j.tolist()}")
        # Replace clusters with their first constituent
        for idx, row_num in zip(i, j):
            row_dict = {
                'merged_1': linkage_matrix[row_num, 0],
                'merged_2': linkage_matrix[row_num, 1],
                'dist': linkage_matrix[row_num, 2],
                'number_of_original': linkage_matrix[row_num, 3]
            }
            original_cluster = sorted_index[idx]
            first_constituent = linkage_matrix[row_num, 0]
            print(f"      Row {row_num} of the linkage matrix: {row_dict}")
            print(f"      Cluster {original_cluster} was formed by merging {linkage_matrix[row_num, 0]} and {linkage_matrix[row_num, 1]}")
            print(f"      Replacing cluster {original_cluster} with its first constituent: {first_constituent}")
            sorted_index[idx] = first_constituent
        print(f"After replacing clusters with their first constituents: {sorted_index.tolist()}")
        
        # Append the second constituent of the clusters
        second_constituents = pd.Series(linkage_matrix[j, 1], index=i + 1)
        sorted_index = pd.concat([sorted_index, second_constituents])
        for idx, second_constituent in second_constituents.items():
            print(f"      Appending second constituent {second_constituent}")
        print(f"After appending the second constituents: {sorted_index.tolist()}")
        
        # Re-sort the index and reset indexing to keep proper order
        sorted_index = sorted_index.sort_index()
        sorted_index.index = range(sorted_index.shape[0])
        print(f"Final sorted index after re-sorting: {sorted_index.tolist()}")

    return sorted_index.tolist()

Given the linkage matrix:

linkage_matrix
array([[3.        , 4.        , 0.11086528, 2.        ],
       [0.        , 2.        , 0.32005876, 2.        ],
       [1.        , 5.        , 0.38274266, 2.        ],
       [7.        , 8.        , 0.41569608, 4.        ],
       [6.        , 9.        , 0.62789035, 6.        ]])

The quasi-diagonalization process is applied as follows:

sorted_index = get_quasi_diag(linkage_matrix)
Initial sorted index (last two merged clusters): [6, 9]
Number of original items: 6

--- New iteration ---

Expanded sorted index to make space: [6, 9]
Identified clusters to expand:
    Indices in sorted_index (i): [0, 2]
    Corresponding linkage matrix rows (j): [0, 3]
      Row 0 of the linkage matrix: {'merged_1': 3, 'merged_2': 4, 'dist': 0, 'number_of_original': 2}
      Cluster 6 was formed by merging 3 and 4
      Replacing cluster 6 with its first constituent: 3
      Row 3 of the linkage matrix: {'merged_1': 7, 'merged_2': 8, 'dist': 0, 'number_of_original': 4}
      Cluster 9 was formed by merging 7 and 8
      Replacing cluster 9 with its first constituent: 7
After replacing clusters with their first constituents: [3, 7]
      Appending second constituent 4
      Appending second constituent 8
After appending the second constituents: [3, 7, 4, 8]
Final sorted index after re-sorting: [3, 4, 7, 8]

--- New iteration ---

Expanded sorted index to make space: [3, 4, 7, 8]
Identified clusters to expand:
    Indices in sorted_index (i): [4, 6]
    Corresponding linkage matrix rows (j): [1, 2]
      Row 1 of the linkage matrix: {'merged_1': 0, 'merged_2': 2, 'dist': 0, 'number_of_original': 2}
      Cluster 7 was formed by merging 0 and 2
      Replacing cluster 7 with its first constituent: 0
      Row 2 of the linkage matrix: {'merged_1': 1, 'merged_2': 5, 'dist': 0, 'number_of_original': 2}
      Cluster 8 was formed by merging 1 and 5
      Replacing cluster 8 with its first constituent: 1
After replacing clusters with their first constituents: [3, 4, 0, 1]
      Appending second constituent 2
      Appending second constituent 5
After appending the second constituents: [3, 4, 0, 1, 2, 5]
Final sorted index after re-sorting: [3, 4, 0, 2, 1, 5]
sorted_index
[3, 4, 0, 2, 1, 5]

The sorted correlation matrix is obtained by rearranging the rows and columns based on the sorted index:

sorted_correlation_matrix = shrinkage_correlation_matrix.iloc[sorted_index, sorted_index]
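To make the effect of the reordering visible, the reordered matrix can be plotted with the same matplotlib pattern used for the earlier heatmap (a minimal sketch):

fig, ax = plt.subplots();
ax.matshow(sorted_correlation_matrix, cmap="coolwarm");
ax.xaxis.set_ticks_position('bottom');
ax.set_xticks(np.arange(len(sorted_correlation_matrix.columns)));
ax.set_yticks(np.arange(len(sorted_correlation_matrix.columns)));
ax.set_xticklabels(sorted_correlation_matrix.columns);
ax.set_yticklabels(sorted_correlation_matrix.columns);
plt.title("Quasi-Diagonalized Correlation Matrix");
plt.tight_layout();
plt.show()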

The quasi-diagonalized correlation matrix shows a more structured pattern, with higher correlations concentrated around the main diagonal. Store the ordered tickers for future reference:

ordered_tickers = shrinkage_correlation_matrix.index[sorted_index]
ordered_tickers
Index(['IYW', 'VGT', 'IYF', 'IYR', 'IYK', 'XLU'], dtype='object', name='Ticker')

Recursive Bisection

In the final step, recursive bisection, the hierarchical tree (built from hierarchical clustering) is split into two branches repeatedly until each branch contains only one asset. It can be shown that splitting weights inversely proportional to the variance of assets in each branch achieves optimal allocation when the correlation matrix is diagonal (see Appendix A.2. of the original paper).

  1. Initialization:

    • Set the list of items \(L = \{L_0\}\), where \(L_0 = \{n\}_{n=1, \ldots, N}\). This list represents the starting set that includes every asset to be allocated.

    • Assign a unit weight to all items: \(w_n = 1\), for all \(n = 1, \ldots, N\).

  2. Stopping Condition: If \(\left| L_i \right| = 1\) for all \(L_i \in L\), stop the algorithm.

  3. Bisection (for each \(L_i \in L\) such that \(\left| L_i \right| > 1\)):

    1. Bisect the set: Divide \(L_i\) into two subsets \(L_i^{(1)} \cup L_i^{(2)} = L_i\), where \[\begin{align*} \left| L_i^{(1)} \right| = \operatorname{int}\left(\frac{1}{2} \left|L_i\right| \right) \end{align*}\] The order is preserved in the bisection.

    2. Variance of subsets: For each subset \(L_i^{(j)}\) (where \(j = 1, 2\)), calculate the variance \(\widetilde{V}_i^{(j)}\) using the quadratic form: \[\begin{align*} \widetilde{V}_i^{(j)} \equiv \widetilde{w}_i^{(j)\prime} V_i^{(j)} \widetilde{w}_i^{(j)} \end{align*}\] where \(V_i^{(j)}\) is the covariance matrix of the assets in \(L_i^{(j)}\), and \(\widetilde{w}_i^{(j)}\) is defined as: \[\begin{align*} \widetilde{w}_i^{(j)} = \operatorname{diag}\left( V_i^{(j)} \right)^{-1} \frac{1}{\operatorname{tr}\left( \operatorname{diag}\left( V_i^{(j)} \right)^{-1} \right)} \end{align*}\] Here, \(\operatorname{diag}[\cdot]\) and \(\operatorname{tr}[\cdot]\) denote the diagonal and trace operators, respectively.

    3. Split factor: Compute the split factor \(\alpha_i\) as: \[\begin{align*} \alpha_i = 1 - \frac{\widetilde{V}_i^{(1)}}{\widetilde{V}_i^{(1)} + \widetilde{V}_i^{(2)}} \end{align*}\] ensuring \(0 \leq \alpha_i \leq 1\).

    4. Rescale allocations:

    • Rescale allocations for all \(n \in L_i^{(1)}\) by a factor of \(\alpha_i\).
    • Rescale allocations for all \(n \in L_i^{(2)}\) by a factor of \(1 - \alpha_i\).
  4. Repeat: Loop back to step 2 until the stopping condition is met.

Inverse Variance Portfolio

def get_ivp(cov: pd.DataFrame) -> np.ndarray:
    """
    Compute the inverse variance portfolio (IVP) for a given covariance matrix.

    Parameters
    ----------
    cov : pd.DataFrame
        Covariance matrix of the cluster.

    Returns
    -------
    np.ndarray
        Inverse variance portfolio weights for the cluster.
    """
    ivp = 1.0 / np.diag(cov)
    ivp /= ivp.sum()  # Normalize weights
    print(f"\n    IVP weights for cluster: {ivp}")
    return ivp

Cluster Variance

def get_cluster_var(cov: pd.DataFrame, cluster_items: List[str]) -> float:
    """
    Compute the variance for a given cluster.

    Parameters
    ----------
    cov : pd.DataFrame
        Covariance matrix for the assets.
    cluster_items : List[str]
        List of tickers in the cluster.

    Returns
    -------
    float
        The variance of the cluster.
    """
    # Subset of the covariance matrix containing only the cluster assets
    cov_slice = cov.loc[cluster_items, cluster_items]
    # Define the left space (4 spaces in this case)
    hspace = "    "
    formatted_cov_slice = str(cov_slice).replace("\n", "\n" + hspace)
    print(f"\n    Covariance matrix slice for cluster {cluster_items}:\n{hspace}{formatted_cov_slice}")
    # Calculate inverse variance portfolio (IVP) weights for the cluster and reshape to column vector (n x 1)
    weights = get_ivp(cov_slice).reshape(-1, 1)
    # Compute cluster variance as w.T * cov_slice * w, which is [1 x n] * [n x n] * [n x 1] = [1 x 1]
    cluster_var = np.dot(np.dot(weights.T, cov_slice), weights)[0, 0]
    print(f"\n    Cluster variance for {cluster_items}: {cluster_var}\n")
    return cluster_var

Recursive Bisection

def recursive_bisection(cov: pd.DataFrame, ordered_tickers: Union[pd.Index, List[str]]) -> pd.Series:
    """
    Compute the Hierarchical Risk Parity (HRP) portfolio allocation using recursive bisection.

    Parameters
    ----------
    cov : pd.DataFrame
        Covariance matrix for the assets.
    ordered_tickers : Union[pd.Index, List[str]]
        List of tickers ordered by hierarchical clustering.

    Returns
    -------
    pd.Series
        Portfolio weights for each asset.
    """
    # Initialize equal weights for all assets
    weights = pd.Series(1.0, index=ordered_tickers)
    hspace = "    "  # Define the left space (4 spaces in this case)
    formatted_weights = str(weights).replace("\n", "\n" + hspace)
    print(f"Initial weights:\n{hspace}{formatted_weights}\n")
    # Initialize the cluster as the full set of ordered tickers
    cluster_items = [ordered_tickers]
    
    step = 0
    while len(cluster_items) > 0:
        step += 1
        print(f"\n{'='*20} Step {step}: Recursive Bisection {'='*20}\n")
        print(f"Cluster items at Step {step}: {[list(cluster) for cluster in cluster_items]}")
        # Split each cluster into two halves only if the cluster contains more than one asset
        cluster_items = [
            i[j:k]
            for i in cluster_items
            for j, k in ((0, len(i) // 2), (len(i) // 2, len(i)))
            if len(i) > 1
        ]
        # For each pair of clusters, calculate variance and adjust weights
        for i in range(0, len(cluster_items), 2):
            first_cluster = cluster_items[i]
            second_cluster = cluster_items[i + 1]
            print(f"\n    ---- Processing clusters ----")
            print(f"    First cluster: {first_cluster}")
            print(f"    Second cluster: {second_cluster}")
            # Compute variance for each cluster
            first_var = get_cluster_var(cov, list(first_cluster))
            second_var = get_cluster_var(cov, list(second_cluster))
            # Compute the weighting factor alpha
            alpha = 1 - first_var / (first_var + second_var)
            print(f"    Weighting factor alpha for {list(first_cluster)}: {alpha}")
            print(f"    Weighting factor 1 - alpha for {list(second_cluster)}: {1 - alpha}")
            # Adjust weights for each cluster
            weights[first_cluster] *= alpha
            weights[second_cluster] *= 1 - alpha
            formatted_weights = str(weights).replace("\n", "\n" + hspace)
            print(f"    Updated weights after adjustment:\n{hspace}{formatted_weights}\n")
    return weights

Portfolio Allocation

Applying the recursive bisection algorithm to the covariance matrix:

hrp_weights = recursive_bisection(cov=shrinkage_covariance_matrix, ordered_tickers=ordered_tickers)
Initial weights:
    Ticker
    IYW    1.0
    VGT    1.0
    IYF    1.0
    IYR    1.0
    IYK    1.0
    XLU    1.0
    dtype: float64


==================== Step 1: Recursive Bisection ====================

Cluster items at Step 1: [['IYW', 'VGT', 'IYF', 'IYR', 'IYK', 'XLU']]

    ---- Processing clusters ----
    First cluster: Index(['IYW', 'VGT', 'IYF'], dtype='object', name='Ticker')
    Second cluster: Index(['IYR', 'IYK', 'XLU'], dtype='object', name='Ticker')

    Covariance matrix slice for cluster ['IYW', 'VGT', 'IYF']:
    Ticker       IYW       VGT       IYF
    Ticker                              
    IYW     0.000322  0.000306  0.000186
    VGT     0.000306  0.000306  0.000191
    IYF     0.000186  0.000191  0.000249

    IVP weights for cluster: [0.29887164 0.31445765 0.38667071]

    Cluster variance for ['IYW', 'VGT', 'IYF']: 0.00024294420088323586

    Covariance matrix slice for cluster ['IYR', 'IYK', 'XLU']:
    Ticker       IYR       IYK       XLU
    Ticker                              
    IYR     0.000248  0.000130  0.000172
    IYK     0.000130  0.000134  0.000120
    XLU     0.000172  0.000120  0.000214

    IVP weights for cluster: [0.24994149 0.46036675 0.28969176]

    Cluster variance for ['IYR', 'IYK', 'XLU']: 0.00014859282764898994

    Weighting factor alpha for ['IYW', 'VGT', 'IYF']: 0.3795115578366297
    Weighting factor 1 - alpha for ['IYR', 'IYK', 'XLU']: 0.6204884421633703
    Updated weights after adjustment:
    Ticker
    IYW    0.379512
    VGT    0.379512
    IYF    0.379512
    IYR    0.620488
    IYK    0.620488
    XLU    0.620488
    dtype: float64


==================== Step 2: Recursive Bisection ====================

Cluster items at Step 2: [['IYW', 'VGT', 'IYF'], ['IYR', 'IYK', 'XLU']]

    ---- Processing clusters ----
    First cluster: Index(['IYW'], dtype='object', name='Ticker')
    Second cluster: Index(['VGT', 'IYF'], dtype='object', name='Ticker')

    Covariance matrix slice for cluster ['IYW']:
    Ticker       IYW
    Ticker          
    IYW     0.000322

    IVP weights for cluster: [1.]

    Cluster variance for ['IYW']: 0.0003215181251290974

    Covariance matrix slice for cluster ['VGT', 'IYF']:
    Ticker       VGT       IYF
    Ticker                    
    VGT     0.000306  0.000191
    IYF     0.000191  0.000249

    IVP weights for cluster: [0.44850226 0.55149774]

    Cluster variance for ['VGT', 'IYF']: 0.00023150783628519947

    Weighting factor alpha for ['IYW']: 0.4186201958641258
    Weighting factor 1 - alpha for ['VGT', 'IYF']: 0.5813798041358742
    Updated weights after adjustment:
    Ticker
    IYW    0.158871
    VGT    0.220640
    IYF    0.220640
    IYR    0.620488
    IYK    0.620488
    XLU    0.620488
    dtype: float64

    ---- Processing clusters ----
    First cluster: Index(['IYR'], dtype='object', name='Ticker')
    Second cluster: Index(['IYK', 'XLU'], dtype='object', name='Ticker')

    Covariance matrix slice for cluster ['IYR']:
    Ticker       IYR
    Ticker          
    IYR     0.000248

    IVP weights for cluster: [1.]

    Cluster variance for ['IYR']: 0.0002475355614636847

    Covariance matrix slice for cluster ['IYK', 'XLU']:
    Ticker       IYK       XLU
    Ticker                    
    IYK     0.000134  0.000120
    XLU     0.000120  0.000214

    IVP weights for cluster: [0.61377445 0.38622555]

    Cluster variance for ['IYK', 'XLU']: 0.0001392752130086069

    Weighting factor alpha for ['IYR']: 0.3600603245827725
    Weighting factor 1 - alpha for ['IYK', 'XLU']: 0.6399396754172275
    Updated weights after adjustment:
    Ticker
    IYW    0.158871
    VGT    0.220640
    IYF    0.220640
    IYR    0.223413
    IYK    0.397075
    XLU    0.397075
    dtype: float64


==================== Step 3: Recursive Bisection ====================

Cluster items at Step 3: [['IYW'], ['VGT', 'IYF'], ['IYR'], ['IYK', 'XLU']]

    ---- Processing clusters ----
    First cluster: Index(['VGT'], dtype='object', name='Ticker')
    Second cluster: Index(['IYF'], dtype='object', name='Ticker')

    Covariance matrix slice for cluster ['VGT']:
    Ticker       VGT
    Ticker          
    VGT     0.000306

    IVP weights for cluster: [1.]

    Cluster variance for ['VGT']: 0.00030558216038170134

    Covariance matrix slice for cluster ['IYF']:
    Ticker       IYF
    Ticker          
    IYF     0.000249

    IVP weights for cluster: [1.]

    Cluster variance for ['IYF']: 0.0002485128719824892

    Weighting factor alpha for ['VGT']: 0.4485022558714241
    Weighting factor 1 - alpha for ['IYF']: 0.5514977441285759
    Updated weights after adjustment:
    Ticker
    IYW    0.158871
    VGT    0.098958
    IYF    0.121683
    IYR    0.223413
    IYK    0.397075
    XLU    0.397075
    dtype: float64

    ---- Processing clusters ----
    First cluster: Index(['IYK'], dtype='object', name='Ticker')
    Second cluster: Index(['XLU'], dtype='object', name='Ticker')

    Covariance matrix slice for cluster ['IYK']:
    Ticker       IYK
    Ticker          
    IYK     0.000134

    IVP weights for cluster: [1.]

    Cluster variance for ['IYK']: 0.00013439156229630292

    Covariance matrix slice for cluster ['XLU']:
    Ticker       XLU
    Ticker          
    XLU     0.000214

    IVP weights for cluster: [1.]

    Cluster variance for ['XLU']: 0.0002135697881330683

    Weighting factor alpha for ['IYK']: 0.6137744547477219
    Weighting factor 1 - alpha for ['XLU']: 0.38622554525227815
    Updated weights after adjustment:
    Ticker
    IYW    0.158871
    VGT    0.098958
    IYF    0.121683
    IYR    0.223413
    IYK    0.243715
    XLU    0.153361
    dtype: float64


==================== Step 4: Recursive Bisection ====================

Cluster items at Step 4: [['VGT'], ['IYF'], ['IYK'], ['XLU']]
hrp_weights.sort_index()
Ticker
IYF    0.121683
IYK    0.243715
IYR    0.223413
IYW    0.158871
VGT    0.098958
XLU    0.153361
dtype: float64

Verify Results with PyPortfolioOpt

The PyPortfolioOpt package provides a convenient implementation of the HRP algorithm. We can compare the results obtained from our manual implementation with the package's implementation to verify the correctness of our approach.

hrp = ppo.hierarchical_portfolio.HRPOpt(
    cov_matrix=shrinkage_covariance_matrix,
)

hrp.optimize(linkage_method="ward")
OrderedDict([('IYF', 0.12168265813576948), ('IYK', 0.2437145973588905), ('IYR', 0.22341326988520196), ('IYW', 0.15887120267426943), ('VGT', 0.0989576970265908), ('XLU', 0.15336057491927785)])
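As a final sanity check (a small sketch, not part of the package API), the two sets of weights can be compared numerically:

# Compare the manual recursive-bisection weights with PyPortfolioOpt's result
ppo_weights = pd.Series(dict(hrp.optimize(linkage_method="ward")))
print(np.allclose(hrp_weights.sort_index(), ppo_weights.sort_index()))  # True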

Evaluation Strategy

To assess the out-of-sample performance of the HRP allocation strategy, a backtest can be employed. Backtesting is a key technique for evaluating an investment strategy's performance using historical data. It allows for an assessment of how reliably the strategy could be replicated in real-world scenarios, while also providing valuable insights into its risk-return profile. To this end, Joubert et al. (2024) outline three primary backtesting methods:

  • Walk-Forward Backtest
    • Description: Historical data is split into multiple segments. The strategy is trained on one segment and tested on the next.
    • Advantages: Easy to implement.
    • Limitations: Tests only a single path, risking overfitting. Past performance may not predict future results (e.g., Nvidia's growth since COVID).

  • Resampling
    • Description: A statistical technique that creates multiple samples from historical data to assess performance. This approach yields a distribution of performance metrics, including:
      • Bootstrap: Draws samples with replacement from historical data, enabling performance estimation over many iterations.
      • Cross-Validation: Splits the data into multiple training and testing sets, providing multiple assessments of the strategy's performance.
    • Advantages: Provides more robust analysis by generating multiple scenarios.
    • Limitations: Does not create new data, but resamples existing data, which may miss unique future events.

  • Monte Carlo Simulation
    • Description: Generates synthetic data with properties similar to historical data using an understanding of the data generation process.
    • Advantages: Yields performance estimates indicative of future outcomes (assuming correct and stable data processes).
    • Limitations: Requires accurate modeling of the data generation process, which is difficult for volatile financial markets.

For evaluating the HRP portfolio allocation strategy and optimizing the hierarchical clustering linkage method, we will use time-series bootstrap tools from the arch package.

Time-Series Bootstrapping

The arch.bootstrap.StationaryBootstrap class implements the method developed by Politis and Romano (1994). This method addresses the limitations of traditional bootstrapping techniques when applied to time-series data, aiming to preserve crucial time-dependent structures of the data while generating resampled datasets.

Let \(X_1, X_2, \dots, X_T\) represent the original time series. To generate a resampled time series using block bootstrapping, define a block \(B_{i,b}\) as consisting of \(b\) consecutive observations starting at \(X_i\):

\[\begin{align*} B_{i,b} = \{X_i, X_{i+1}, \dots, X_{i+b-1}\} \end{align*}\]

Periodic Boundary Conditions

If \(i + b - 1 > T\) (i.e., the block extends beyond the end of the time series), periodic boundary conditions are applied. This means that the series wraps around, treating the first observations in the series as if they come after the last. In other words, for any index \(j > T\), the observation is taken as:

\[\begin{align*} X_j = X_{(j \mod T)} \quad \text{for} \quad j > T, \quad \text{where} \, X_0 = X_T. \end{align*}\]

The expression \(j \mod T\) calculates the remainder when \(j\) is divided by \(T\), effectively cycling the indices back to the beginning of the series. For example, if \(T = 10\), and \(X_{12}\) is requested, it would be equivalent to sampling \(X_2\).

Block Selection

The starting indices of the blocks are determined by a sequence of independent and identically distributed (iid) random variables \(I_1, I_2, \dots\), drawn uniformly from \(\{1, \dots, T\}\). The lengths of the blocks are governed by another sequence of iid random variables \(L_1, L_2, \dots\), where each \(L_i\) follows a geometric distribution with parameter \(p \in [0,1]\), i.e.,

\[\begin{align*} P(L_i = m) = (1 - p)^{m-1} p, \quad m = 1, 2, \dots. \end{align*}\]

Resampling

To construct the resampled series \(X_1^*, X_2^*, \dots, X_T^*\):

  • Select the first block \(B_{I_1, L_1}\) and copy \(L_1\) consecutive observations starting at \(X_{I_1}\) into the resampled series.
  • Select the next block \(B_{I_2, L_2}\) and append \(L_2\) consecutive observations starting at \(X_{I_2}\), continuing this process.
  • Repeat the block selection and copying processes until a total of \(T\) observations have been generated.

It is possible for this procedure to result in overlapping blocks when consecutive starting indices \(I_1, I_2, \dots\) are close together, leading to the reuse of some observations. The process terminates once \(T\) observations are resampled, but it can be extended to generate longer series if desired.
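Before wiring this into the full backtest below, the resampling mechanics can be sketched in a few lines; the seed and the number of replications here are purely illustrative:

from arch.bootstrap import StationaryBootstrap, optimal_block_length

# Expected block length estimated from squared returns ("stationary" column), averaged across tickers
avg_block_length = optimal_block_length(daily_returns**2).loc[:, "stationary"].mean()

bs = StationaryBootstrap(avg_block_length, daily_returns, seed=42)
for resampled in bs.bootstrap(3):  # generate 3 resampled datasets
    resampled_returns = resampled[0][0]  # the resampled positional data (a DataFrame)
    print(resampled_returns.shape)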

Implementation

The following script implements the time-series bootstrapping:

import warnings
from argparse import ArgumentParser
from datetime import datetime, timedelta
from multiprocessing import Pool, cpu_count
from pathlib import Path
from typing import List

import numpy as np
import pandas as pd
import yfinance as yf
from arch.bootstrap import StationaryBootstrap, optimal_block_length
from pypfopt.hierarchical_portfolio import HRPOpt
from sklearn.covariance import ledoit_wolf

# Define the tickers for the assets in the portfolio
tech = np.array(["IYW", "VGT"])
real_estate = np.array(["IYR"])
bank_finance = np.array(["IYF"])
utilities = np.array(["XLU"])
consumer_staples = np.array(["IYK"])
tickers = np.concatenate(
    [tech, real_estate, bank_finance, utilities, consumer_staples], axis=0
)

# Download the daily adjusted closing prices of the assets
start_date = (datetime.now() - timedelta(days=365 * 5)).strftime("%Y-%m-%d")
daily_prices = yf.download(", ".join(tickers), start=start_date)["Adj Close"]

# Calculate the daily returns of the assets
daily_returns = daily_prices.pct_change().dropna()
frequency = 252
# Typical savings account
annual_risk_free_rate = 0.02
daily_risk_free_rate = (1 + annual_risk_free_rate) ** (1 / frequency) - 1

pd.set_option("mode.copy_on_write", True)


def compute_optimal_block_length(returns: pd.DataFrame) -> float:
    """
    Compute the optimal block length for the stationary bootstrap.

    The optimal block length is computed for the squared returns since the
    autocorrelation in the squares tends to be stronger than in the returns
    themselves. This approach helps capture more of the dependence in the
    returns for the bootstrap.

    Parameters
    ----------
    returns : pd.DataFrame
        Daily returns of the assets.

    Returns
    -------
    float
        The average optimal block length for the stationary bootstrap.
    """
    block_lengths = optimal_block_length(returns**2)
    avg_block_length = block_lengths.loc[:, "stationary"].mean()
    return avg_block_length


def compute_shrinkage_covariance(returns: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate the Ledoit-Wolf shrunk covariance matrix of the daily returns.

    Parameters
    ----------
    returns : pd.DataFrame
        Daily returns of the assets.

    Returns
    -------
    pd.DataFrame
        The shrunk covariance matrix of the daily returns.
    """
    shrinkage_cov = pd.DataFrame(
        ledoit_wolf(returns, assume_centered=False)[0],
        index=returns.columns,
        columns=returns.columns,
    )
    return shrinkage_cov


def compute_performance_metrics(
    returns: pd.DataFrame, linkage_method: str
) -> np.ndarray:
    """
    Calculate the expected return, volatility, and Sharpe ratio of the portfolio.

    Parameters
    ----------
    returns : pd.DataFrame
        Daily returns of the assets.
    linkage_method : str
        The method used for hierarchical clustering.
        Possible values: 'centroid', 'weighted', 'complete', 'average', 'ward'.

    Returns
    -------
    np.ndarray
        An array containing the annualized expected return, volatility, and Sharpe ratio.
    """
    shrinkage_cov = compute_shrinkage_covariance(returns)
    hrp = HRPOpt(returns=returns, cov_matrix=shrinkage_cov)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        hrp.optimize(linkage_method=linkage_method)

    expected_return, volatility, sharpe_ratio = hrp.portfolio_performance(
        risk_free_rate=daily_risk_free_rate,
        frequency=frequency,
    )
    return np.array([expected_return, volatility, sharpe_ratio])


def bootstrap(seed_value: int, linkage_method: str) -> pd.DataFrame:
    """
    Perform stationary bootstrapping to compute portfolio performance metrics.

    Parameters
    ----------
    seed_value : int
        Seed for the random number generator to ensure reproducibility.
    linkage_method : str
        The method used for hierarchical clustering.
        Possible values: 'centroid', 'weighted', 'complete', 'average', 'ward'.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the annualized expected return, volatility, Sharpe ratio,
        and the linkage method used for clustering.
    """
    rs = np.random.default_rng(seed_value)
    avg_block_length = compute_optimal_block_length(daily_returns)
    bs = StationaryBootstrap(
        block_size=avg_block_length,
        returns=daily_returns,
        seed=rs,
    )
    results_array = bs.apply(
        func=compute_performance_metrics,
        reps=5000,
        extra_kwargs={"linkage_method": linkage_method},
    )

    performance_metrics = pd.DataFrame(
        results_array,
        columns=[
            "annualized_expected_return",
            "annualized_volatility",
            "annualized_sharpe_ratio",
        ],
    )
    performance_metrics["linkage_method"] = linkage_method

    return performance_metrics


def main() -> int:
    """
    Main function to run stationary bootstrapping on portfolio returns
    using hierarchical clustering with various linkage methods.

    This function parses command line arguments to specify the output directory,
    performs bootstrapping using parallel processing, and saves the results.

    Returns
    -------
    int
        Returns 0 upon successful completion.
    """
    parser = ArgumentParser()
    parser.add_argument("--output_dir", type=str, default=Path.cwd())
    args, _ = parser.parse_known_args()

    linkage_methods = ["centroid", "weighted", "complete", "average", "ward"]
    n_cores = min(cpu_count() - 1, len(linkage_methods))
    # Generate unique seeds for each process
    seeds = np.random.randint(0, 1e6, size=len(linkage_methods))

    with Pool(n_cores) as pool:
        # Run bootstrap for each linkage method with a unique seed
        simulated_performance: List[pd.DataFrame] = pool.starmap(
            bootstrap, zip(seeds, linkage_methods)
        )

    final_results = pd.concat(simulated_performance, ignore_index=True)
    final_results.to_csv(
        Path(args.output_dir) / "bootstrapped_performance_metrics.csv", index=False
    )
    return 0

if __name__ == "__main__":
    main()
  • Data Collection:

    • Downloads daily adjusted closing prices for the specified ETFs
    • Computes the daily returns time series.
  • Risk-Free Rate: Converts a \(2\%\) annual risk-free rate to a daily rate to match the frequency of daily returns (typically \(252\) trading days per year):

    \[\begin{align*} \text{daily risk-free rate} = (1 + \text{annual risk-free rate})^{\frac{1}{\text{frequency}}} - 1 \end{align*}\]

  • Block Length Calculation: Computes optimal block length for StationaryBootstrap using squared returns to capture dependencies. See docstring for details.

  • Ledoit-Wolf Covariance: Estimates the covariance matrix using the Ledoit-Wolf shrinkage method.

  • Bootstrapping: Runs StationaryBootstrap with 5000 replications for each linkage method to assess portfolio performance.

  • Performance Metrics: Computes annualized expected return, volatility, and Sharpe ratio from bootstrapped samples.

The script uses parallel processing to run the bootstrap for each linkage method with a unique seed in a separate process. The results are collected and saved to a CSV file for further analysis.

$ python3 bootstrap_backtest.py --output_dir /path/to/output

Analysis

The bootstrapped performance metrics are stored in long format, with the linkage_method column indicating the hierarchical clustering method used for each set of bootstrapped samples. Each bootstrapped sample contains the annualized expected return, volatility, and Sharpe ratio, computed over a resampled set of all \(T\) observations from the original time series.

performance_metrics = pd.read_csv("bootstrapped_performance_metrics.csv")
annualized_expected_return  annualized_volatility  annualized_sharpe_ratio  linkage_method
                 0.1090801              0.1988651                0.5481177        centroid
                 0.2660785              0.2277011                1.1681979        centroid
                 0.0362587              0.1902725                0.1901487        centroid
                 0.2569973              0.1656867                1.5506298        centroid
                 0.1183425              0.2509082                0.4713436        centroid
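Before plotting the distributions, a quick per-method summary of the median metrics (a small sketch, not part of the original analysis) gives a first comparison across linkage methods:

median_summary = (
    performance_metrics.groupby("linkage_method")[
        ["annualized_expected_return", "annualized_volatility", "annualized_sharpe_ratio"]
    ]
    .median()
    .round(4)
)
median_summary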

Bootstrapped Distributions

def plot_performance_metrics(performance_metrics: pd.DataFrame, metric: str) -> None:
    """
    Plot the performance metrics of the Hierarchical Risk Parity (HRP) portfolio.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    metric : str
        The performance metric to plot.
    """
    metric_title = metric.replace("_", " ").title()
    median_values = performance_metrics.groupby("linkage_method")[metric].median().reset_index()
    
    fig = px.violin(
        performance_metrics,
        x="linkage_method",
        y=metric,
        color="linkage_method",
        title=f"Bootstrapped {metric_title}s of Hierarchical Risk Parity Portfolios",
        labels={"metric": "Performance Metric", metric: "Value"},
    );
    for i, row in median_values.iterrows():
        value = row[metric]
        if metric in ["annualized_expected_return", "annualized_volatility"]:
            value = value * 100
            formated_value = f"{value:.4f}%"
        else:
            formated_value = f"{value:.4f}"
        fig.add_annotation(
            x=row['linkage_method'],
            y=row[metric],
            text=f"Median:<br>{formated_value}",
            showarrow=True,
            arrowhead=2,
            ax=0,
            ay=-40  # Offset to place the label above the plot
        );
    fig.update_layout(
      xaxis_title="Linkage Method", 
      yaxis_title=metric_title,
      legend_title="Linkage Method",
      legend=dict(
          orientation="h",  # Horizontal legend
          yanchor="bottom",  # Align the legend to the bottom of its position
          y=1.02,            # Place it slightly above the plot
          xanchor="center",   # Center the legend
          x=0.5               # Place it at the middle horizontally
      ),
      autosize=True,
    );
    fig.show()
Annualized Expected Returns
plot_performance_metrics(performance_metrics, "annualized_expected_return")
Annualized Volatility
plot_performance_metrics(performance_metrics, "annualized_volatility")
Annualized Sharpe Ratios
plot_performance_metrics(performance_metrics, "annualized_sharpe_ratio", True)

Bootstrap Confidence Intervals

From the bootstrapped distributions, we can calculate the confidence intervals for the performance metrics using the quantiles directly from the bootstrapped samples.

def bootstrapped_confidence_intervals(performance_metrics: pd.DataFrame, metric: str, alpha: float = 0.05) -> pd.DataFrame:
    """
    Calculate the (1 - alpha) confidence intervals for the performance metrics.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    metric : str
        The performance metric for which to calculate the confidence intervals.
    alpha : float, default 0.05
        The significance level for the confidence interval.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the (1 - alpha) confidence intervals for each performance metric.
    """
    lower_percentile = 100 * (alpha / 2)
    upper_percentile = 100 * (1 - (alpha / 2))

    def compute_ci(series: np.ndarray, is_lower: bool) -> float:
        percentile = lower_percentile if is_lower else upper_percentile
        bound = np.percentile(series, percentile)
        if metric in ["annualized_expected_return", "annualized_volatility"]:
            return bound * 100
        return bound

    confidence_intervals = performance_metrics.groupby("linkage_method")[metric].agg(
        lower_bound=lambda x: compute_ci(x, True),
        upper_bound=lambda x: compute_ci(x, False),
    )

    return confidence_intervals
Annualized Expected Returns
annualized_expected_return_ci = bootstrapped_confidence_intervals(performance_metrics, "annualized_expected_return", 0.05)
annualized_expected_return_ci
                lower_bound  upper_bound
linkage_method
average            2.886640     29.25455
centroid           2.974364     28.77622
complete           2.457101     29.09965
ward               1.834686     28.60933
weighted           2.408667     29.00987
Annualized Volatility
annualized_volatility_ci = bootstrapped_confidence_intervals(performance_metrics, "annualized_volatility", 0.05)
annualized_volatility_ci
                lower_bound  upper_bound
linkage_method
average            13.00005     27.65951
centroid           12.98310     27.83992
complete           12.97197     27.72584
ward               13.31101     28.30012
weighted           13.05811     27.81955
Annualized Sharpe Ratios
annualized_sharpe_ratio_ci = bootstrapped_confidence_intervals(performance_metrics, "annualized_sharpe_ratio", 0.05)
annualized_sharpe_ratio_ci
                lower_bound  upper_bound
linkage_method
average           0.1318767     1.849346
centroid          0.1367354     1.804591
complete          0.1084253     1.815125
ward              0.0767051     1.746462
weighted          0.1086801     1.814162

Projected Growth

def plot_investment_growth_with_compounding(performance_metrics: pd.DataFrame, initial_investment: float, years: int) -> None:
    """
    Plot the projected investment growth over a holding period by linkage method, using compounding with median annualized expected returns.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    initial_investment : float
        The initial amount of the investment.
    years : int
        The holding period in years.
    """
    median_values = performance_metrics.groupby("linkage_method")["annualized_expected_return"].median().reset_index()
    growth_data = pd.DataFrame({'year': range(1, years + 1)})
    # Calculate the investment growth for each linkage method for each year, incorporating compounding
    for _, row in median_values.iterrows():
        growth_data[row['linkage_method']] = initial_investment * ((1 + row['annualized_expected_return']) ** growth_data['year'])
    # Reshape from wide to long format for plotting
    growth_data_melted = growth_data.melt(id_vars=['year'], var_name='linkage_method', value_name='investment_value')
    fig = px.line(
        growth_data_melted,
        x='year',
        y='investment_value',
        color='linkage_method',
        title=f'Projected Growth Over {years} Years (Initial Investment: ${initial_investment:,.2f})',
        labels={'investment_value': 'Investment Value ($)', 'year': 'Year'}
    );
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="Investment Value ($)",
        legend_title="Linkage Method",
        autosize=True,
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="center",
            x=0.5
        )
    );
    fig.show()

Plot the projected investment growth over a \(5\)-year period with an initial investment of \(\$10,000\):

plot_investment_growth_with_compounding(performance_metrics, initial_investment=10000, years=5)

Risk-Return Profile

def plot_risk_return_profile(performance_metrics: pd.DataFrame) -> None:
    """
    Plot the risk-return profile (annualized expected return vs. annualized volatility) for each linkage method.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    """
    linkage_methods = performance_metrics['linkage_method'].unique()
    n_cols = 2  # Number of columns in the grid
    n_rows = (len(linkage_methods) + 1) // n_cols

    fig = make_subplots(
        rows=n_rows, cols=n_cols,
        subplot_titles=linkage_methods,
        shared_xaxes=False, shared_yaxes=True,
        x_title="Annualized Volatility (Risk)", y_title="Annualized Expected Return"
    )
    
    # Find the min and max values of the Sharpe ratio for the color scale
    sharpe_min = performance_metrics['annualized_sharpe_ratio'].min()
    sharpe_max = performance_metrics['annualized_sharpe_ratio'].max()

    for i, method in enumerate(linkage_methods):
        row = i // n_cols + 1
        col = i % n_cols + 1
        filtered_data = performance_metrics[performance_metrics['linkage_method'] == method]
        
        # Add scatter plot trace with shared color scale
        fig.add_trace(
            go.Scatter(
                x=filtered_data['annualized_volatility'],
                y=filtered_data['annualized_expected_return'],
                mode='markers',
                name=method.capitalize(),
                hovertext=filtered_data['annualized_sharpe_ratio'],
                marker=dict(
                    size=8,
                    color=filtered_data['annualized_sharpe_ratio'],  # Color based on Sharpe ratio
                    cmin=sharpe_min,  # Set the same min for all subplots
                    cmax=sharpe_max,  # Set the same max for all subplots
                    colorscale="Viridis",
                    showscale=(i == 0),  # Only show the scale on the first subplot
                    colorbar=dict(title="Sharpe Ratio") if i == 0 else None,  # Add color bar on the first subplot
                ),
            ),
            row=row, col=col
        )
    fig.update_layout(
        title="Risk-Return Profile by Linkage Method",
        showlegend=False,
        autosize=True,
        height=1000,
    )
    fig.show()
plot_risk_return_profile(performance_metrics)

Optimal Linkage Method

We will select the linkage method that yields the highest median annualized Sharpe ratio from the bootstrapped distribution.

optimal_linkage_method = performance_metrics.groupby("linkage_method")["annualized_sharpe_ratio"].median().idxmax()

hrp = ppo.hierarchical_portfolio.HRPOpt(returns=daily_returns, cov_matrix=shrinkage_covariance_matrix)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    hrp.optimize(linkage_method=optimal_linkage_method)
OrderedDict([('IYF', 0.10382192445572266), ('IYK', 0.3542205029126607), ('IYR', 0.10424052379931392), ('IYW', 0.1600118160237878), ('VGT', 0.11374570623238564), ('XLU', 0.1639595265761292)])

Allocation Strategy

The optimal portfolio weights derived from the HRP algorithm can now be used to construct the final asset allocation. The pypfopt.discrete_allocation.DiscreteAllocation class provides several methods to convert continuous weights into discrete share quantities. One such method treats this conversion as an integer programming problem formulated as follows:

\[\begin{align*} \underset{x \in \mathbb{Z}^n}{\operatorname{minimise}} & \quad r + \| w T - x \odot p \|_1 \\ \text{subject to} & \quad r + x \cdot p = T \end{align*}\]

Where:

  • \(T \in \mathbb{R}\) is the total dollar amount to be allocated.
  • \(p \in \mathbb{R}^n\) is the vector of the latest prices for each asset.
  • \(w \in \mathbb{R}^n\) is the vector of target portfolio weights for each asset.
  • \(x \in \mathbb{Z}^n\) is the integer allocation, representing the number of units of each asset to be purchased.
  • \(r \in \mathbb{R}\) is the remaining unallocated dollar amount, defined as \(r = T - x \cdot p\).
  • \(\odot\) denotes element-wise (Hadamard) multiplication.
  • \(\| \cdot \|_1\) represents the \(L_1\) norm, i.e., the sum of the absolute values of the elements in the vector.

The goal is to minimize the remaining unallocated value \(r\) while also minimizing the difference between the target dollar allocation, \(wT\), and the actual allocation, \(x \odot p\).
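Before handing this problem to pypfopt, it can help to see the formulation written out directly in cvxpy. The following is an illustrative sketch with hypothetical prices and weights, not the library's implementation; note that \(r = T - x \cdot p\) is defined so the equality constraint holds by construction, and the non-negativity constraints (long-only, no overspending) are added explicitly here.

# Illustrative cvxpy translation of the integer programme above
T = 10_000.0                                            # total dollar amount to allocate
p = np.array([95.0, 210.0, 88.0, 130.0, 560.0, 75.0])   # hypothetical latest prices
w = np.array([0.10, 0.35, 0.10, 0.16, 0.12, 0.17])      # hypothetical target weights

x = cp.Variable(len(p), integer=True)                   # number of shares to buy per asset
r = T - p @ x                                           # remaining unallocated cash

objective = cp.Minimize(r + cp.norm(cp.multiply(p, x) - w * T, 1))
constraints = [x >= 0, r >= 0]                          # long-only and cannot overspend
cp.Problem(objective, constraints).solve(solver=cp.CPLEX)

shares = np.rint(x.value).astype(int)                   # integer share counts

In practice, we delegate this step to pypfopt's DiscreteAllocation class, which also reports the residual cash: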

total_portfolio_value = 10000

discrete_allocation = ppo.discrete_allocation.DiscreteAllocation(
    weights=hrp.clean_weights(),
    latest_prices=daily_prices.iloc[-1, :],
    total_portfolio_value=total_portfolio_value,
)

allocations, funds_remaining = discrete_allocation.lp_portfolio(verbose=True, solver=cp.CPLEX)
Funds remaining: 21.34
IYF: allocated 0.099, desired 0.104
IYK: allocated 0.358, desired 0.354
IYR: allocated 0.102, desired 0.104
IYW: allocated 0.156, desired 0.160
VGT: allocated 0.121, desired 0.114
XLU: allocated 0.164, desired 0.164
Allocation has RMSE: 0.004
allocations = pd.DataFrame.from_dict(allocations, orient="index", columns=["Shares"])
allocations
     Shares
IYF       9
IYK      51
IYR      10
IYW      10
VGT       2
XLU      20

To see a comprehensive list of available solvers, refer to the CVXPY solver documentation.
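To quickly check which solvers are installed in the current environment, cvxpy exposes installed_solvers():

print(cp.installed_solvers())  # e.g. ['CPLEX', 'ECOS', 'SCS', ...]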

