import warnings
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Union

import cvxpy as cp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import pypfopt as ppo
import yfinance as yf
from plotly.subplots import make_subplots
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
from sklearn.covariance import ledoit_wolf
All-ETF Portfolio
The underlying investment vehicles for this portfolio optimization are Exchange-Traded Funds (ETFs), a popular choice offering non-institutional investors diversification, liquidity, and cost-effectiveness. For this analysis, we will focus on the following equity ETFs, with an emphasis on technology, financial services, real estate, utilities, and consumer staples.
ETF Selection
Technology ETFs
iShares U.S. Technology ETF (IYW): This ETF provides exposure to U.S. companies involved in technology, including hardware, software, and IT services.
Financial ETFs
iShares U.S. Financials ETF (IYF): This ETF provides exposure to U.S. companies in the financial services sector, including banks, investment firms, and insurance companies.
Real Estate ETFs
iShares U.S. Real Estate ETF (IYR): This ETF tracks the Dow Jones U.S. Real Estate Capped Index, which measures the performance of the real estate sector of the U.S. equity market, as defined by the index provider.
Consumer Staples ETFs
iShares U.S. Consumer Staples ETF (IYK): This ETF provides exposure to U.S. companies in the consumer staples sector, including food, beverage, and household products.
Other sectors and categories can be explored by utilizing resources like the ETF Database Categories on Yahoo Finance. One of the most crucial aspects of building a portfolio is aligning it with our personal convictions about the sectors and industries we are investing in; it’s about having a deep, long-term belief in the growth potential and sustainability of those industries.
For more insights into the pros and cons of an all-ETF portfolio, consider reading this article from Charles Schwab: 3 Ways to Build an All-ETF Portfolio.
Daily Price Time Series
The daily price time series can be downloaded from Yahoo Finance using the yfinance package:
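A minimal sketch of the download step, mirroring the tickers and the five-year lookback used by the backtesting script later in this post (the preamble imports above are assumed):

```python
# Sector ETFs used throughout this analysis (same tickers as in the backtesting script)
tickers = ["IYW", "VGT", "IYR", "IYF", "XLU", "IYK"]

# Five years of daily adjusted closing prices
start_date = (datetime.now() - timedelta(days=365 * 5)).strftime("%Y-%m-%d")
daily_prices = yf.download(", ".join(tickers), start=start_date)["Adj Close"]

# Daily simple returns derived from the prices
daily_returns = daily_prices.pct_change().dropna()
```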
From the daily prices, the daily returns are computed and annualized for each asset using the geometric mean:
\[\begin{align*}
\bar{r}_i = \left( \prod_{t=1}^{T_i} \left( 1 + r_{i,t} \right) \right)^{\frac{\text{frequency}}{T_i}} - 1
\end{align*}\]
where:
\(r_{i,t}\) is the daily return for asset \(i\) on day \(t\)
\(T_i\) is the total number of trading days observed for asset \(i\) over the sampling period
\(\text{frequency}\) is the number of time periods in a year (typically \(252\) for daily returns)
This formula computes the geometric mean of daily returns for each asset \(i\), and annualizes the returns by raising the product of returns to the power of the ratio \(\frac{\text{frequency}}{T_i}\), where \(T_i\) accounts for the (potentially) varying number of observed trading days for each asset.
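A sketch of this computation, assuming the daily_returns frame from the download step (annualized_mean_returns is the name used by the plotting code below):

```python
frequency = 252  # Trading periods per year for daily returns

# T_i: number of observed trading days per asset (missing values excluded)
observed_days = daily_returns.count()

# Geometric mean of daily returns, annualized per asset
annualized_mean_returns = (1 + daily_returns).prod() ** (frequency / observed_days) - 1
```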
conditional_colors = ["green" if value > 0 else "red" for value in annualized_mean_returns]

fig, ax = plt.subplots()
bars = ax.bar(
    annualized_mean_returns.index,
    annualized_mean_returns * 100,  # Convert to percentage
    color=conditional_colors,
)
for bar in bars:
    height = bar.get_height()
    ax.text(
        bar.get_x() + bar.get_width() / 2,
        height + np.sign(height) * 0.1,  # Add spacing to the value
        f"{height:.2f}%",
        ha="center",
        va="bottom" if height > 0 else "top",  # Adjust placement for negative values
        fontsize=10,
    )
title_text = (
    f"Annualized Mean Returns (%)\n"
    f"Data from {start_date} to {daily_prices.index[-1].strftime('%Y-%m-%d')}"
)
ax.set_title(title_text, fontsize=14)
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
Shrinkage Covariance Matrix
An essential component of portfolio optimization is the estimator of the covariance matrix, which quantifies the relationship between the daily returns series. While the sample covariance matrix is a standard, unbiased estimator, it often contains significant estimation errors in practice.
To address this issue, Ledoit and Wolf (2004) propose a shrinkage approach, which combines the sample covariance matrix with a more structured and stable estimator, such as the constant correlation model. The shrinkage process pulls extreme covariance values in noisy data towards a central target, reducing estimation error and enhancing the stability of the covariance matrix.
Shrinkage Target
Let the returns for asset \(i\) on day \(t\) be denoted by \(X_{i, t}\), where \(i = 1, \ldots, N\) indexes the assets (\(N\) being the number of assets) and \(t = 1, \ldots, T\) indexes the trading days over the sample period. Let the following matrices be defined:
\(\Sigma\) represents the true population covariance matrix, where the element \(\sigma_{i j}\) represents the covariance between asset \(i\) and asset \(j\).
\(S\) represents the sample covariance matrix, where the element \(s_{i j}\) represents the sample covariance between asset \(i\) and asset \(j\).
Population and Sample Correlations
The population correlation \(\varrho_{ij}\) between asset \(i\) and asset \(j\) is given by:
\[\begin{align*}
\varrho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\, \sigma_{jj}}}
\end{align*}\]
and the sample correlation \(r_{ij}\) is defined analogously from the sample covariances:
\[\begin{align*}
r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}}
\end{align*}\]
From \(\varrho_{ij}\) and \(r_{ij}\), the average population correlation \(\bar{\varrho}\) and the average sample correlation \(\bar{r}\) are computed as follows:
\[\begin{align*}
\bar{\varrho} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \varrho_{ij},
\qquad
\bar{r} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} r_{ij}
\end{align*}\]
The sample constant correlation matrix \(F\) keeps the sample variances on the diagonal and replaces every pairwise correlation with the average sample correlation:
\[\begin{align*}
f_{ii} = s_{ii}, \qquad f_{ij} = \bar{r} \sqrt{s_{ii}\, s_{jj}} \quad (i \neq j)
\end{align*}\]
It serves as the shrinkage target for the covariance matrix estimation. It represents a balance between structured stability (constant correlations) and data-driven flexibility (sample covariances), reducing estimation error and improving the robustness of any downstream portfolio optimization procedures.
Note: The reference paper highlights that the constant correlation model may be unsuitable when assets belong to different classes, such as stocks and bonds. However, since all the ETFs selected here share the same underlying asset class (equities), the constant correlation model is a reasonable and appropriate choice for the shrinkage target.
Formally, let \(\hat{\Sigma}\) represent the shrinkage covariance matrix, obtained as a convex combination of the shrinkage target \(F\) and the sample covariance matrix \(S\):
\[\begin{align*}
\hat{\Sigma} = \hat{\delta} F + (1 - \hat{\delta}) S
\end{align*}\]
where \(\hat{\delta} \in [0, 1]\) is the estimated optimal shrinkage intensity. The element \(\hat{\sigma}_{ij}\) is the covariance between assets \(i\) and \(j\), and the matrix \(\hat{\Sigma}\) is of size \(N \times N\).
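The backtesting script later in this post obtains the shrunk covariance matrix with sklearn.covariance.ledoit_wolf; a minimal sketch of that call, assuming the daily_returns frame from above (the variable name shrinkage_covariance_matrix is illustrative):

```python
# Ledoit-Wolf shrunk covariance of the daily returns, labelled with the tickers
shrinkage_covariance_matrix = pd.DataFrame(
    ledoit_wolf(daily_returns, assume_centered=False)[0],
    index=daily_returns.columns,
    columns=daily_returns.columns,
)
```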
To convert this covariance matrix to the correlation matrix \(\hat{R}\), we normalize each element by dividing the covariance between assets \(i\) and \(j\) by the square root of the product of the variances of assets \(i\) and \(j\):
\[\begin{align*}
\hat{R}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii}\, \hat{\sigma}_{jj}}}
\end{align*}\]
To compute the values \(\hat{R}_{ij}\) efficiently, define \(D\) as a diagonal matrix that contains the inverse of the square roots of the diagonal elements of \(\hat{\Sigma}\) (i.e., the variances of the assets), so that the full correlation matrix is obtained in a single matrix product:
\[\begin{align*}
D = \operatorname{diag}\left(\hat{\sigma}_{11}^{-1/2}, \ldots, \hat{\sigma}_{NN}^{-1/2}\right), \qquad \hat{R} = D\, \hat{\Sigma}\, D
\end{align*}\]
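A sketch of this normalization, assuming the shrinkage covariance matrix from the previous step (shrinkage_correlation_matrix is the name used by the heatmap code below):

```python
# D = diag(Sigma_hat)^(-1/2): inverse square roots of the asset variances
D = np.diag(1.0 / np.sqrt(np.diag(shrinkage_covariance_matrix)))

# R_hat = D @ Sigma_hat @ D, wrapped back into a labelled DataFrame
shrinkage_correlation_matrix = pd.DataFrame(
    D @ shrinkage_covariance_matrix.values @ D,
    index=shrinkage_covariance_matrix.columns,
    columns=shrinkage_covariance_matrix.columns,
)
```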
fig, ax = plt.subplots()
cax = ax.matshow(shrinkage_correlation_matrix, cmap="coolwarm")
ax.xaxis.set_ticks_position("bottom")
ax.set_xticks(np.arange(len(shrinkage_correlation_matrix.columns)))
ax.set_yticks(np.arange(len(shrinkage_correlation_matrix.columns)))
ax.set_xticklabels(shrinkage_correlation_matrix.columns)
ax.set_yticklabels(shrinkage_correlation_matrix.columns)
for i in range(shrinkage_correlation_matrix.shape[0]):
    for j in range(shrinkage_correlation_matrix.shape[1]):
        # Ensure text can be easily seen
        value = shrinkage_correlation_matrix.iloc[i, j]
        color = "white" if value < 0.5 else "black"
        ax.text(j, i, f"{value:.2f}", ha="center", va="center", color=color)
plt.title("Shrinkage Correlation Matrix")
plt.tight_layout()
plt.show()
Hierarchical Risk Parity (HRP)
The HRP algorithm consists of three key stages:
Clustering: Assets are grouped based on their correlations using hierarchical clustering methods like single, average, or complete linkage.
Quasi-Diagonalization: The covariance matrix is reordered based on the clustering structure, concentrating covariances around the diagonal.
Recursive Bisection: Weights are assigned recursively across clusters in a top-down manner.
The following sections will walk through a numerical example of the HRP algorithm using daily returns data from the previous section.
Clustering
Instead of treating assets independently, HRP groups similar assets based on their correlations. The goal is to cluster assets that are more closely related to each other.
Distance Matrix
The distance between assets is computed as a function of their correlations, with lower correlations indicating greater distances. First, convert the correlation matrix into a distance matrix:
\[\begin{align*}
d_{ij} = \sqrt{\frac{1 - \hat{R}_{ij}}{2}}
\end{align*}\]
so that perfectly correlated assets are at distance \(0\) and perfectly anti-correlated assets are at distance \(1\).
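A one-line sketch, assuming the shrinkage_correlation_matrix computed earlier:

```python
# d_ij = sqrt((1 - R_ij) / 2): 0 for perfectly correlated pairs, 1 for perfectly anti-correlated pairs
distance_matrix = np.sqrt((1 - shrinkage_correlation_matrix) / 2)
```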
Transform the distance matrix into a condensed form. The condensed form contains only the upper triangular part of the distance matrix (excluding the diagonal), which is what hierarchical clustering requires:
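A sketch using scipy's squareform on the distance matrix above:

```python
# Condensed (upper-triangular, diagonal excluded) form expected by scipy's linkage
condensed_distances = squareform(distance_matrix.values, checks=False)
```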
A hierarchical clustering algorithm is applied to the condensed distance matrix to group pairs of assets into clusters. The linkage function from scipy is used to perform the clustering.
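A sketch of the call, using single linkage as in the original HRP paper (the other linkage methods are evaluated in the bootstrapping section):

```python
# Hierarchical clustering on the condensed distances; linkage_matrix is used throughout the next steps
linkage_matrix = linkage(condensed_distances, method="single")
```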
Note: An important hyperparameter in hierarchical clustering is the linkage method, which determines how the distances between newly formed clusters are calculated. The “appropriate” linkage method depends on the data characteristics, and evaluation mechanisms can help determine the choice. This will be done in the next section.
The linkage_matrix contains the steps of hierarchical clustering and is structured as follows:
| Column | Description |
|--------|-------------|
| 1st | Index of the first cluster being merged. If the value is less than the number of original items, it refers to an original item (i.e., a ticker). Otherwise, it refers to a cluster formed in a previous step. |
| 2nd | Index of the second cluster being merged. Follows the same logic as the first column. |
| 3rd | Distance (or dissimilarity) between the two clusters being merged. |
| 4th | Number of original items in the newly formed cluster after this merge. |
The hierarchical clustering merges the original items step by step, forming new clusters that are indexed starting from 6 (the first new cluster). Cluster IDs start at the number of original items because the six original items occupy the zero-based indices 0 through 5.
Step 1: Tickers 3 and 4 are merged to form Cluster 6 (2 items).
Step 2: Tickers 0 and 2 are merged to form Cluster 7 (2 items).
Step 3: Tickers 1 and 5 are merged to form Cluster 8 (2 items).
Step 4: Clusters 7 (containing tickers 0, 2) and 8 (containing tickers 1, 5) are merged to form Cluster 9 (4 items).
Step 5: Cluster 6 (containing tickers 3, 4) and Cluster 9 are merged to form Cluster 10 (6 items).
In this process, the distance column shows the dissimilarity between the merged clusters. The number of original items reflects how many tickers are included in the newly formed cluster. The output can be visualized via a dendrogram:
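A sketch of the dendrogram plot, assuming the linkage_matrix and correlation matrix from above:

```python
fig, ax = plt.subplots()
dendrogram(linkage_matrix, labels=shrinkage_correlation_matrix.columns.tolist(), ax=ax)
ax.set_title("Hierarchical Clustering Dendrogram")
ax.set_ylabel("Distance")
plt.tight_layout()
plt.show()
```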
Quasi-Diagonalization
After clustering, the next step in the HRP algorithm is quasi-diagonalization. The goal of this step is to reorder the rows and columns of the correlation matrix based on the hierarchical clustering results obtained in the previous step. This seriation step allows similar investments to be grouped together, and dissimilar ones to be placed farther apart.
The following implementation is taken from the original paper with annotated print statements for better understanding:
def get_quasi_diag(linkage_matrix: np.ndarray) -> List[int]:
    """
    Sort clustered items by distance based on the linkage matrix.

    Parameters
    ----------
    linkage_matrix : np.ndarray
        The linkage matrix from hierarchical clustering.

    Returns
    -------
    List[int]
        The sorted indices of the clustered items.
    """
    linkage_matrix = linkage_matrix.astype(int)
    sorted_index = pd.Series([linkage_matrix[-1, 0], linkage_matrix[-1, 1]])
    print(f"Initial sorted index (last two merged clusters): {sorted_index.tolist()}")
    num_items = linkage_matrix[-1, 3]
    print(f"Number of original items: {num_items}")

    # Recursively sort clusters by distance until no clusters remain
    while sorted_index.max() >= num_items:
        print("\n--- New iteration ---\n")
        sorted_index.index = range(0, sorted_index.shape[0] * 2, 2)
        print(f"Expanded sorted index to make space: {sorted_index.tolist()}")
        cluster_indices = sorted_index[sorted_index >= num_items]
        i = cluster_indices.index  # Indices in sorted_index where clusters need to be expanded
        j = cluster_indices.values - num_items  # Row numbers in linkage_matrix to retrieve cluster members
        print("Identified clusters to expand:")
        print(f"  Indices in sorted_index (i): {i.tolist()}")
        print(f"  Corresponding linkage matrix rows (j): {j.tolist()}")

        # Replace clusters with their first constituent
        for idx, row_num in zip(i, j):
            row_dict = {
                'merged_1': linkage_matrix[row_num, 0],
                'merged_2': linkage_matrix[row_num, 1],
                'dist': linkage_matrix[row_num, 2],
                'number_of_original': linkage_matrix[row_num, 3],
            }
            original_cluster = sorted_index[idx]
            first_constituent = linkage_matrix[row_num, 0]
            print(f"  Row {row_num} of the linkage matrix: {row_dict}")
            print(f"  Cluster {original_cluster} was formed by merging {linkage_matrix[row_num, 0]} and {linkage_matrix[row_num, 1]}")
            print(f"  Replacing cluster {original_cluster} with its first constituent: {first_constituent}")
            sorted_index[idx] = first_constituent
        print(f"After replacing clusters with their first constituents: {sorted_index.tolist()}")

        # Append the second constituent of the clusters
        second_constituents = pd.Series(linkage_matrix[j, 1], index=i + 1)
        sorted_index = pd.concat([sorted_index, second_constituents])
        for idx, second_constituent in second_constituents.items():
            print(f"  Appending second constituent {second_constituent}")
        print(f"After appending the second constituents: {sorted_index.tolist()}")

        # Re-sort the index and reset indexing to keep proper order
        sorted_index = sorted_index.sort_index()
        sorted_index.index = range(sorted_index.shape[0])
        print(f"Final sorted index after re-sorting: {sorted_index.tolist()}")

    return sorted_index.tolist()
The quasi-diagonalization process is applied as follows:
sorted_index = get_quasi_diag(linkage_matrix)
Initial sorted index (last two merged clusters): [6, 9]
Number of original items: 6
--- New iteration ---
Expanded sorted index to make space: [6, 9]
Identified clusters to expand:
Indices in sorted_index (i): [0, 2]
Corresponding linkage matrix rows (j): [0, 3]
Row 0 of the linkage matrix: {'merged_1': 3, 'merged_2': 4, 'dist': 0, 'number_of_original': 2}
Cluster 6 was formed by merging 3 and 4
Replacing cluster 6 with its first constituent: 3
Row 3 of the linkage matrix: {'merged_1': 7, 'merged_2': 8, 'dist': 0, 'number_of_original': 4}
Cluster 9 was formed by merging 7 and 8
Replacing cluster 9 with its first constituent: 7
After replacing clusters with their first constituents: [3, 7]
Appending second constituent 4
Appending second constituent 8
After appending the second constituents: [3, 7, 4, 8]
Final sorted index after re-sorting: [3, 4, 7, 8]
--- New iteration ---
Expanded sorted index to make space: [3, 4, 7, 8]
Identified clusters to expand:
Indices in sorted_index (i): [4, 6]
Corresponding linkage matrix rows (j): [1, 2]
Row 1 of the linkage matrix: {'merged_1': 0, 'merged_2': 2, 'dist': 0, 'number_of_original': 2}
Cluster 7 was formed by merging 0 and 2
Replacing cluster 7 with its first constituent: 0
Row 2 of the linkage matrix: {'merged_1': 1, 'merged_2': 5, 'dist': 0, 'number_of_original': 2}
Cluster 8 was formed by merging 1 and 5
Replacing cluster 8 with its first constituent: 1
After replacing clusters with their first constituents: [3, 4, 0, 1]
Appending second constituent 2
Appending second constituent 5
After appending the second constituents: [3, 4, 0, 1, 2, 5]
Final sorted index after re-sorting: [3, 4, 0, 2, 1, 5]
sorted_index
[3, 4, 0, 2, 1, 5]
The sorted correlation matrix is obtained by rearranging the rows and columns based on the sorted index:
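A sketch of the reordering, assuming the sorted_index from the previous step:

```python
# Reorder rows and columns of the correlation matrix with the quasi-diagonal order
sorted_correlation_matrix = shrinkage_correlation_matrix.iloc[sorted_index, sorted_index]
```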
The quasi-diagonalized correlation matrix shows a more structured pattern, with higher correlations concentrated around the main diagonal. Store the ordered tickers for future reference:
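A sketch, with ordered_tickers as an illustrative name:

```python
# Tickers in quasi-diagonal order, reused by the recursive bisection step
ordered_tickers = shrinkage_correlation_matrix.columns[sorted_index]
```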
In the final step, recursive bisection, the hierarchical tree (built from hierarchical clustering) is split into two branches repeatedly until each branch contains only one asset. It can be shown that splitting weights inversely proportional to the variance of assets in each branch achieves optimal allocation when the correlation matrix is diagonal (see Appendix A.2. of the original paper).
Initialization:
Set the list of items \(L = \{L_0\}\), where \(L_0 = \{n\}_{n=1, \ldots, N}\). This list represents the starting set that includes every asset to be allocated.
Assign a unit weight to all items: \(w_n = 1\), for all \(n = 1, \ldots, N\).
Stopping Condition: If \(\left| L_i \right| = 1\) for all \(L_i \in L\), stop the algorithm.
Bisection (for each \(L_i \in L\) such that \(\left| L_i \right| > 1\)):
Bisect the set: Divide \(L_i\) into two subsets \(L_i^{(1)} \cup L_i^{(2)} = L_i\), where \[\begin{align*}
\left| L_i^{(1)} \right| = \operatorname{int}\left(\frac{1}{2} \left|L_i\right| \right)
\end{align*}\] The order is preserved in the bisection.
Variance of subsets: For each subset \(L_i^{(j)}\) (where \(j = 1, 2\)), calculate the variance \(\widetilde{V}_i^{(j)}\) using the quadratic form: \[\begin{align*}
\widetilde{V}_i^{(j)} \equiv \left( \widetilde{w}_i^{(j)} \right)^{\top} V_i^{(j)}\, \widetilde{w}_i^{(j)}
\end{align*}\] where \(V_i^{(j)}\) is the covariance matrix of the assets in \(L_i^{(j)}\), and \(\widetilde{w}_i^{(j)}\) is defined as: \[\begin{align*}
\widetilde{w}_i^{(j)} = \operatorname{diag}\left( V_i^{(j)} \right)^{-1} \frac{1}{\operatorname{tr}\left( \operatorname{diag}\left( V_i^{(j)} \right)^{-1} \right)}
\end{align*}\] Here, \(\operatorname{diag}[\cdot]\) denotes the diagonal matrix and \(\operatorname{tr}[\cdot]\) is the trace operator.
Compute the split factor \(\alpha_i = 1 - \dfrac{\widetilde{V}_i^{(1)}}{\widetilde{V}_i^{(1)} + \widetilde{V}_i^{(2)}}\), so that the less risky subset receives the larger share, and rescale allocations for all \(n \in L_i^{(1)}\) by a factor of \(\alpha_i\).
Rescale allocations for all \(n \in L_i^{(2)}\) by a factor of \(1 - \alpha_i\).
Repeat: Loop back to step 2 until the stopping condition is met.
Inverse Variance Portfolio
def get_ivp(cov: pd.DataFrame) -> np.ndarray:
    """
    Compute the inverse variance portfolio (IVP) for a given covariance matrix.

    Parameters
    ----------
    cov : pd.DataFrame
        Covariance matrix of the cluster.

    Returns
    -------
    np.ndarray
        Inverse variance portfolio weights for the cluster.
    """
    ivp = 1.0 / np.diag(cov)
    ivp /= ivp.sum()  # Normalize weights
    print(f"\n    IVP weights for cluster: {ivp}")
    return ivp
Cluster Variance
def get_cluster_var(cov: pd.DataFrame, cluster_items: List[str]) -> float:
    """
    Compute the variance for a given cluster.

    Parameters
    ----------
    cov : pd.DataFrame
        Covariance matrix for the assets.
    cluster_items : List[str]
        List of tickers in the cluster.

    Returns
    -------
    float
        The variance of the cluster.
    """
    # Subset of the covariance matrix containing only the cluster assets
    cov_slice = cov.loc[cluster_items, cluster_items]

    # Define the left space (4 spaces in this case)
    hspace = "    "
    formatted_cov_slice = str(cov_slice).replace("\n", "\n" + hspace)
    print(f"\n    Covariance matrix slice for cluster {cluster_items}:\n{hspace}{formatted_cov_slice}")

    # Calculate inverse variance portfolio (IVP) weights for the cluster and reshape to column vector (n x 1)
    weights = get_ivp(cov_slice).reshape(-1, 1)

    # Compute cluster variance as w.T * cov_slice * w, which is [1 x n] * [n x n] * [n x 1] = [1 x 1]
    cluster_var = np.dot(np.dot(weights.T, cov_slice), weights)[0, 0]
    print(f"\n    Cluster variance for {cluster_items}: {cluster_var}\n")
    return cluster_var
Recursive Bisection
def recursive_bisection(cov: pd.DataFrame, ordered_tickers: Union[pd.Index, List[str]]) -> pd.Series:
    """
    Compute the Hierarchical Risk Parity (HRP) portfolio allocation using recursive bisection.

    Parameters
    ----------
    cov : pd.DataFrame
        Covariance matrix for the assets.
    ordered_tickers : Union[pd.Index, List[str]]
        List of tickers ordered by hierarchical clustering.

    Returns
    -------
    pd.Series
        Portfolio weights for each asset.
    """
    # Initialize equal weights for all assets
    weights = pd.Series(1.0, index=ordered_tickers)
    hspace = "    "  # Define the left space (4 spaces in this case)
    formatted_weights = str(weights).replace("\n", "\n" + hspace)
    print(f"Initial weights:\n{hspace}{formatted_weights}\n")

    # Initialize the cluster as the full set of ordered tickers
    cluster_items = [ordered_tickers]
    step = 0

    while len(cluster_items) > 0:
        step += 1
        print(f"\n{'=' * 20} Step {step}: Recursive Bisection {'=' * 20}\n")
        print(f"Cluster items at Step {step}: {[list(cluster) for cluster in cluster_items]}")

        # Split each cluster into two halves only if the cluster contains more than one asset
        cluster_items = [
            i[j:k]
            for i in cluster_items
            for j, k in ((0, len(i) // 2), (len(i) // 2, len(i)))
            if len(i) > 1
        ]

        # For each pair of clusters, calculate variance and adjust weights
        for i in range(0, len(cluster_items), 2):
            first_cluster = cluster_items[i]
            second_cluster = cluster_items[i + 1]
            print(f"\n    ---- Processing clusters ----")
            print(f"    First cluster: {first_cluster}")
            print(f"    Second cluster: {second_cluster}")

            # Compute variance for each cluster
            first_var = get_cluster_var(cov, list(first_cluster))
            second_var = get_cluster_var(cov, list(second_cluster))

            # Compute the weighting factor alpha
            alpha = 1 - first_var / (first_var + second_var)
            print(f"    Weighting factor alpha for {list(first_cluster)}: {alpha}")
            print(f"    Weighting factor 1 - alpha for {list(second_cluster)}: {1 - alpha}")

            # Adjust weights for each cluster
            weights[first_cluster] *= alpha
            weights[second_cluster] *= 1 - alpha
            formatted_weights = str(weights).replace("\n", "\n" + hspace)
            print(f"    Updated weights after adjustment:\n{hspace}{formatted_weights}\n")

    return weights
Portfolio Allocation
Applying the recursive bisection algorithm to the covariance matrix:
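A sketch of the call, assuming the shrinkage covariance matrix and the ordered_tickers defined earlier (variable names are illustrative):

```python
# HRP weights from the annotated recursive bisection implementation above
hrp_weights = recursive_bisection(cov=shrinkage_covariance_matrix, ordered_tickers=ordered_tickers)
```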
The PyPortfolioOpt package provides a convenient implementation of the HRP algorithm. We can compare the results obtained from our manual implementation with the package's implementation to verify the correctness of our approach, as sketched below.
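A sketch of such a comparison with PyPortfolioOpt's HRPOpt, assuming the daily_returns, shrinkage covariance matrix, and hrp_weights from above:

```python
from pypfopt.hierarchical_portfolio import HRPOpt

hrp = HRPOpt(returns=daily_returns, cov_matrix=shrinkage_covariance_matrix)
hrp.optimize(linkage_method="single")  # Same linkage method as the manual example
package_weights = pd.Series(hrp.clean_weights())

# Side-by-side comparison with the manual recursive bisection weights
print(pd.DataFrame({"manual": hrp_weights, "pypfopt": package_weights}))
```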
To assess the out-of-sample performance of the HRP optimal allocation strategy, a backtest can be employed. Backtesting is a key technique for evaluating an investment strategy's performance using historical data. It allows for an assessment of how reliably the strategy could be replicated in real-world scenarios, while also providing valuable insights into its associated risk-return profile. To this end, Joubert et al. (2024) outline three primary backtesting methods:
| Backtesting Method | Description | Advantages | Limitations |
|---|---|---|---|
| Walk-Forward Backtest | Historical data is split into multiple segments. The strategy is trained on one segment and tested on the next. | Easy to implement. | Tests only a single path, risking overfitting. Past performance may not predict future results (e.g., Nvidia's growth since COVID). |
| Resampling | A statistical technique that creates multiple samples from historical data to assess performance, yielding a distribution of performance metrics. Bootstrap: draws samples with replacement from historical data, enabling performance estimation over many iterations. Cross-Validation: splits the data into multiple training and testing sets, providing multiple assessments of the strategy's performance. | Provides more robust analysis by generating multiple scenarios. | Does not create new data, but resamples existing data, which may miss unique future events. |
| Monte Carlo Simulation | Generates synthetic data with properties similar to historical data using an understanding of the data generation process. | Yields performance estimates indicative of future outcomes (assuming correct and stable data processes). | Requires accurate modeling of the data generation process, which is difficult for volatile financial markets. |
For evaluating the HRP portfolio allocation strategy and optimizing the hierarchical clustering linkage method, we will use the time-series bootstrap tools from the arch package.
Time-Series Bootstrapping
The arch.bootstrap.StationaryBootstrap class implements the method developed by Politis and Romano (1994). This method addresses the limitations of traditional bootstrapping techniques when applied to time-series data, aiming to preserve crucial time-dependent structures of the data while generating resampled datasets.
Let \(X_1, X_2, \dots, X_T\) represent the original time series. To generate a resampled time series using block bootstrapping, define a block \(B_{i,b}\) as consisting of \(b\) consecutive observations starting at \(X_i\):
\[\begin{align*}
B_{i,b} = \left( X_i, X_{i+1}, \dots, X_{i+b-1} \right)
\end{align*}\]
If \(i + b - 1 > T\) (i.e., the block extends beyond the end of the time series), periodic boundary conditions are applied. This means that the series wraps around, treating the first observations in the series as if they come after the last. In other words, for any index \(j > T\), the observation is taken as:
\[\begin{align*}
X_j = X_{j \bmod T}
\end{align*}\]
The expression \(j \mod T\) calculates the remainder when \(j\) is divided by \(T\), effectively cycling the indices back to the beginning of the series. For example, if \(T = 10\), and \(X_{12}\) is requested, it would be equivalent to sampling \(X_2\).
Block Selection
The starting indices of the blocks are determined by a sequence of independent and identically distributed (iid) random variables \(I_1, I_2, \dots\), drawn uniformly from \(\{1, \dots, T\}\). The lengths of the blocks are governed by another sequence of iid random variables \(L_1, L_2, \dots\), where each \(L_i\) follows a geometric distribution with parameter \(p \in [0,1]\), i.e.,
\[\begin{align*}
P(L_i = m) = (1 - p)^{m-1} p, \quad m = 1, 2, \dots.
\end{align*}\]
The expected block length is therefore \(1/p\); this is the quantity passed as block_size to StationaryBootstrap in the script below.
Resampling
To construct the resampled series \(X_1^*, X_2^*, \dots, X_T^*\):
Select the first block \(B_{I_1, L_1}\) and copy \(L_1\) consecutive observations starting at \(X_{I_1}\) into the resampled series.
Select the next block \(B_{I_2, L_2}\) and append \(L_2\) consecutive observations starting at \(X_{I_2}\), continuing this process.
Repeat the block selection and copying processes until a total of \(T\) observations have been generated.
It is possible for this procedure to result in overlapping blocks when consecutive starting indices \(I_1, I_2, \dots\) are close together, leading to the reuse of some observations. The process terminates once \(T\) observations are resampled, but it can be extended to generate longer series if desired.
Implementation
The following script implements the time-series bootstrapping:
import warnings
from argparse import ArgumentParser
from datetime import datetime, timedelta
from multiprocessing import Pool, cpu_count
from pathlib import Path
from typing import List

import numpy as np
import pandas as pd
import yfinance as yf
from arch.bootstrap import StationaryBootstrap, optimal_block_length
from pypfopt.hierarchical_portfolio import HRPOpt
from sklearn.covariance import ledoit_wolf

# Define the tickers for the assets in the portfolio
tech = np.array(["IYW", "VGT"])
real_estate = np.array(["IYR"])
bank_finance = np.array(["IYF"])
utilities = np.array(["XLU"])
consumer_staples = np.array(["IYK"])
tickers = np.concatenate(
    [tech, real_estate, bank_finance, utilities, consumer_staples], axis=0
)

# Download the daily adjusted closing prices of the assets
start_date = (datetime.now() - timedelta(days=365 * 5)).strftime("%Y-%m-%d")
daily_prices = yf.download(", ".join(tickers), start=start_date)["Adj Close"]

# Calculate the daily returns of the assets
daily_returns = daily_prices.pct_change().dropna()

frequency = 252
# Typical savings account
annual_risk_free_rate = 0.02
daily_risk_free_rate = (1 + annual_risk_free_rate) ** (1 / frequency) - 1

pd.set_option("mode.copy_on_write", True)


def compute_optimal_block_length(returns: pd.DataFrame) -> float:
    """
    Compute the optimal block length for the stationary bootstrap.

    The optimal block length is computed for the squared returns since the
    autocorrelation in the squares tends to be stronger than in the returns
    themselves. This approach helps capture more of the dependence in the
    returns for the bootstrap.

    Parameters
    ----------
    returns : pd.DataFrame
        Daily returns of the assets.

    Returns
    -------
    float
        The average optimal block length for the stationary bootstrap.
    """
    block_lengths = optimal_block_length(returns**2)
    avg_block_length = block_lengths.loc[:, "stationary"].mean()
    return avg_block_length


def compute_shrinkage_covariance(returns: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate the Ledoit-Wolf shrunk covariance matrix of the daily returns.

    Parameters
    ----------
    returns : pd.DataFrame
        Daily returns of the assets.

    Returns
    -------
    pd.DataFrame
        The shrunk covariance matrix of the daily returns.
    """
    shrinkage_cov = pd.DataFrame(
        ledoit_wolf(returns, assume_centered=False)[0],
        index=returns.columns,
        columns=returns.columns,
    )
    return shrinkage_cov


def compute_performance_metrics(
    returns: pd.DataFrame, linkage_method: str
) -> np.ndarray:
    """
    Calculate the expected return, volatility, and Sharpe ratio of the portfolio.

    Parameters
    ----------
    returns : pd.DataFrame
        Daily returns of the assets.
    linkage_method : str
        The method used for hierarchical clustering. Possible values:
        'centroid', 'weighted', 'complete', 'average', 'ward'.

    Returns
    -------
    np.ndarray
        An array containing the annualized expected return, volatility, and Sharpe ratio.
    """
    shrinkage_cov = compute_shrinkage_covariance(returns)
    hrp = HRPOpt(returns=returns, cov_matrix=shrinkage_cov)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        hrp.optimize(linkage_method=linkage_method)
    expected_return, volatility, sharpe_ratio = hrp.portfolio_performance(
        risk_free_rate=daily_risk_free_rate,
        frequency=frequency,
    )
    return np.array([expected_return, volatility, sharpe_ratio])


def bootstrap(seed_value: int, linkage_method: str) -> pd.DataFrame:
    """
    Perform stationary bootstrapping to compute portfolio performance metrics.

    Parameters
    ----------
    seed_value : int
        Seed for the random number generator to ensure reproducibility.
    linkage_method : str
        The method used for hierarchical clustering. Possible values:
        'centroid', 'weighted', 'complete', 'average', 'ward'.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the annualized expected return, volatility,
        Sharpe ratio, and the linkage method used for clustering.
    """
    rs = np.random.default_rng(seed_value)
    avg_block_length = compute_optimal_block_length(daily_returns)
    bs = StationaryBootstrap(
        block_size=avg_block_length,
        returns=daily_returns,
        seed=rs,
    )
    results_array = bs.apply(
        func=compute_performance_metrics,
        reps=5000,
        extra_kwargs={"linkage_method": linkage_method},
    )
    performance_metrics = pd.DataFrame(
        results_array,
        columns=[
            "annualized_expected_return",
            "annualized_volatility",
            "annualized_sharpe_ratio",
        ],
    )
    performance_metrics["linkage_method"] = linkage_method
    return performance_metrics


def main() -> int:
    """
    Main function to run stationary bootstrapping on portfolio returns using
    hierarchical clustering with various linkage methods.

    This function parses command line arguments to specify the output directory,
    performs bootstrapping using parallel processing, and saves the results.

    Returns
    -------
    int
        Returns 0 upon successful completion.
    """
    parser = ArgumentParser()
    parser.add_argument("--output_dir", type=str, default=Path.cwd())
    args, _ = parser.parse_known_args()

    linkage_methods = ["centroid", "weighted", "complete", "average", "ward"]
    n_cores = min(cpu_count() - 1, len(linkage_methods))

    # Generate unique seeds for each process
    seeds = np.random.randint(0, 1e6, size=len(linkage_methods))

    with Pool(n_cores) as pool:
        # Run bootstrap for each linkage method with a unique seed
        simulated_performance: List[pd.DataFrame] = pool.starmap(
            bootstrap, zip(seeds, linkage_methods)
        )

    final_results = pd.concat(simulated_performance, ignore_index=True)
    final_results.to_csv(
        Path(args.output_dir) / "bootstrapped_performance_metrics.csv", index=False
    )
    return 0


if __name__ == "__main__":
    main()
Data Collection:
Downloads daily adjusted closing prices for the specified ETFs
Computes the daily returns time series.
Risk-Free Rate: Converts a \(2\%\) annual risk-free rate to a daily rate so that it matches the frequency of the daily returns (\(252\) trading periods per year).
Block Length Calculation: Computes optimal block length for StationaryBootstrap using squared returns to capture dependencies. See docstring for details.
Ledoit-Wolf Covariance: Estimates the covariance matrix using the Ledoit-Wolf shrinkage method.
Bootstrapping: Runs StationaryBootstrap with 5000 replications for each linkage method to assess portfolio performance.
Performance Metrics: Computes annualized expected return, volatility, and Sharpe ratio from bootstrapped samples.
The script uses parallel processing to run the bootstrap for each linkage method with a unique seed in a separate process. The results are collected and saved to a CSV file for further analysis.
The bootstrapped performance metrics are stored in long format, with the linkage_method column indicating the hierarchical clustering method used for each set of bootstrapped samples. Each bootstrapped sample contains the annualized expected return, volatility, and Sharpe ratio, computed over a resampled set of all \(T\) observations from the original time series.
def plot_performance_metrics(performance_metrics: pd.DataFrame, metric: str) -> None:
    """
    Plot the performance metrics of the Hierarchical Risk Parity (HRP) portfolio.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    metric : str
        The performance metric to plot.
    """
    metric_title = metric.replace("_", " ").title()
    median_values = performance_metrics.groupby("linkage_method")[metric].median().reset_index()

    fig = px.violin(
        performance_metrics,
        x="linkage_method",
        y=metric,
        color="linkage_method",
        title=f"Bootstrapped {metric_title}s of Hierarchical Risk Parity Portfolios",
        labels={"metric": "Performance Metric", metric: "Value"},
    )
    for i, row in median_values.iterrows():
        value = row[metric]
        if metric in ["annualized_expected_return", "annualized_volatility"]:
            value = value * 100
            formated_value = f"{value:.4f}%"
        else:
            formated_value = f"{value:.4f}"
        fig.add_annotation(
            x=row["linkage_method"],
            y=row[metric],
            text=f"Median:<br>{formated_value}",
            showarrow=True,
            arrowhead=2,
            ax=0,
            ay=-40,  # Offset to place the label above the plot
        )
    fig.update_layout(
        xaxis_title="Linkage Method",
        yaxis_title=metric_title,
        legend_title="Linkage Method",
        legend=dict(
            orientation="h",   # Horizontal legend
            yanchor="bottom",  # Align the legend to the bottom of its position
            y=1.02,            # Place it slightly above the plot
            xanchor="center",  # Center the legend
            x=0.5,             # Place it at the middle horizontally
        ),
        autosize=True,
    )
    fig.show()
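For example, assuming performance_metrics holds the long-format results loaded from the CSV produced by the backtesting script:

```python
performance_metrics = pd.read_csv("bootstrapped_performance_metrics.csv")
plot_performance_metrics(performance_metrics, metric="annualized_sharpe_ratio")
```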
From the bootstrapped distributions, we can calculate the confidence intervals for the performance metrics using the quantiles directly from the bootstrapped samples.
def bootstrapped_confidence_intervals(
    performance_metrics: pd.DataFrame, metric: str, alpha: float = 0.05
) -> pd.DataFrame:
    """
    Calculate the (1 - alpha) confidence intervals for the performance metrics.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    metric : str
        The performance metric for which to calculate the confidence intervals.
    alpha : float, default 0.05
        The significance level for the confidence interval.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the (1 - alpha) confidence intervals for each performance metric.
    """
    lower_percentile = 100 * (alpha / 2)
    upper_percentile = 100 * (1 - (alpha / 2))

    def compute_ci(series: np.ndarray, is_lower: bool) -> float:
        percentile = lower_percentile if is_lower else upper_percentile
        bound = np.percentile(series, percentile)
        if metric in ["annualized_expected_return", "annualized_volatility"]:
            return bound * 100
        return bound

    confidence_intervals = performance_metrics.groupby("linkage_method")[metric].agg(
        lower_bound=lambda x: compute_ci(x, True),
        upper_bound=lambda x: compute_ci(x, False),
    )
    return confidence_intervals
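An example call for the Sharpe ratio, using the same bootstrapped samples:

```python
bootstrapped_confidence_intervals(performance_metrics, metric="annualized_sharpe_ratio", alpha=0.05)
```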
def plot_investment_growth_with_compounding(
    performance_metrics: pd.DataFrame, initial_investment: float, years: int
) -> None:
    """
    Plot the projected investment growth over a holding period by linkage method,
    using compounding with median annualized expected returns.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    initial_investment : float
        The initial amount of the investment.
    years : int
        The holding period in years.
    """
    median_values = (
        performance_metrics.groupby("linkage_method")["annualized_expected_return"]
        .median()
        .reset_index()
    )
    growth_data = pd.DataFrame({"year": range(1, years + 1)})

    # Calculate the investment growth for each linkage method for each year, incorporating compounding
    for _, row in median_values.iterrows():
        growth_data[row["linkage_method"]] = initial_investment * (
            (1 + row["annualized_expected_return"]) ** growth_data["year"]
        )

    # Reshape from wide to long format for plotting
    growth_data_melted = growth_data.melt(
        id_vars=["year"], var_name="linkage_method", value_name="investment_value"
    )
    fig = px.line(
        growth_data_melted,
        x="year",
        y="investment_value",
        color="linkage_method",
        title=f"Projected Growth Over {years} Years (Initial Investment: ${initial_investment:,.2f})",
        labels={"investment_value": "Investment Value ($)", "year": "Year"},
    )
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="Investment Value ($)",
        legend_title="Linkage Method",
        autosize=True,
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="center",
            x=0.5,
        ),
    )
    fig.show()
Plot the projected investment growth over a \(5\)-year period with an initial investment of \(\$10,000\):
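A sketch of the call with the stated amount and horizon:

```python
plot_investment_growth_with_compounding(performance_metrics, initial_investment=10_000, years=5)
```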
def plot_risk_return_profile(performance_metrics: pd.DataFrame) -> None:
    """
    Plot the risk-return profile (annualized expected return vs. annualized volatility)
    for each linkage method.

    Parameters
    ----------
    performance_metrics : pd.DataFrame
        Performance metrics of the HRP portfolio.
    """
    linkage_methods = performance_metrics["linkage_method"].unique()
    n_cols = 2  # Number of columns in the grid
    n_rows = (len(linkage_methods) + 1) // n_cols

    fig = make_subplots(
        rows=n_rows,
        cols=n_cols,
        subplot_titles=linkage_methods,
        shared_xaxes=False,
        shared_yaxes=True,
        x_title="Annualized Volatility (Risk)",
        y_title="Annualized Expected Return",
    )

    # Find the min and max values of the Sharpe ratio for the color scale
    sharpe_min = performance_metrics["annualized_sharpe_ratio"].min()
    sharpe_max = performance_metrics["annualized_sharpe_ratio"].max()

    for i, method in enumerate(linkage_methods):
        row = i // n_cols + 1
        col = i % n_cols + 1
        filtered_data = performance_metrics[performance_metrics["linkage_method"] == method]

        # Add scatter plot trace with shared color scale
        fig.add_trace(
            go.Scatter(
                x=filtered_data["annualized_volatility"],
                y=filtered_data["annualized_expected_return"],
                mode="markers",
                name=method.capitalize(),
                hovertext=filtered_data["annualized_sharpe_ratio"],
                marker=dict(
                    size=8,
                    color=filtered_data["annualized_sharpe_ratio"],  # Color based on Sharpe ratio
                    cmin=sharpe_min,  # Set the same min for all subplots
                    cmax=sharpe_max,  # Set the same max for all subplots
                    colorscale="Viridis",
                    showscale=(i == 0),  # Only show the scale on the first subplot
                    colorbar=dict(title="Sharpe Ratio") if i == 0 else None,  # Add color bar on the first subplot
                ),
            ),
            row=row,
            col=col,
        )

    fig.update_layout(
        title="Risk-Return Profile by Linkage Method",
        showlegend=False,
        autosize=True,
        height=1000,
    )
    fig.show()
plot_risk_return_profile(performance_metrics)
Optimal Linkage Method
We will select the linkage method that yields the highest median annualized Sharpe ratio from the bootstrapped distribution.
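A sketch of that selection from the bootstrapped samples:

```python
median_sharpe = performance_metrics.groupby("linkage_method")["annualized_sharpe_ratio"].median()
optimal_linkage_method = median_sharpe.idxmax()
print(f"Optimal linkage method by median Sharpe ratio: {optimal_linkage_method}")
```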
The optimal portfolio weights derived from the HRP algorithm can now be used to construct the final asset allocation. The pypfopt.discrete_allocation.DiscreteAllocation class provides several methods to convert continuous weights into discrete share quantities. One such method treats this conversion as an integer programming problem formulated as follows:
\[\begin{align*}
\underset{x \in \mathbb{Z}^n}{\operatorname{minimise}} & \quad r + \| w T - x \odot p \|_1 \\
\text{subject to} & \quad r + x \cdot p = T
\end{align*}\]
Where:
\(T \in \mathbb{R}\) is the total dollar amount to be allocated.
\(p \in \mathbb{R}^n\) is the vector of the latest prices for each asset.
\(w \in \mathbb{R}^n\) is the vector of target portfolio weights for each asset.
\(x \in \mathbb{Z}^n\) is the integer allocation, representing the number of units of each asset to be purchased.
\(r \in \mathbb{R}\) is the remaining unallocated dollar amount, defined as \(r = T - x \cdot p\).
\(\| \cdot \|_1\) represents the \(L_1\) norm, i.e., the sum of the absolute values of the elements in the vector.
The goal is to minimize the remaining unallocated value \(r\) while also minimizing the difference between the target dollar allocation, \(wT\), and the actual allocation, \(x \odot p\).
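A sketch of the conversion using DiscreteAllocation's lp_portfolio method, which solves the integer program above; the HRP weights, latest prices, and the $10,000 budget are illustrative assumptions:

```python
from pypfopt.discrete_allocation import DiscreteAllocation

latest_prices = daily_prices.iloc[-1]  # Most recent price per asset
da = DiscreteAllocation(dict(hrp_weights), latest_prices, total_portfolio_value=10_000)

allocation, leftover = da.lp_portfolio()
print(f"Shares to buy: {allocation}")
print(f"Funds remaining: ${leftover:.2f}")
```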
De Prado, M. L. (2016). Building diversified portfolios that outperform out-of-sample. SSRN Electronic Journal. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2708678
Ledoit, O., & Wolf, M. (2003). Honey, I shrunk the sample covariance matrix. UPF Economics and Business Working Paper No. 691.
Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428), 1303-1313. https://doi.org/10.1080/01621459.1994.10476870
Joubert, J., Sestovic, D., Barziy, I., Distaso, W., & Lopez de Prado, M. (2024). The three types of backtests. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4897573
De Prado, M. L. (2018). Advances in Financial Machine Learning. John Wiley & Sons. https://www.wiley.com/en-fr/Advances+in+Financial+Machine+Learning-p-9781119482086