from datetime import datetime, timedelta
import numpy as np
import pandas as pd
import yfinance as yf
import scipy.optimize as sco
import scipy.interpolate as sci
import matplotlib.pyplot as plt
Yang Wu
September 15, 2021
In a previous post, we covered portfolio optimization and its implementations in R. In this post, we will tackle the problem of portfolio optimization using Python, which offers some elegant implementations. Much of the structure of the post is gleaned from Yves Hilpisch’s awesome book Python for Finance. Our analysis essentially boils down to the following tasks:
Import financial data via APIs
Compute returns and statistics
Simulate a random set of portfolios
Construct a set of optimal portfolios using Markowitz’s mean-variance framework
Visualize the set of optimal portfolios using the R plotly library (Python’s plotly library is also a great alternative)
We need the following Python libraries and packages:
The following R packages are required if we wish to visualize the results in R:
For our sample, we select 12 different equity assets with exposure to all of the GICS sectors: energy, materials, industrials, utilities, healthcare, financials, consumer discretionary, consumer staples, information technology, communication services, and real estate. The sampling period covers the 10 years ending on today’s date (2024-10-24), which provides a long daily history for estimation.
dates | XOM | SHW | JPM | AEP | UNH | AMZN | KO | BA | AMT | DD | TSN | SLG |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2014-10-26 | 39.77093 | 76.78194 | 14.4985 | 106.1893 | 51.96864 | 44.48439 | 29.67780 | 68.44363 | 68.80859 | 31.38823 | 79.13648 | 60.67173 |
2014-10-27 | 39.65180 | 76.98161 | 14.7795 | 107.4849 | 53.63420 | 45.23541 | 29.53218 | 67.93410 | 69.19922 | 31.84126 | 79.57455 | 61.57168 |
2014-10-28 | 39.77093 | 75.90353 | 14.7060 | 107.0241 | 52.96350 | 44.97749 | 29.82342 | 67.28209 | 69.11986 | 31.51766 | 79.84943 | 61.24149 |
2014-10-29 | 40.82895 | 78.97800 | 14.9535 | 107.6675 | 53.93604 | 45.05335 | 30.14379 | 68.28593 | 70.15750 | 32.05157 | 80.63968 | 61.15085 |
2014-10-30 | 40.87800 | 77.86003 | 15.2730 | 108.6154 | 55.22155 | 45.88022 | 30.49328 | 69.62034 | 70.62142 | 32.64213 | 81.61036 | 62.61404 |
Next, we find the simple daily returns for each of the 12 assets using the pct_change() method, since our data object is a pandas DataFrame. We use simple returns because they are asset-additive, a property we need in order to compute portfolio returns:
dates | XOM | SHW | JPM | AEP | UNH | AMZN | KO | BA | AMT | DD | TSN | SLG |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2024-10-17 | -0.0021802 | 0.0095467 | 0.0077855 | -0.0019960 | -0.0017598 | 0.0042331 | 0.0077253 | -0.0018505 | 0.0481703 | 0.0036795 | 0.0063247 | -0.0028251 |
2024-10-20 | -0.0052637 | -0.0213434 | 0.0004233 | 0.0310968 | -0.0025854 | -0.0105160 | -0.0140546 | -0.0170203 | -0.0250094 | -0.0138310 | 0.0032654 | 0.0005833 |
2024-10-21 | -0.0026957 | -0.0000451 | 0.0033320 | 0.0003754 | 0.0002356 | 0.0050224 | 0.0000000 | -0.0533595 | -0.0016928 | -0.0049003 | -0.0028173 | 0.0051632 |
2024-10-22 | 0.0094104 | 0.0217195 | -0.0263046 | -0.0176383 | -0.0058900 | -0.0031679 | -0.0207343 | 0.0037080 | -0.0142167 | -0.0001698 | -0.0091601 | -0.0035626 |
2024-10-23 | -0.0142815 | -0.0015027 | 0.0090412 | -0.0118426 | -0.0018959 | 0.0070274 | -0.0104396 | -0.0031154 | 0.0250066 | 0.0025475 | -0.0067831 | -0.0056539 |
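The asset-additivity property can be illustrated with a minimal sketch. The tickers, prices, and 60/40 weights below are made up for illustration and are not the data used in this post:

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two made-up tickers (not the post's data)
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 49.0, 50.5]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Simple returns: P_t / P_{t-1} - 1; the first row is NaN and is dropped
simple_returns = prices.pct_change().dropna()

# Assumed portfolio weights at the start of each day
weights = np.array([0.6, 0.4])

# Asset-additivity: the portfolio simple return is the weighted sum of asset returns
portfolio_simple = simple_returns.to_numpy() @ weights

# Cross-check for the first day using the growth of each weighted position
growth = (weights * (prices.iloc[1] / prices.iloc[0]).to_numpy()).sum() - 1

# Log returns do not aggregate across assets this way
log_returns = np.log(prices / prices.shift(1)).dropna()
weighted_log = log_returns.to_numpy() @ weights

print(portfolio_simple[0], growth, weighted_log[0], np.log1p(growth))
```

The weighted sum of simple returns matches the actual growth of the portfolio, whereas the weighted sum of log returns only approximates the portfolio's log return.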
The simple daily returns may be visualized using line charts, density plots, and histograms, which are covered in my other post on visualizing asset data. Even though the visualizations in that post use the ggplot2 package in R, the plotnine package, or any other Python graphics library, can be employed to produce them in Python. For now, let us annualize the daily returns over the 10-year period ending on 2024-10-24. We assume the number of trading days in a year is computed as follows:
\[\begin{align*} 365.25 \text{(days on average per year)} \times \frac{5}{7} \text{(proportion work days per week)} \\ - 6 \text{(weekday holidays)} - 3\times\frac{5}{7} \text{(fixed date holidays)} = 252.75 \approx 253 \end{align*}\]
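As a quick sanity check of the arithmetic above, the same calculation can be written in a few lines of Python (the variable names are only illustrative):

```python
# Average calendar days per year and the share of them that are weekdays
days_per_year = 365.25
weekday_share = 5 / 7

# Subtract 6 weekday holidays and the weekday share of 3 fixed-date holidays
trading_days = days_per_year * weekday_share - 6 - 3 * weekday_share

print(trading_days, round(trading_days))
```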
annualized_simple_daily_returns = daily_returns.mean() * 253
fig, ax = plt.subplots(figsize=(10, 6))
# Plot the annualized simple daily returns
annualized_simple_daily_returns.plot(kind='bar', ax=ax);
ax.set_title('Annualized Simple Daily Returns');
ax.set_ylabel('Annualized Simple Daily Returns');
plt.show()
As can be seen, there are significant differences in annualized performance between these assets. The annualized variance-covariance matrix of the returns can be computed using the built-in pandas method cov. Note that while the sample covariance matrix is an unbiased estimator of the population covariance matrix, it can be subject to significant estimation error, especially when the number of assets is large relative to the number of observations. In our case, because of the relatively long sample period, the estimation error is likely to be small:
annualized_sample_covariance = daily_returns.cov() * 253
fig, ax = plt.subplots();
cax = ax.matshow(annualized_sample_covariance, cmap="coolwarm");
ax.xaxis.set_ticks_position('bottom');
ax.set_xticks(np.arange(len(annualized_sample_covariance.columns)));
ax.set_yticks(np.arange(len(annualized_sample_covariance.columns)));
ax.set_xticklabels(annualized_sample_covariance.columns);
ax.set_yticklabels(annualized_sample_covariance.columns);
for i in range(annualized_sample_covariance.shape[0]):
    for j in range(annualized_sample_covariance.shape[1]):
        # Ensure text can be easily seen
        value = annualized_sample_covariance.iloc[i, j]
        color = "white" if value < 0.5 else "black"
        ax.text(j, i, f"{value:.2f}", ha="center", va="center", color=color);
plt.title("Annualized Sample Covariance Matrix of Returns Series");
plt.tight_layout();
plt.show()
The variance-covariance matrix of the returns will be needed to compute the variance of the portfolio returns.
Given a universe of assets and their characteristics, the portfolio optimization problem is to allocate capital among them in a way that maximizes the portfolio return per unit of risk taken. There is no unique solution to this problem but rather a set of solutions, which together define the efficient frontier: the portfolios whose returns cannot be improved without increasing risk, or equivalently whose risk cannot be reduced without reducing returns. The Markowitz model has the twin objective of maximizing return and minimizing risk within the mean-variance framework of asset returns, subject to a set of basic constraints, and reduces to the following two formulations:
Minimize Risk given Levels of Return
\[\begin{align*} \min_{\vec{w}} \hspace{5mm} \sqrt{\vec{w}^{T} \hat{\Sigma} \vec{w}} \end{align*}\]
subject to
\[\begin{align*} &\vec{w}^{T} \hat{\mu}=\bar{r}_{P} \\ &\vec{w}^{T} \vec{1} = 1 \hspace{5mm} (\text{Full investment}) \\ &\vec{0} \le \vec{w} \le \vec{1} \hspace{5mm} (\text{Long only}) \end{align*}\]
Maximize Return given Levels of Risk
\[\begin{align*} \max _{\vec{w}} \hspace{5mm} \vec{w}^{T} \hat{\mu} \end{align*}\]
subject to
\[\begin{align*} &\sqrt{\vec{w}^{T} \hat{\Sigma} \vec{w}}=\bar{\sigma}_{P} \\ &\vec{w}^{T} \vec{1} = 1 \hspace{5mm} (\text{Full investment}) \\ &\vec{0} \le \vec{w} \le \vec{1} \hspace{5mm} (\text{Long only}) \end{align*}\]
In the absence of other constraints, the above model is loosely referred to as the “unconstrained” portfolio optimization model. Solving the mathematical model yields a set of optimal weights representing a set of optimal portfolios. The solution set to these two problems is a hyperbola that depicts the efficient frontier in the \(\mu\)-\(\sigma\) diagram.
The first task is to simulate a random set of portfolios to visualize the risk-return profiles of our given set of assets. To carry out the Monte Carlo simulation, we define two functions that both take as inputs a vector of asset weights and output the annualized expected portfolio return and standard deviation:
Annualized Returns
Annualized Standard Deviation
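A minimal sketch of what these two functions might look like is given below. The names portfolio_returns and portfolio_sd follow the ones used later in this post, but the bodies are an assumption, and the synthetic daily_returns frame merely stands in for the real return data:

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 253  # annualization factor used in this post

# Synthetic daily returns standing in for the post's `daily_returns` (assumption)
rng = np.random.default_rng(0)
daily_returns = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(1000, 3)), columns=["A", "B", "C"]
)

# Annualized mean vector and covariance matrix
mean_returns = daily_returns.mean().to_numpy() * TRADING_DAYS
cov_matrix = daily_returns.cov().to_numpy() * TRADING_DAYS


def portfolio_returns(weights):
    """Annualized expected portfolio return, w' mu."""
    return float(np.asarray(weights) @ mean_returns)


def portfolio_sd(weights):
    """Annualized portfolio standard deviation, sqrt(w' Sigma w)."""
    w = np.asarray(weights)
    return float(np.sqrt(w @ cov_matrix @ w))


# Evaluate both functions for an equally weighted portfolio
equal_weights = np.full(3, 1 / 3)
print(portfolio_returns(equal_weights), portfolio_sd(equal_weights))
```

Putting all weight on a single asset recovers that asset's own annualized mean and standard deviation, which is a convenient way to check the functions.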
Next, we use a for loop to simulate random vectors of asset weights, computing the expected portfolio return and standard deviation for each permutation of random weights. Again, we ensure that each random weight vector is subject to the long-positions-only and full-investment constraints.
Monte Carlo Simulation
The empty containers we instantiate are lists, which are mutable. Appending to a Python list is cheap on average because lists over-allocate as they grow, which makes this pattern more efficient than growing vectors in R, provided the number of simulations isn’t excessively large (i.e., in the millions). In such cases, the resizing and over-allocation costs would be substantial, and we should vectorize the portfolio_returns and portfolio_sd functions to leverage NumPy’s vectorized operations.
num_portfolios = 5000
num_assets = len(symbols)
# Generate random weights for all portfolios
random_weights = np.random.random((num_portfolios, num_assets))
# Normalize the weights so that the sum of weights for each portfolio equals 1
random_weights /= np.sum(random_weights, axis=1, keepdims=True)
list_portfolio_returns = []
list_portfolio_sd = []
# Monte Carlo simulation: loop through each set of random weights
for weights in random_weights:
    list_portfolio_returns.append(portfolio_returns(weights)) # Calculate and store returns
    list_portfolio_sd.append(portfolio_sd(weights)) # Calculate and store standard deviations (risk)
port_returns = np.array(list_portfolio_returns)
port_sd = np.array(list_portfolio_sd)
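For larger simulations, the loop could be replaced by vectorized NumPy operations. The sketch below is one possible way to do this, using made-up annualized inputs (mu and sigma are illustrative, not estimates from this post's data):

```python
import numpy as np

rng = np.random.default_rng(42)
num_portfolios, num_assets = 5000, 4

# Made-up annualized expected returns and volatilities (assumptions)
mu = np.array([0.08, 0.12, 0.10, 0.15])
vols = np.array([0.15, 0.25, 0.20, 0.30])

# Constant-correlation covariance matrix, positive definite by construction
corr = np.full((num_assets, num_assets), 0.3)
np.fill_diagonal(corr, 1.0)
sigma = np.outer(vols, vols) * corr

# Random long-only weights, one portfolio per row, each row summing to 1
weights = rng.random((num_portfolios, num_assets))
weights /= weights.sum(axis=1, keepdims=True)

# Expected returns for all portfolios in a single matrix-vector product
port_returns = weights @ mu

# Quadratic form w' Sigma w evaluated row by row, without a Python loop
port_sd = np.sqrt(np.einsum("ij,jk,ik->i", weights, sigma, weights))

print(port_returns.shape, port_sd.shape)
```

The einsum call sums over both asset indices for each row of weights, so it returns one variance per simulated portfolio.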
Let us examine the simulation results. In particular, the highest and the lowest expected portfolio returns are as follows:
def plot_violin_with_quantiles(data: np.ndarray, title: str) -> None:
    fig, ax = plt.subplots(figsize=(6, 7))
    parts = ax.violinplot(data, showmeans=False, showmedians=False, showextrema=False)
    # Customize the violin plot appearance
    for pc in parts['bodies']:
        pc.set_facecolor('#1f77b4')
        pc.set_edgecolor('black')
        pc.set_alpha(0.7)
    # Calculate and annotate quantiles
    quantiles = np.percentile(data, [25, 50, 75])
    for q, label in zip(quantiles, ['25%', '50%', '75%']):
        ax.axhline(q, color='black', linestyle='--')
        ax.text(1.05, q, f'{label}', transform=ax.get_yaxis_transform(), ha='left', va='center')
    # Set labels and title
    ax.set_ylabel('Value')
    ax.set_title(title)
    plt.show()
We may also visualize the expected returns and standard deviations on a \(\mu\)-\(\sigma\) trade-off diagram. For this task, I will leverage R’s graphics engine and the plotly graphics library. The reticulate package in R allows for a relatively seamless transition between Python and R. Fortunately, the NumPy arrays created in Python can be accessed as R vector objects; this makes plotting in R using Python objects simple:
# Plot the sub-optimal portfolios
plot_ly(
x = py$port_sd, y = py$port_returns, color = (py$port_returns / py$port_sd),
mode = "markers", type = "scattergl", showlegend = FALSE,
marker = list(size = 5, opacity = 0.7)
) |>
layout(
title = "Mean-Standard Deviation Diagram",
yaxis = list(title = "Expected Portfolio Return (Annualized)", tickformat = ".2%"),
xaxis = list(title = "Portfolio Standard Deviation (Annualized)", tickformat = ".2%")
) |>
colorbar(title = "Sharpe Ratio")
Each point in the diagram above represents one simulated portfolio, plotted as an expected-return and standard-deviation pair. The points are color coded so that the magnitude of the Sharpe ratio, defined as \(SR \equiv \frac{\mu_{P} - r_{f}}{\sigma_{P}}\), can be readily observed for each pair. For simplicity, we assume that \(r_{f} \equiv 0\). It could be argued that this assumption is restrictive, so I explored using a different risk-free rate in my previous post.
Solving the optimization problem defined earlier provides us with a set of optimal portfolios given the characteristics of our assets. There are two important portfolios that we may be interested in constructing: the minimum variance portfolio and the maximal Sharpe ratio portfolio. In the case of the maximal Sharpe ratio portfolio, the objective function we wish to maximize is our user-defined Sharpe ratio function. The constraint is that all weights sum up to 1. We also specify that the weights are bounded between 0 and 1. In order to use the minimization function from the SciPy library, we need to transform the maximization problem into one of minimization. In other words, we minimize the negative of the Sharpe ratio; the optimal portfolio composition is therefore the array of weights that attains the maximum Sharpe ratio.
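The sign flip can be sketched as follows. The function names sharpe_ratio and neg_sharpe_ratio, as well as the inputs mu and sigma, are illustrative assumptions rather than the exact definitions used in this post:

```python
import numpy as np

# Made-up annualized inputs for three assets (assumptions)
mu = np.array([0.08, 0.12, 0.10])
sigma = np.array([
    [0.0400, 0.0060, 0.0040],
    [0.0060, 0.0900, 0.0100],
    [0.0040, 0.0100, 0.0625],
])


def sharpe_ratio(weights, risk_free_rate=0.0):
    """Sharpe ratio (mu_P - r_f) / sigma_P for a weight vector."""
    w = np.asarray(weights)
    excess_return = w @ mu - risk_free_rate
    return float(excess_return / np.sqrt(w @ sigma @ w))


def neg_sharpe_ratio(weights, risk_free_rate=0.0):
    """Negative Sharpe ratio, so that a minimizer maximizes the Sharpe ratio."""
    return -sharpe_ratio(weights, risk_free_rate)


w = np.array([0.5, 0.3, 0.2])
print(sharpe_ratio(w), neg_sharpe_ratio(w))
```

Minimizing neg_sharpe_ratio over the feasible weights yields the same weights as maximizing sharpe_ratio.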
We will use a list of dictionaries to represent the constraints:
Next, the bound values for the weights:
We also need to supply a starting sequence of weights, which essentially functions as an initial guess. For our purposes, this will be an equal-weight array:
We will use the scipy.optimize.minimize function and the Sequential Least Squares Programming (SLSQP) method for the minimization:
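Putting the constraint, the bounds, the initial guess, and the call to the minimizer together, a minimal sketch might look like the following. The three-asset inputs are made up, and the helper definitions are assumptions in the spirit of this post rather than its exact code:

```python
import numpy as np
import scipy.optimize as sco

# Made-up annualized inputs for three assets (assumptions)
mu = np.array([0.08, 0.12, 0.10])
sigma = np.array([
    [0.0400, 0.0060, 0.0040],
    [0.0060, 0.0900, 0.0100],
    [0.0040, 0.0100, 0.0625],
])
num_assets = len(mu)


def portfolio_returns(weights):
    return float(np.asarray(weights) @ mu)


def portfolio_sd(weights):
    w = np.asarray(weights)
    return float(np.sqrt(w @ sigma @ w))


def neg_sharpe_ratio(weights):
    # Risk-free rate assumed to be zero, as in the text
    return -portfolio_returns(weights) / portfolio_sd(weights)


# Full-investment constraint: the weights must sum to 1
constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1}]

# Long-only bounds: each weight lies in [0, 1]
bounds = tuple((0.0, 1.0) for _ in range(num_assets))

# Equal weights as the initial guess
equal_weights = np.full(num_assets, 1 / num_assets)

max_sharpe = sco.minimize(
    fun=neg_sharpe_ratio,
    x0=equal_weights,
    method="SLSQP",
    bounds=bounds,
    constraints=constraints,
)

max_sharpe_weights = max_sharpe.x
print(max_sharpe_weights.round(4), max_sharpe_weights.sum(), -max_sharpe.fun)
```

The optimal weights are stored in the x attribute of the result, and the maximized Sharpe ratio is the negative of its fun attribute.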
The optimization result is an instance of scipy.optimize.OptimizeResult, which exposes several attributes. The attribute of interest to us is the weight composition array, which we employ to construct the maximal Sharpe ratio portfolio:
array([1.15225784e-01, 5.44865802e-16, 3.27977694e-01, 0.00000000e+00,
0.00000000e+00, 1.14734659e-01, 1.35091591e-16, 1.36546746e-01,
0.00000000e+00, 1.01535533e-16, 3.05515117e-01, 0.00000000e+00])
Check that the weights sum to 1:
The expected return, standard deviation, and Sharpe ratio of the maximal Sharpe ratio portfolio are as follows:
Expected Return Standard Deviation Sharpe Ratio
0 0.23589 0.19533 1.207646
The minimum variance portfolio may be constructed similarly. The objective function, in this case, is the standard deviation function:
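A minimal sketch of the minimum variance case is shown below, again with made-up inputs. As a cross-check, it compares the numerical solution with the closed-form fully invested minimum variance weights, which are proportional to \(\hat{\Sigma}^{-1}\vec{1}\); the comparison is only meaningful when the long-only bounds are not binding, which happens to be the case for these illustrative numbers:

```python
import numpy as np
import scipy.optimize as sco

# Made-up annualized covariance matrix for three assets (assumption)
sigma = np.array([
    [0.0400, 0.0060, 0.0040],
    [0.0060, 0.0900, 0.0100],
    [0.0040, 0.0100, 0.0625],
])
num_assets = sigma.shape[0]


def portfolio_sd(weights):
    w = np.asarray(weights)
    return float(np.sqrt(w @ sigma @ w))


constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1}]
bounds = tuple((0.0, 1.0) for _ in range(num_assets))
equal_weights = np.full(num_assets, 1 / num_assets)

# The objective is now the portfolio standard deviation itself
min_var = sco.minimize(
    fun=portfolio_sd,
    x0=equal_weights,
    method="SLSQP",
    bounds=bounds,
    constraints=constraints,
)

# Closed-form weights without the long-only bounds: Sigma^{-1} 1, rescaled to sum to 1
inv_ones = np.linalg.solve(sigma, np.ones(num_assets))
closed_form = inv_ones / inv_ones.sum()

print(min_var.x.round(4), closed_form.round(4))
```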
The expected return, standard deviation, and Sharpe ratio of the minimum variance portfolio are as follows:
Expected Return Standard Deviation Sharpe Ratio
0 0.138966 0.154233 0.901018
As an investor, one is generally interested in the maximum return given a fixed risk level or the minimum risk given a fixed return expectation. As mentioned in the earlier section, the set of optimal portfolios, whose expected returns for a given level of risk cannot be surpassed by any other portfolio, traces out the so-called efficient frontier. The Python implementation fixes a target return level and, for each such level, minimizes the volatility. For the optimization, we essentially “fit” the twin objective described earlier into an optimization problem that can be solved using quadratic programming: the portfolio variance is a quadratic function of the weights, and minimizing the standard deviation is equivalent to minimizing the variance. The two equality constraints are both linear: the target return and the requirement that the weights sum to 1. This time we use a tuple of dictionaries to represent the constraints:
The full-investment and long-positions-only specifications remain unchanged throughout the optimization process, but the value that the name target is bound to will be different during each iteration. Because the target-return constraint looks up target each time it is evaluated, rebinding the name changes the constraint without rebuilding anything. Since tuples are immutable, the references held by the constraints tuple always point to the same dictionary objects. This nuance makes the implementation Pythonic. We constrain the weights such that all weights fall within the interval \([0, 1]\):
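The mechanism described above can be sketched as follows. The inputs are made up, and the constraint definitions are an assumption about how such a tuple could be written; the key point is that the lambda in the first dictionary reads the name target when it is called, not when it is defined:

```python
import numpy as np
import scipy.optimize as sco

# Made-up annualized inputs for three assets (assumptions)
mu = np.array([0.08, 0.12, 0.10])
sigma = np.array([
    [0.0400, 0.0060, 0.0040],
    [0.0060, 0.0900, 0.0100],
    [0.0040, 0.0100, 0.0625],
])


def portfolio_returns(weights):
    return float(np.asarray(weights) @ mu)


def portfolio_sd(weights):
    w = np.asarray(weights)
    return float(np.sqrt(w @ sigma @ w))


# The target-return constraint looks up `target` at call time (late binding)
constraints = (
    {"type": "eq", "fun": lambda w: portfolio_returns(w) - target},
    {"type": "eq", "fun": lambda w: np.sum(w) - 1},
)
bounds = tuple((0.0, 1.0) for _ in range(len(mu)))
equal_weights = np.full(len(mu), 1 / len(mu))

# Rebinding `target` changes the constraint; the tuple itself is never rebuilt
solutions = {}
for target in (0.09, 0.11):
    result = sco.minimize(
        fun=portfolio_sd,
        x0=equal_weights,
        method="SLSQP",
        bounds=bounds,
        constraints=constraints,
    )
    solutions[target] = result.x

for level, w in solutions.items():
    print(level, w.round(4), round(portfolio_returns(w), 4), round(portfolio_sd(w), 4))
```

Each solution attains its own target return even though the same constraints tuple is passed to the minimizer on every iteration.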
We will use the scipy.optimize.minimize function again and the Sequential Least Squares Programming (SLSQP) method for the minimization:
# Initialize an array of target returns
target_grid = np.linspace(
    start = 0.15,
    stop = 0.30,
    num = 100
)
obj_sd = []
# Rebind the name target on each iteration so the target-return constraint sees the current level
for target in target_grid:
    # Minimize the portfolio standard deviation given the target expected return
    min_result_object = sco.minimize(
        fun = portfolio_sd,
        x0 = equal_weights,
        method = 'SLSQP',
        bounds = bounds,
        constraints = constraints
    )
    # Extract the objective value and append it to the output container
    obj_sd.append(min_result_object['fun'])
obj_sd = np.array(obj_sd)
# Rebind target to the full array of target returns for plotting
target = target_grid
Before we plot the efficient frontier, we may wish to highlight two portfolios in the plot, namely the maximal Sharpe ratio portfolio and the minimum variance portfolio:
Since the optimal expected portfolio returns and standard deviations are both array objects, we can access them via the reticulate package and plot them in R:
plot_ly(
x = py$port_sd, y = py$port_returns, color = (py$port_returns / py$port_sd),
mode = "markers", type = "scattergl", showlegend = FALSE,
marker = list(size = 5, opacity = 0.7)
) |>
# Efficient frontier
add_trace(
data = tibble::tibble(
Risk = py$obj_sd,
Return = py$target,
SharpeRatio = py$target / py$obj_sd
),
x = ~Risk,
y = ~Return,
color = ~SharpeRatio,
marker = list(size = 7)
) |>
# Maximal Sharpe ratio portfolio
add_trace(
data = tibble::tibble(
Risk = py$max_sharpe_port_sd,
Return = py$max_sharpe_port_return,
SharpeRatio = py$max_sharpe_port_return / py$max_sharpe_port_sd
),
x = ~Risk,
y = ~Return,
color = ~SharpeRatio,
marker = list(size = 7)
) |>
# Minimum variance portfolio
add_trace(
data = tibble::tibble(
Risk = py$min_sd_port_sd,
Return = py$min_sd_port_return,
SharpeRatio = py$min_sd_port_return / py$min_sd_port_sd
),
x = ~Risk,
y = ~Return,
color = ~SharpeRatio,
marker = list(size = 7)
) |>
layout(
title = "Mean-Standard Deviation Diagram",
yaxis = list(title = "Expected Portfolio Return (Annualized)", tickformat = ".2%"),
xaxis = list(title = "Portfolio Standard Deviation (Annualized)", tickformat = ".2%")
) |>
add_annotations(
x = annotation_data[["x"]],
y = annotation_data[["y"]],
text = annotation_data[["type"]],
xref = "x",
yref = "y",
showarrow = TRUE,
arrowhead = 5,
arrowsize = .5,
ax = 20,
ay = -40
) |>
colorbar(title = "Sharpe Ratio")