Hide Code
library(tidyverse)
library(quantmod)
library(PerformanceAnalytics)
library(PortfolioAnalytics)
library(timetk)
library(tibbletime)
library(xts)
library(zoo)
library(tidyquant)Cookies
This site uses cookies for essential functionality and anonymized analytics. Accept or manage preferences below.
Yang Wu
December 1, 2020
In this post, we will explore some finance topics— portfolio optimization and computing portfolio returns. My goal is to apply what I’ve learned in portfolio theory using R as the main tool of analysis. There are many advantages to using a GUI like MS Excel, but R has an amazing data analysis work flow— a sort of one-stop solution from initial data importation to data wrangling and transformation to computations and analysis and then finally to visualizing and reporting results. We will be using functions from several R packages— xts, PerformanceAnalytics, PortfolioAnalytics, tidyquant, tidyverse. In particular, the tidyquant package is a recent development that has markedly enriched R’s financial analytics infrastructure, enhancing its usability in finance. While I cover some theory in this post, my main focus will be on the implementation of these topics in R.
For more readings on the theory, I recommend Essentials of Investments and Practical Portfolio Performance Measurement and Attribution. To learn more about analyzing financial data in R, there is Reproducible Finance with R, which is a very practical book with a strong emphasis on application.
The following packages will be used in this post:
We will employ Markowitz’s Mean-Variance model as the framework for computing optimal weights, essentially treating the task as an “unconstrained” optimization problem. The objective of this optimization problem is one of minimization:
subject to the sum of weights constraint and the box constraint:
The minimization problem is a quadratic problem with linear constraints, since the variance formula is a quadratic function and the constraints are linear functions; this type of problem is well suited to be solved using a quadratic programming solver. The PortfolioAnalytics package uses ROI.plugin.quadprog, a plug-in for the “R” Optimization Infrastructure, to solve the problem. The solver can be specified with the optimize_method argument in optimize.portfolio(). If optimize_method = “ROI” is specified, a default solver will be selected based on the optimization problem. The glpk solver is the default solver for LP and MILP optimization problems. The quadprog solver is the default solver for QP optimization problems.
For this task, we will import our data from Yahoo Finance. The five assets under examination are Exchange Traded Funds, which are funds that can be traded on an exchange like a stock. Exchange-traded funds are a type of investment fund that offers the best attributes of two popular assets: they have the diversification benefits of mutual funds and the ease with which stocks are traded. However, before we import any data, we must answer the following question: In what form do we want our data to be? Since we are in the world of finance, times series is the most common type of data. R has the xts package to handle data that are indexed by date. Our task, therefore, reduces to the following:
Import daily prices from Yahoo Finance
Convert daily prices to monthly return
Because we will be aggregating monthly returns to form a portfolio, we will need to compute the simple returns
# Create a vector of ticker symbols
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
# Load data from 2012 to today
# Specify the "to = " argument to specify an end date
prices <- getSymbols(
Symbols = symbols,
src = "yahoo",
from = "2012-12-31",
auto.assign = TRUE,
warnings = FALSE
) %>%
# The map function takes an anonymous function and will return a list of five
# The function Ad() extracts the daily adjusted price series
map(.f = ~ Ad(get(x = .x))) %>%
# Use reduce() to merge the elements of .x interactively
reduce(.f = merge) %>%
# Use a replacement function to set column names to ticker symbols
# This function is in prefix form
# It is equivalent to colnames(x = prices) <- value
`colnames<-`(value = symbols)
# Keep only the last reading of each month
# We could have chosen to keep only the first reading of each month
asset_returns_xts <- to.monthly(
x = prices,
drop.time = TRUE,
indexAt = "lastof",
OHLC = FALSE
) |>
# Compute simple returns
# Log returns are time-additive but not portfolio additive
Return.calculate(method = "discrete") |>
# Drop the first row since we lose 12/31/2012
na.omit()
# Keep only the xts returns, ticker symbols, and the prices series
rm(list = setdiff(x = ls(), y = c("symbols", "prices", "asset_returns_xts"))) SPY EFA IJS EEM AGG
2013-01-31 0.05118977 0.03728443 0.053516438 -0.002931103 -0.0062119790
2013-02-28 0.01275927 -0.01288530 0.016306887 -0.022840267 0.0059082324
2013-03-31 0.03797115 0.01305350 0.041079404 -0.010182735 0.0009852285
2013-04-30 0.01921212 0.05018664 0.001223212 0.012157888 0.0096856794
2013-05-31 0.02361009 -0.03019042 0.042869628 -0.048279197 -0.0200107829
[1] "list"
# Add the full investment constraint that specifies that the weights must sum to 1
min_var_portfolio <- add.constraint(
portfolio = min_var_portfolio,
type = "full_investment"
)
# Examine the constraint element by extracting min_var_portfolio[["constraints"]][[1]]
str(pluck(.x = min_var_portfolio, "constraints", 1))List of 6
$ type : chr "full_investment"
$ enabled: logi TRUE
$ message: logi FALSE
$ min_sum: num 1
$ max_sum: num 1
$ call : language add.constraint(portfolio = min_var_portfolio, type = "full_investment")
- attr(*, "class")= chr [1:2] "weight_sum_constraint" "constraint"
# Add the box constraint that ensure the weights are between 0.1 and 0.6
min_var_portfolio <- add.constraint(
portfolio = min_var_portfolio,
type = "box", min = 0.05, max = 0.6
)
# Examine the constraint element by extracting min_var_portfolio[["constraints"]][[2]]
str(pluck(.x = min_var_portfolio, "constraints", 2))List of 5
$ type : chr "box"
$ enabled: logi TRUE
$ min : Named num [1:5] 0.05 0.05 0.05 0.05 0.05
..- attr(*, "names")= chr [1:5] "SPY" "EFA" "IJS" "EEM" ...
$ max : Named num [1:5] 0.6 0.6 0.6 0.6 0.6
..- attr(*, "names")= chr [1:5] "SPY" "EFA" "IJS" "EEM" ...
$ call : language add.constraint(portfolio = min_var_portfolio, type = "box", min = 0.05, max = 0.6)
- attr(*, "class")= chr [1:2] "box_constraint" "constraint"
# Run the optimization
global_min_portfolio <- optimize.portfolio(
R = asset_returns_xts,
portfolio = min_var_portfolio,
# This defaults to the "quadprog" solver
optimize_method = "quadprog",
# Return additional information on the path or portfolios searched
trace = TRUE
)
# Examine returned portfolio list object
global_min_portfolio***********************************
PortfolioAnalytics Optimization
***********************************
Call:
optimize.portfolio(R = asset_returns_xts, portfolio = min_var_portfolio,
optimize_method = "quadprog", trace = TRUE)
Optimal Weights:
SPY EFA IJS EEM AGG
0.2226 0.0774 0.0500 0.0500 0.6000
Objective Measure:
StdDev
0.02082
Now that we found the global minimum variance portfolio, we may be interested in finding the maximal expected return portfolio. The objective is one of maximization:
subject to the sum of weights constraint and the box constraint:
The optimization problem in this case is a linear programming problem, since the portfolio expected return formula is a linear function. This is best tackled using a linear programming solver. The package PortfolioAnalytics uses the ROI package with the glpk plugin, the GNU Linear Programming toolkit of R’s Optimization Infrastructure.
# Add the full investment constraint that specifies the weights must sum to 1
max_exp_return_portfolio <- add.constraint(
portfolio = max_exp_return_portfolio,
type = "full_investment"
)
# Add the box constraint that ensure the weights are between 0.1 and 0.6
max_exp_return_portfolio <- add.constraint(
portfolio = max_exp_return_portfolio,
type = "box", min = 0.05, max = 0.6
)# Run the optimization
global_max_portfolio <- optimize.portfolio(
R = asset_returns_xts,
portfolio = max_exp_return_portfolio,
# This defaults to the "glpk" solver
optimize_method = "glpk",
# Return additional information on the path or portfolios searched
trace = TRUE
)
# Examine returned portfolio list object
global_max_portfolio***********************************
PortfolioAnalytics Optimization
***********************************
Call:
optimize.portfolio(R = asset_returns_xts, portfolio = max_exp_return_portfolio,
optimize_method = "glpk", trace = TRUE)
Optimal Weights:
SPY EFA IJS EEM AGG
0.60 0.05 0.25 0.05 0.05
Objective Measure:
mean
0.01074
We have found two sets of optimal weights that yield portfolios that offer the lowest possible risk and the high possible expected return given two basic constraints. Our next task is to aggregate the monthly returns of the individual ETF’s to find the monthly returns of our five-asset portfolio.
# A tibble: 5 × 2
weights symbols
<dbl> <chr>
1 0.6 SPY
2 0.05 EFA
3 0.250 IJS
4 0.05 EEM
5 0.05 AGG
# A tibble: 1 × 1
total_weight
<dbl>
1 1
The portfolio return in month,
We can compute portfolio monthly returns using a brute force method:
# Compute by hand
portfolio_returns_by_hand <-
(weights[[1]] * asset_returns_xts[, 1]) +
(weights[[2]] * asset_returns_xts[, 2]) +
(weights[[3]] * asset_returns_xts[, 3]) +
(weights[[4]] * asset_returns_xts[, 4]) +
(weights[[5]] * asset_returns_xts[, 5])
# Name the series
portfolio_returns_by_hand <- `names<-`(portfolio_returns_by_hand, "Monthly portfolio returns")
# Examine
head(portfolio_returns_by_hand, 5) Monthly portfolio returns
2013-01-31 0.04550004
2013-02-28 0.01024142
2013-03-31 0.03324534
2013-04-30 0.01543459
2013-05-31 0.01995944
Another way to compute portfolio monthly returns is to use functions from the PerformanceAnalytics package, which works well with xts objects. We also adopt monthly re-balancing as a strategy. The re-balancing of investments is the action of keeping the portfolio weights consistent with the optimal weights. Note that re-balancing the weights every month may be unrealistic; in reality, the convention is often to re-balance quarterly or annually. For this example, however, we will re-balance monthly to be consistent with our brute force computation.
# Compute monthly portfolio returns
portfolio_returns_xts_rebalanced_monthly <-
Return.portfolio(
R = asset_returns_xts,
weights = weights,
# Monthly re-balancing
reblance_on = "months",
# Use simple/arithmetic chaining to aggregate returns
geometric = FALSE
) |>
`colnames<-`("Monthly_portfolio_returns")
# Examine
head(portfolio_returns_xts_rebalanced_monthly, 5) Monthly_portfolio_returns
2013-01-31 0.04550004
2013-02-28 0.01035769
2013-03-31 0.03358032
2013-04-30 0.01534395
2013-05-31 0.02066131
Return.portfolio(R, weights = NULL, wealth.index = FALSE, contribution = FALSE, geometric = TRUE, rebalance_on = c(NA, "years", "quarters", "months", "weeks", "days"), value = 1, verbose = FALSE, ...) calculates the returns of a portfolio with the same periodicity of the returns data.The tidyverse is a collection of R packages designed with the same underlying philosophy, grammar, and data structures. Simply put, the “tidy” data structure that works well with tidyverse functions is one where every row is an observation and every column is a variable. If we re-examine the xts object called “asset_returns_xts,” we see that every column is a returns series for a particular asset. This is inconsistent with the tidyverse data structure, and so we must convert the xts object to a tidy format for our computations. Ideally, we would like to have one column called “asset” that specifies the names of the ETF instead of having five individual columns of returns data. This idea will become clearer once we convert our xts object to a tibble.
Now, examine the “tidy” data structure and compare it to the xts object created earlier:
date asset returns
1 2013-01-31 SPY 0.051189775
2 2013-01-31 EFA 0.037284433
3 2013-01-31 IJS 0.053516438
4 2013-01-31 EEM -0.002931103
5 2013-01-31 AGG -0.006211979
801 2026-05-31 SPY 0.028539240
802 2026-05-31 EFA -0.005863941
803 2026-05-31 IJS -0.028231063
804 2026-05-31 EEM 0.016877606
805 2026-05-31 AGG -0.007593399
Since our returns data is in a “tidy” format, computing portfolio monthly returns is very a straight forward. For those who are familiar with SQL, we are essentially using the CASE statement here.
# Use vectorized if else statements to assign weights according to the asset column
potfolio_returns_dplyr_byhand <- asset_returns_long |>
group_by(asset) |>
mutate(
weights = dplyr::case_when(
asset == symbols[[1]] ~ weights[[1]],
asset == symbols[[2]] ~ weights[[2]],
asset == symbols[[3]] ~ weights[[3]],
asset == symbols[[4]] ~ weights[[4]],
asset == symbols[[5]] ~ weights[[5]]
),
weighted_returns = weights * returns
) |>
# Group by date
# We need to group by date so that the aggregate sum() function is carried out by month
# For each date, the original series has 5 weighted returns, one for each ETF
# The results should be 1 portfolio return (the sum of the 5 weighted returns) for each month
group_by(date) |>
# Compute monthly portfolio returns
summarize(Monthly_portfolio_returns = sum(weighted_returns))
# Examine the data
head(potfolio_returns_dplyr_byhand, 5)# A tibble: 5 × 2
date Monthly_portfolio_returns
<date> <dbl>
1 2013-01-31 0.0455
2 2013-02-28 0.0102
3 2013-03-31 0.0332
4 2013-04-30 0.0154
5 2013-05-31 0.0200
summarize(.data, ..., .groups = NULL) creates a new data frame. It will have one (or more) rows for each combination of grouping variables.The tidyquant package gives us two ways to compute the portfolio monthly returns.
# Use function from the tidyquant
portfolio_returns_tq_rebalanced_monthly_method_1 <- asset_returns_long |>
tq_portfolio(
assets_col = asset,
returns_col = returns,
weights = weights,
col_rename = "Monthly_portfolio_returns",
rebalance_on = "months"
)
# Examine
head(portfolio_returns_tq_rebalanced_monthly_method_1, 5)# A tibble: 5 × 2
date Monthly_portfolio_returns
<date> <dbl>
1 2013-01-31 0.0455
2 2013-02-28 0.0102
3 2013-03-31 0.0332
4 2013-04-30 0.0154
5 2013-05-31 0.0200
tq_portfolio(data, assets_col, returns_col, weights = NULL, col_rename = NULL, ...) aggregates a group of returns by asset into portfolio returns.
An alternative way to load data is to use the tidyquant wrapper function, which automatically coerces the returns into a long tidy format:
# Load data
asset_returns_tq <- tq_get(
x = symbols,
get = "stock.prices",
from = "2012-12-31"
) |>
# The asset column is named symbol by default (see body(tidyquant::tq_get))
group_by(symbol) |>
# Select adjusted daily prices
tq_transmute(
select = adjusted,
col_rename = "returns",
# This function is from quantmod
mutate_fun = periodReturn,
# These arguments are passed along to the mutate function quantmod::periodReturn
period = "monthly",
# Simple returns
type = "arithmetic",
# Do not return leading period data
leading = FALSE,
# This argument is passed along to xts::to.period, which is wrapped by quantmod::periodReturn
# We use the last reading of each month to find percentage changes
indexAt = "lastof"
) |>
rename(asset = symbol) |>
na.omit()tq_get(x, get = "stock.prices", complete_cases = TRUE, ...) gets data in tibble format. The most important argument is perhaps the dot-dot-dot, since these are the arguments passed to the underlying functions from the other packages that tq_get() uses. There could be multiple layers of wrapper functions, and so it is best practice to read the documentations carefully.A possibly more useful method of aggregating asset returns to portfolio returns is then to map a tibble of ticker symbols and weights to the tibble of ticker symbols and monthly returns:
# A tibble: 5 × 2
asset weights
<chr> <dbl>
1 SPY 0.6
2 EFA 0.05
3 IJS 0.250
4 EEM 0.05
5 AGG 0.05
# Map the weights to the returns column using the asset column as the identifier
portfolio_returns_tq_rebalanced_monthly_method_2 <- asset_returns_tq |>
tq_portfolio(
assets_col = asset,
returns_col = returns,
weights = weights_tibble,
col_rename = "Monthly_portfolio_returns",
rebalance_on = "months"
)
head(portfolio_returns_tq_rebalanced_monthly_method_2, 5)# A tibble: 5 × 2
date Monthly_portfolio_returns
<date> <dbl>
1 2013-01-31 0.0455
2 2013-02-28 0.0102
3 2013-03-31 0.0332
4 2013-04-30 0.0154
5 2013-05-31 0.0200
We have covered a variety of different methods for aggregating asset monthly returns to portfolio monthly returns. As a sanity check, we want to ensure that these methods yield consistent results:
potfolio_returns_dplyr_byhand |>
# Rename the column
rename(tidyverse_method = Monthly_portfolio_returns) |>
# Create three new columns that contain results for other methods
mutate(
"equation_byhand" = zoo::coredata(x = portfolio_returns_by_hand),
"tq_method_1" = portfolio_returns_tq_rebalanced_monthly_method_1[["Monthly_portfolio_returns"]],
"tq_method_2" = portfolio_returns_tq_rebalanced_monthly_method_2[["Monthly_portfolio_returns"]],
"xts_method" = zoo::coredata(portfolio_returns_xts_rebalanced_monthly)
) |>
modify_if(.p = is.numeric, .f = ~ round(x = .x, digits = 7)) |>
select(-date) |>
# Examine
head(10)# A tibble: 10 × 5
tidyverse_method equation_byhand[,"Monthly portfoli…¹ tq_method_1 tq_method_2
<dbl> <dbl> <dbl> <dbl>
1 0.0455 0.0455 0.0455 0.0455
2 0.0102 0.0102 0.0102 0.0102
3 0.0332 0.0332 0.0332 0.0332
4 0.0154 0.0154 0.0154 0.0154
5 0.0200 0.0200 0.0200 0.0200
6 -0.0131 -0.0131 -0.0131 -0.0131
7 0.0509 0.0509 0.0509 0.0509
8 -0.0292 -0.0292 -0.0292 -0.0292
9 0.0436 0.0436 0.0436 0.0436
10 0.0406 0.0406 0.0406 0.0406
# ℹ abbreviated name: ¹equation_byhand[,"Monthly portfolio returns"]
# ℹ 1 more variable: xts_method <dbl[,1]>
coredata(x, ...) is a generic function for extracting the core data contained in, a (more complex) object and replacing it. The replacement function is coredata<-.As can be seen, the results are all consistent. Sigh of relief. For the upcoming posts, I also plan on tackling other topics in portfolio theory and to build off of what I’ve covered in this post. In particular, Python also has many great libraries— numpy, scipy, and yahoo finance— all of which contain great tools for financial analysis.