While investing in equity markets (stock markets), we often diversify our investments across different stocks to reduce risk while still seeking good returns. When investing in multiple stocks, determining the optimal allocation of funds to each stock to minimize risk and maximize returns can be challenging. However, with the help of portfolio optimization, we can overcome this issue. In this article, we will try to understand what portfolio optimization is, how to download historical stock price data from Yahoo Finance, and how to write portfolio optimization code using the mean-variance method in Python.

This article was published as a part of the Data Science Blogathon.

Let us first understand what a portfolio is. An investment portfolio refers to a collection of financial assets held by an individual, organization, or entity for the purpose of investment and wealth management. It typically includes a diverse range of investments such as stocks, bonds, mutual funds, exchange-traded funds (ETFs), commodities, and other investment vehicles. The primary objective of a portfolio is to generate returns and grow wealth over time. The composition of an investment portfolio may change over time as market conditions, investment goals, and risk preferences evolve.

Portfolio optimization is the process of deciding how much of the available funds to allocate to each asset. It aims to strike a balance between risk and reward, tailor investments to individual objectives, and increase the likelihood of achieving long-term financial success. Portfolio optimization also underpins how mutual funds are managed, although asset managers incorporate even more complex methods to improve risk-adjusted returns. It is important to note that portfolio optimization is a complex process that may require expertise and professional guidance to implement effectively.

The Markowitz model, also commonly known as Modern Portfolio Theory (MPT), revolutionized portfolio management by introducing the concept of diversification and the trade-off between risk and return. The model suggests that an investor can construct an efficient portfolio by carefully selecting a combination of assets that maximizes returns for a given level of risk or minimizes risk for a desired level of returns.

The model rests on four key concepts:

- **Expected Return**: We estimate the expected return of each investment in the portfolio based on historical data, financial analysis, or other relevant information.
- **Risk**: Risk is measured by the variance or standard deviation of an asset’s returns. Markowitz emphasizes that we should consider the overall portfolio’s risk, taking into account the correlation or covariance between different assets.
- **Diversification**: Markowitz highlights the benefits of diversification in reducing portfolio risk. By combining assets with low or negative correlations, we can potentially achieve a more efficient risk-return trade-off.
- **Efficient Frontier**: The efficient frontier represents a set of optimal portfolios that offer the highest expected return for each level of risk or the lowest risk for a given level of expected return. The goal is to identify the portfolio that lies on the efficient frontier, providing the best risk-return profile.
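To make the diversification effect concrete, here is a minimal sketch of the standard two-asset portfolio volatility formula. The function name, the 50/50 weights, and the volatility figures are hypothetical values chosen only for illustration; they are not taken from the stocks analyzed later.

```python
import math

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and (1 - w1).

    sigma1 and sigma2 are the asset volatilities; rho is their correlation.
    """
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)

# Hypothetical inputs: assets with 20% and 30% annual volatility, held 50/50.
for rho in (1.0, 0.5, 0.0, -0.5):
    vol = two_asset_volatility(0.5, 0.20, 0.30, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.4f}")
```

With perfectly correlated assets the portfolio volatility is just the weighted average of the two volatilities; as the correlation falls, the portfolio volatility drops below that average, which is the benefit Markowitz describes.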

Markowitz’s contribution to portfolio theory has been significant. However, investment practices have evolved over the years, and various enhancements to the original model have been proposed.

To determine how much return is expected for the risk we are willing to take, we try to maximize the Sharpe ratio of the portfolio. The Sharpe ratio is a measure used to evaluate the risk-adjusted return of an investment or portfolio. It quantifies the excess return generated per unit of risk taken. The ratio is calculated by subtracting the risk-free rate of return from the investment’s average return and dividing the result by the standard deviation of the investment’s returns. A higher Sharpe ratio indicates better risk-adjusted performance, reflecting higher returns relative to volatility. It helps us compare different investment opportunities and determine whether the potential returns adequately compensate for the level of risk involved. Usually, the government bond yield is considered the risk-free rate of return.

$$\text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p}$$

Where,

Rp = return of the portfolio,

Rf = risk-free rate,

σp = standard deviation of the portfolio.
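As a quick illustration of the Sharpe ratio definition, the sketch below computes it from three numbers. The function name and the input figures (a 12% portfolio return, a 7% risk-free rate, and 20% volatility) are hypothetical and are not results from this article's portfolio.

```python
def sharpe_ratio(portfolio_return, risk_free_rate, portfolio_volatility):
    """Excess return per unit of risk: (Rp - Rf) / sigma_p."""
    if portfolio_volatility <= 0:
        raise ValueError("portfolio_volatility must be positive")
    return (portfolio_return - risk_free_rate) / portfolio_volatility

# Hypothetical annual figures, all expressed as decimals.
ratio = sharpe_ratio(portfolio_return=0.12, risk_free_rate=0.07,
                     portfolio_volatility=0.20)
print(f"Sharpe ratio: {ratio:.2f}")
```

All three inputs should cover the same period (for example, all annualized) for the ratio to be meaningful.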

Let’s now build a portfolio optimization code in Python. We will consider nine assets listed on the National Stock Exchange of India (NSE): eight stocks and one gold ETF. In line with portfolio theory, our objective is to diversify our selection and maximize returns for a given level of risk. Choosing stocks from different industrial sectors helps cushion losses. For example, an automotive stock and a healthcare stock are relatively dissimilar, so a fluctuation in the returns of one may have little influence on the other. In this way, stronger performers can offset poor returns elsewhere, helping us pursue better returns over a long-term investment.

Before we begin, I would like to point out that this article is purely for educational purposes, and I do not provide any stock recommendations. The Stock market is risky, so kindly do your own research before investing.

First, let us install the libraries we need. To perform the optimization, we will install the pyPortfolioOpt library, and to download our stocks’ historical prices, we will install the yfinance library.

```
!pip install pyPortfolioOpt
!pip install yfinance
```

Next, we will import the necessary libraries.

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import yfinance as yf
%matplotlib inline
```

We will need a start and end date to extract the stock prices. I am assuming a two-year time frame from April 2021 to the end of March 2023.

```
start_date = datetime.datetime(2021,4,1)
end_date = datetime.datetime(2023,3,31)
```

Now we will define a function to help us extract the stock prices from Yahoo Finance. The function takes a stock ticker (or a list of tickers) and downloads the prices for the time frame mentioned previously by calling `yf.download` directly, so the extra pandas_datareader dependency is not needed. Finally, it returns a data frame of adjusted closing prices for the portfolio. Note that to extract Indian stock price data, we need to add “.NS” (for the National Stock Exchange) or “.BO” (for the Bombay Stock Exchange) at the end of the stock name, as shown below.

```
def get_stock_price(ticker):
    # auto_adjust=False keeps the separate "Adj Close" column in the result
    prices = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False)
    # Keep adjusted closing prices and drop dates with no data for any ticker
    prices = prices["Adj Close"].dropna(how="all")
    return prices
```

We will analyze eight stocks and one gold ETF (Exchange-Traded Fund). We have diversified the stocks across different sectors as follows:

- Automotive: Tata Motors, Maruti Suzuki Ltd.
- Pharmaceuticals: Cipla, Sun Pharma
- Info-Tech: Infosys, TCS
- FMCG: ITC, Marico
- GOLDBEES ETF

```
ticker_list = ['INFY.NS','TCS.NS','TATAMOTORS.NS','MARUTI.NS',
'SUNPHARMA.NS','CIPLA.NS','ITC.NS','MARICO.NS','GOLDBEES.NS']
portfolio = get_stock_price(ticker_list)
```

It’s now time to save the data frame of our portfolio for further analysis.

```
portfolio.to_csv("portfolio.csv",index=True)
portfolio = pd.read_csv("portfolio.csv",parse_dates=True,index_col="Date")
```

Let’s visualize the stock price trends for the two-year period.

```
portfolio[portfolio.index >= "2021-04-01"].plot(figsize=(15,10));
```

After importing our data and performing a simple descriptive analysis, we can now begin to optimize our portfolio to maximize our expected returns. Let’s import the pyPortfolioOpt library and first compute the variance-covariance matrix.

The variance-covariance matrix is a crucial component in mean-variance portfolio optimization. It describes the relationships between the returns of different assets within a portfolio. The matrix captures both the variance, representing the volatility of individual asset returns, and the covariance, which measures the co-movement between pairs of assets’ returns.

By constructing the variance-covariance matrix, portfolio managers can assess a portfolio’s total risk and diversification potential. It enables the calculation of portfolio-level risk measures, such as total variance and standard deviation, which are essential in optimizing the risk-return trade-off.

In mean-variance optimization, the variance-covariance matrix is used with expected returns to determine the optimal asset allocation. It helps identify the asset mix that maximizes returns for a given level of risk or minimizes risk for a desired level of returns. By quantifying risk contributions and considering correlations between assets, the matrix helps assess diversification benefits.

Overall, the variance-covariance matrix is central to mean-variance portfolio optimization as it allows for mathematical modeling and analysis of risk and return relationships within a portfolio. Its utilization assists in constructing efficient portfolios that balance risk and return based on the characteristics and interactions of the underlying assets.
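Before turning to the library call, here is a minimal NumPy sketch of what an annualized covariance matrix is and how it turns a set of weights into a portfolio-level risk figure. The return values and weights are made up for illustration, and the factor of 252 assumes 252 trading days per year.

```python
import numpy as np

# Hypothetical daily returns for three assets (rows are days, columns are assets).
daily_returns = np.array([
    [ 0.010, -0.004,  0.002],
    [-0.006,  0.003,  0.001],
    [ 0.004,  0.006, -0.003],
    [ 0.008, -0.002,  0.000],
    [-0.003,  0.001,  0.002],
])

# Sample covariance of daily returns, scaled to a year (assumes 252 trading days).
annual_cov = np.cov(daily_returns, rowvar=False) * 252

# Hypothetical portfolio weights that sum to 1.
weights = np.array([0.5, 0.3, 0.2])

# Portfolio variance is w^T * Sigma * w; volatility is its square root.
portfolio_variance = weights @ annual_cov @ weights
portfolio_volatility = np.sqrt(portfolio_variance)

print("Annualized covariance matrix:\n", annual_cov)
print("Portfolio volatility:", portfolio_volatility)
```

The diagonal of `annual_cov` holds each asset's annualized variance, and the off-diagonal entries hold the covariances between pairs of assets.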

```
import pypfopt
from pypfopt import risk_models
from pypfopt import plotting
sample_cov = risk_models.sample_cov(portfolio, frequency=252)
```

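This matrix is better visualized as a heatmap. In the code below, we estimate the covariance matrix with Ledoit-Wolf shrinkage and pass `plot_correlation=True`, so the heatmap shows correlations rather than raw covariances: the diagonal equals 1, and each off-diagonal tile shows how strongly a pair of stocks moves together. A lower value indicates that two stocks are less similar, which can assist in diversification and risk management.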

```
S = risk_models.CovarianceShrinkage(portfolio).ledoit_wolf()
plotting.plot_covariance(S, plot_correlation=True);
```

After generating the variance-covariance matrix, we estimate the expected return of each stock in our portfolio, here using the CAPM-based estimator from pyPortfolioOpt. Plotting these estimates as a horizontal bar chart shows that Tata Motors has the highest expected return for the two-year period we have considered, followed by Infosys, while GOLDBEES has the lowest.

```
from pypfopt import expected_returns
mu = expected_returns.capm_return(portfolio)
```

`mu.plot.barh(figsize=(10,6));`
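The `capm_return` call above estimates expected returns using the Capital Asset Pricing Model rather than a plain historical average. As a rough illustration of the idea only, and not a reproduction of the library's internals, the CAPM relation E[R_i] = R_f + β_i (E[R_m] − R_f) can be sketched as follows. The function name, the betas, and the rates are all hypothetical.

```python
def capm_expected_return(risk_free_rate, beta, market_return):
    """CAPM: risk-free rate plus beta times the market risk premium."""
    return risk_free_rate + beta * (market_return - risk_free_rate)

# Hypothetical inputs: 2% risk-free rate and a 10% expected market return.
hypothetical_betas = {"defensive stock": 0.6, "market-like stock": 1.0,
                      "aggressive stock": 1.4}
for name, beta in hypothetical_betas.items():
    expected = capm_expected_return(0.02, beta, 0.10)
    print(f"{name}: beta {beta:.1f} -> expected return {expected:.3f}")
```

Under this model, a stock with a higher beta (greater sensitivity to the market) is assigned a higher expected return.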

Now we’ll calculate the weights of each stock using the Efficient Frontier module. This takes the expected returns and the covariance matrix of the stocks in our portfolio. We then ask for the portfolio with the maximum Sharpe ratio, which balances higher returns against lower risk, and read off the weights of that portfolio.

```
from pypfopt.efficient_frontier import EfficientFrontier
ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()
cleaned_weights = ef.clean_weights()
print(dict(cleaned_weights))
```

The output dictionary containing the weights of each stock in the portfolio is {‘CIPLA.NS’: 0.1122, ‘GOLDBEES.NS’: 0.0905, ‘INFY.NS’: 0.11603, ‘ITC.NS’: 0.11287, ‘MARICO.NS’: 0.11263, ‘MARUTI.NS’: 0.11094, ‘SUNPHARMA.NS’: 0.11454, ‘TATAMOTORS.NS’: 0.11785, ‘TCS.NS’: 0.11243}

Next, we compute the portfolio performance to understand our expected annual return, the portfolio’s volatility, and its Sharpe ratio. As a rule of thumb, a Sharpe ratio greater than one is considered acceptable to good for investment, and a Sharpe ratio greater than 1.5 is considered excellent.

`ef.portfolio_performance(verbose=True)`
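To see what these three performance figures mean, the sketch below computes them by hand from a set of weights, expected returns, and a covariance matrix. The function name is hypothetical, the inputs are made-up numbers for two assets, and the 2% risk-free rate is an assumption chosen for the example rather than a value taken from the library.

```python
import numpy as np

def manual_portfolio_performance(weights, expected_returns, cov_matrix,
                                 risk_free_rate=0.02):
    """Return (expected return, volatility, Sharpe ratio) for given weights."""
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(expected_returns, dtype=float)
    cov = np.asarray(cov_matrix, dtype=float)
    ret = float(w @ mu)                  # weighted average of expected returns
    vol = float(np.sqrt(w @ cov @ w))    # square root of w^T * Sigma * w
    sharpe = (ret - risk_free_rate) / vol
    return ret, vol, sharpe

# Hypothetical annualized inputs for a two-asset portfolio.
example_weights = [0.6, 0.4]
example_mu = [0.10, 0.15]
example_cov = [[0.040, 0.006],
               [0.006, 0.090]]

ret, vol, sharpe = manual_portfolio_performance(example_weights, example_mu,
                                                example_cov)
print(f"Expected annual return: {ret:.1%}")
print(f"Annual volatility: {vol:.1%}")
print(f"Sharpe ratio: {sharpe:.2f}")
```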

Now, if we find our portfolio to be suitable based on the returns, risk, and Sharpe ratio values, we can further compute how many shares of each stock to purchase. For example, if we had ₹1,00,000 to invest, then based on the weights we calculated in the previous steps and the latest stock prices, we can compute the number of shares of each stock we can purchase within the budget. With the help of the DiscreteAllocation class, we can convert the continuous portfolio weights into discrete allocations; the code below uses its greedy algorithm (`greedy_portfolio`). DiscreteAllocation takes three inputs: the continuous weights, the latest stock prices, and the assumed budget. Using the `get_latest_prices` function, we extract the most recent prices for the chosen stocks from the price data frame we downloaded from Yahoo Finance. The output is a dictionary of each stock’s proposed number of shares to purchase and the value of unused funds.

```
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices
latest_prices = get_latest_prices(portfolio)
da = DiscreteAllocation(weights, latest_prices, total_portfolio_value=100000)
# Number of shares of each stock to purchase
allocation, leftover = da.greedy_portfolio()
print("Discrete allocation:", allocation)
print("Funds remaining: \u20B9{:.2f}".format(leftover))
```

Discrete allocation: {‘TATAMOTORS.NS’: 28, ‘INFY.NS’: 9, ‘SUNPHARMA.NS’: 12, ‘ITC.NS’: 29, ‘MARICO.NS’: 24, ‘TCS.NS’: 4, ‘CIPLA.NS’: 13, ‘MARUTI.NS’: 1, ‘GOLDBEES.NS’: 178}

Funds remaining: ₹340.08
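The idea of turning fractional weights into whole shares can be sketched in a few lines of plain Python. This is a deliberately simplified illustration and not pyPortfolioOpt's actual algorithm; the function name, tickers, prices, and budget are all hypothetical.

```python
def simple_greedy_allocation(weights, prices, budget):
    """Buy whole shares for each asset, largest target weight first.

    Each asset receives as many whole shares as fit within its target
    value (weight * budget); whatever is left over is returned as cash.
    """
    allocation = {}
    remaining = budget
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for ticker, weight in ranked:
        target_value = weight * budget
        shares = int(target_value // prices[ticker])
        cost = shares * prices[ticker]
        if shares > 0 and cost <= remaining:
            allocation[ticker] = shares
            remaining -= cost
    return allocation, remaining

# Hypothetical weights, prices, and budget.
example_weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
example_prices = {"AAA": 120.0, "BBB": 45.0, "CCC": 900.0}

allocation, leftover = simple_greedy_allocation(example_weights, example_prices,
                                                budget=10000)
print("Allocation:", allocation)
print("Funds remaining:", leftover)
```

Because shares cannot be split, the realized weights differ slightly from the targets, and some cash is always likely to remain unspent.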

Finally, let us plot the efficient frontier together with a cloud of randomly generated portfolios. Ideally, the portfolio with the maximum Sharpe ratio falls on the efficient frontier line, and we’ll mark it on our chart. We sample 10,000 random sets of portfolio weights, compute the return, volatility, and Sharpe ratio of each (taking the risk-free rate as zero in this sample calculation), and show them as a risk-return scatter plot alongside the frontier.

```
n_samples = 10000
w = np.random.dirichlet(np.ones(len(mu)), n_samples)
rets = w.dot(mu)
stds = np.sqrt((w.T * (S @ w.T)).sum(axis=0))
sharpes = rets / stds
print("Sample portfolio returns:", rets)
print("Sample portfolio volatilities:", stds)
```

```
# Plot efficient frontier with Monte Carlo sim
ef = EfficientFrontier(mu, S)
fig, ax = plt.subplots(figsize= (10,10))
plotting.plot_efficient_frontier(ef, ax=ax, show_assets=False)
# Find and plot the tangency portfolio
ef2 = EfficientFrontier(mu, S)
ef2.max_sharpe()
ret_tangent, std_tangent, _ = ef2.portfolio_performance()
# Plot random portfolios
ax.scatter(stds, rets, marker=".", c=sharpes, cmap="viridis_r")
ax.scatter(std_tangent, ret_tangent, c='red', marker='X',s=150, label= 'Max Sharpe')
# Format
ax.set_title("Efficient Frontier with random portfolios")
ax.legend()
plt.tight_layout()
plt.show()
```

You can easily adapt the mean-variance method to include industries beyond the automotive, IT, pharmaceutical, FMCG, and gold holdings used here. For instance, constructing a portfolio of companies in energy, real estate, and commodities is also possible. This portfolio optimization method, while widely used, has certain limitations:

- It is sensitive to input estimates of expected returns, variances, and covariances, which are uncertain; small changes in these inputs can produce large changes in the optimal weights.
- Inaccurate estimates can lead to suboptimal portfolio allocations.
- Additionally, mean-variance optimization assumes a normal distribution of asset returns, neglecting non-normal behavior, skewness, and kurtosis. This can result in inadequate consideration of tail risk.
- The method also assumes linear relationships between asset returns, which may not capture the complexity of non-linear relationships, particularly in options or derivative instruments.
- Furthermore, mean-variance optimization heavily relies on historical data, which may not accurately represent future market conditions, especially during structural changes or extreme events.
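The sensitivity to input estimates can be illustrated with a small sketch. It uses the textbook closed-form solution for the unconstrained maximum-Sharpe portfolio, where weights are proportional to Σ⁻¹(μ − R_f). Note that this form allows short positions, so it is not the same problem that the long-only optimizer above solves. The function name and all numbers are hypothetical.

```python
import numpy as np

def unconstrained_tangency_weights(mu, cov, risk_free_rate=0.0):
    """Weights proportional to inverse(cov) @ (mu - rf), scaled to sum to 1.

    Short positions are allowed in this simplified closed-form version.
    """
    excess = np.asarray(mu, dtype=float) - risk_free_rate
    raw = np.linalg.solve(np.asarray(cov, dtype=float), excess)
    return raw / raw.sum()

# Two hypothetical assets with 20% volatility each and a correlation of 0.95.
cov = np.array([[0.040, 0.038],
                [0.038, 0.040]])

base = unconstrained_tangency_weights([0.10, 0.10], cov)
bumped = unconstrained_tangency_weights([0.11, 0.10], cov)

print("Weights with equal expected returns:", base)
print("Weights after raising one estimate by one percentage point:", bumped)
```

With identical inputs the two assets are weighted equally, but raising one expected return by a single percentage point pushes that asset's weight above 100% and forces a short position in the other. Highly correlated assets make the optimizer especially sensitive in this way.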

Alternative approaches can address these limitations. Resampling techniques, such as Monte Carlo simulation, account for uncertainty in input estimates. Downside risk measures like Conditional Value at Risk (CVaR) focus on tail losses, and the Black-Litterman model blends market-implied returns with an investor’s own views to produce more stable return estimates. Nevertheless, this article can serve as a starting point for exploring different portfolio components and optimization methods.

A. The mean-variance method is an investment portfolio optimization approach that aims to find the optimal balance between risk and return. It quantifies risk as the variance of returns and seeks to maximize the portfolio’s expected return while minimizing its variance.

A. Mean-variance optimization methods are mathematical techniques used to construct efficient investment portfolios. They involve analyzing the expected returns and variances of different assets in order to allocate weights to each asset, aiming to achieve the highest possible return for a given level of risk.

A. In Python, the variance can be calculated using various methods. One common approach is to use the NumPy library’s var() function, which computes the variance of an array or along a specific axis of a multidimensional array. By default it returns the population variance (dividing by n); passing ddof=1 gives the sample variance (dividing by n-1). Another option is to manually calculate the variance using mathematical formulas.
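A short example of the difference between the two conventions, using a small made-up dataset:

```python
import numpy as np

# Hypothetical data points with a mean of 5.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_variance = np.var(data)       # divides by n (ddof=0 is the default)
sample_variance = np.var(data, ddof=1)   # divides by n - 1

# The same population variance computed manually from the definition.
manual_variance = ((data - data.mean()) ** 2).sum() / len(data)

print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
print("Manual population variance:", manual_variance)
```

The sample variance is always slightly larger than the population variance for the same data, and the gap shrinks as the number of data points grows.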

A. The formula for mean variance typically refers to the formula used to calculate the variance of a dataset: take the sum of the squared differences between each data point and the mean, and divide it by the total number of data points (or by n-1 for the sample variance). The result represents the average squared deviation from the mean.

