Is replacing standard investments with ESG substitutes a good choice?

We compare traditional portfolio selection strategies applied to traditional stock market indexes in their standard and ESG versions. In addition, we compare 12 indexes from different markets in their ESG and standard versions. To compare the portfolios we use conventional performance measures, and the results show that replacing the indexes with their ESG versions does not detract from performance in terms of the Sharpe ratio and other measures. In general, there is no statistically significant difference between the returns of the ESG indexes, and of the portfolios built with them, and those obtained with the standard versions. Thus, an investor concerned with environmental and social issues can replace their equity market investments with ESG assets without harming their financial performance.


Introduction
Companies' environmental, social, and governance (ESG) policies have gained much visibility in recent years from different perspectives: consumers with greater social and environmental awareness are interested in transacting with companies with good practices; investors are trying to understand how the adoption of ESG practices can impact the company's results; governments and regulators are interested in practices that maximize the general welfare.
Nevertheless, setting aside the welfare gains that transacting with or investing in companies that respect social and environmental causes can bring, from an investor's point of view, how profitable or harmful to their investments would it be to choose assets that follow ESG guidelines? Although the topic has been the subject of much debate in recent years, this issue is far from being entirely solved. Some studies point out that ESG assets perform better, in particular showing higher returns (Edmans, 2012; Khan, 2019; Consolandi et al., 2020), others show that there is no change in performance (Halbritter and Dorfleitner, 2015; Naffa and Fain, 2022), and others that they have negative impacts on investment returns (Hübel and Scholz, 2020).
Recently, ESG indexes have been created for different markets to measure the performance of companies with good practices. These indexes exclude assets issued by companies that do not adopt environmentally friendly actions or that commercialize or produce products harmful to health and human well-being, such as tobacco and weapons. We take advantage of these already established indexes to assess their performance and attractiveness to investors. Our study evaluates these indexes in two ways: (i) We compare the indexes in their standard and ESG versions, using the Ledoit and Wolf (2008) test to assess whether their performances in terms of Sharpe ratio are statistically different. (ii) We build easy-to-implement portfolios using the SP 500 index in its standard, ESG, and Carbon Efficient versions for the American market, among other traditional assets customarily used to construct investment portfolios. We run the same experiment with the SP EURO 350 in its standard and ESG versions for the European market. As several of these indexes were created recently, their potential in terms of returns and diversification has not yet been thoroughly evaluated. Thus, this article evaluates the inclusion of these indexes in place of their traditional versions in portfolio selection strategies, using traditional measures such as average return, volatility, Sharpe ratio, Sortino ratio, and Omega ratio. In addition, we test the statistical difference in performance (using the Ledoit and Wolf (2008) test) between well-defined indexes in their ESG and standard versions. Our goal is to check whether the investor can replace her assets with similar socially responsible ones without being harmed or having a worse performance on her investments.
We used 12 indexes in their standard and ESG versions to evaluate their performance difference through the Ledoit and Wolf (2008) test. In addition, we built the 1/n, minimum-variance, mean-variance with and without short selling, and Volatility Timing and Reward-to-Risk Timing portfolios from Kirby and Ostdiek (2012) for the American and European markets, with the SP 500 and SP EURO 350 in their standard and ESG versions representing the equity market. Our results show that investments do not perform worse when ESG substitutes are chosen and that in some cases they may even benefit (in terms of financial performance) from the substitution.

Environmental, Social, and Governance (ESG) and Financial Performance
The literature on environmental, social, and governance (ESG) in Finance has gained much space in the academic debate. There are countless works dealing with the subject, either from the point of view of the market as a whole (Pedersen et al., 2021) or analyzing its application and results at the level of assets or firms (Clementino and Perkins, 2021;Aouadi and Marsat, 2018).
Starting with the characterization of an ESG market as a whole, Pedersen et al. (2021) created a theoretical model that enabled them, through the environmental, social, and governance (ESG) score of different stocks, to have information about the fundamentals of companies. When solving the investor's portfolio problem, they find the ESG-efficient frontier (using the highest attainable Sharpe ratio to adapt the capital market theory to the ESG case). They also highlight the costs and benefits of responsible investments characterized by ESG assets.
Analyzing individual companies, Aouadi and Marsat (2018) find evidence that higher corporate social performance has an impact on market value only for large companies located in countries with greater press freedom, where free internet search is possible and more frequent. Khan (2019) investigates the relationship between companies' ESG performance and their financial performance, finding a positive relationship between them. Consolandi et al. (2020) find that ESG momentum has a significant impact on stock performance and that the market rewards more the companies operating in sectors with a high level of ESG concentration materiality. Edmans (2012) shows that corporate social responsibility has a positive impact on stock returns and that job satisfaction levels can be used among the predictors of future stock market performance.
Through company ESG ratings, Auer and Schuhmacher (2016) analyze the performance of socially responsible investments in the Asia-Pacific region, the United States, and Europe. The authors show that active selection of high- or low-rated stocks does not provide risk-adjusted performance superior to that of passive stock market investments across all regions, industries, and ESG criteria. Moreover, in the Asia-Pacific region and the United States, investors who focus their investments on ESG assets see their strategies perform similarly to the broad market, while in Europe investors face a cost in directing their portfolios towards socially responsible investments. Assessing the relationship between ESG indicators and financial performance, Duque-Grisales and Aguilera-Caracuel (2021) study the Latin American market, precisely the results of multinational companies in this emerging market. Based on their results, the authors argue that investing in stocks with good ESG ratings would lead to greater visibility, more stakeholder recognition, cost reductions, and greater financial performance.
However, Halbritter and Dorfleitner (2015) and Naffa and Fain (2022) report that the difference between portfolios and funds formed by high-ESG assets and those formed by non-high-ESG assets is not statistically significant, so financial performance would not be related to choosing assets that comply with ESG guidelines. Furthermore, studies such as Hübel and Scholz (2020) go in another direction, demonstrating that assets with low ESG ratings outperform high-ESG stocks. Analyzing the beginning of the pandemic period, Folger-Laronde et al. (2020) suggest that socially responsible assets do not have a better financial performance, especially in market downturns. Concerned with the effect of rating changes on stock returns, Shanaev and Ghimire (2021) show that ESG rating upgrades lead to increases in returns that are not statistically significant, whereas downgrades have a more impactful and statistically significant effect.

Data and methodological aspects
This section describes the test for statistical comparison of performance between standard indexes and their ESG versions, the procedures used to build investment portfolios and to measure their performance and the data.

Data
The data are composed of 12 indexes in their standard and ESG versions presented in Table 1; the choice was made based on the availability of indexes during the whole period in the study. Table 2 shows the data used to construct portfolios for the American market and Table 3 for the European portfolios. The data comprise monthly values for the indexes between January 2011 and February 2022 (the last month available when the empirical work was done). Moreover, there is information for every index at every point in time. The indexes follow the methodology adopted by the S&P Dow Jones, and the monthly values were obtained through the database provided by Bloomberg. The criteria for selecting the assets that make up the indexes follow the S&P DJI ESG Scores, which contain information about company-level ESG taking into account the individual environmental (E), social (S), and economic & governance (G) aspects. More information on the methodologies and how weights are assigned or assets excluded from these indexes can be obtained from S&P Dow Jones.

Rolling window estimation
Our first step is to define the returns. In this case, for a given index I, its return is defined as follows: The formula is the same as the case in which we are comparing the prices for two distinct periods. The estimations of the parameters to be used in the strategies for building portfolios will follow the rolling window approach. The choice of this estimation methodology was made following DeMiguel et al. (2009) and Kirby and Ostdiek (2012) who use it in their work to predict the average parameters and the covariance matrices of returns. In this approach, there is a roll over of the sample, defined by an observation window L, in which the estimation is made covering a period that goes from t − (L − 1) to the present observation (t).
According to Kirby and Ostdiek (2012), this methodology is developed in order to balance the efficiency gains in using more observations with the loss of prediction accuracy when including observations from more distant times, which have little probability of synthesizing the current conditions of the market. In the present study, the rolling window estimators ($\hat{\mu}$) will be used to forecast asset returns, with the following formulation: In addition to the covariance matrices: with $\hat{\sigma}^2_{i,t}$ representing the elements of its main diagonal and $\hat{\sigma}_{i,t}\hat{\sigma}_{j,t}$ the other elements when $i \neq j$.
Here, the rolling windows cover five years of monthly returns. There are 134 monthly observations of each index; taking their returns leaves 133 observations, divided into 60 in-sample observations for rolling the estimators and 73 out-of-sample observations.
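The sample split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simple_returns` and `rolling_estimates` are hypothetical, returns are taken as simple percentage changes, and the covariance uses the unbiased (L − 1) denominator; the estimators actually used in the study may differ in these details.

```python
import numpy as np

def simple_returns(levels):
    """Simple returns from index levels (assumed definition): I_t / I_{t-1} - 1."""
    levels = np.asarray(levels, dtype=float)
    return levels[1:] / levels[:-1] - 1.0

def rolling_estimates(returns, L=60):
    """Rolling-window sample mean and covariance.

    returns: array of shape (T, n) with monthly returns.
    For each t = L-1, ..., T-2 the window covers rows t-(L-1), ..., t and the
    resulting estimates are meant to be used for the following month.
    """
    returns = np.asarray(returns, dtype=float)
    T = returns.shape[0]
    mus, covs = [], []
    for t in range(L - 1, T - 1):
        window = returns[t - L + 1 : t + 1]            # last L observations up to t
        mus.append(window.mean(axis=0))
        covs.append(np.cov(window, rowvar=False, ddof=1))
    return np.array(mus), np.array(covs)
```

With 134 index levels this yields 133 returns and, for L = 60, one set of estimates for each of the 73 out-of-sample months.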

Portfolio strategies
The portfolio selection strategies used in this article were chosen because they are easy to understand and build, and can be adopted even by an investor without many technological resources or theoretical knowledge. Despite their simplicity, studies such as Kirby and Ostdiek (2012) and Tavares et al. (2022) show that they bring good results compared to the market benchmark portfolio or the naive portfolio, which in many cases is difficult to outperform (DeMiguel et al., 2009).

Naive strategy
The equally weighted strategy has a long record of adoption and, as seen in DeMiguel et al. (2009), has shown in several applications a performance superior to that of strategies that are sophisticated and challenging to implement. The weight for each of the n assets in this portfolio is defined as $w_{i} = 1/n$ for $i = 1, \dots, n$: each asset receives equal weight in the portfolio formulation, and the sum of the weights must be equal to one.

Mean-variance
Mean-variance optimization strategies result from combinations of weights assigned to assets that aim to achieve the lowest risk for a fixed level of return $\mu_0$. Alternatively, they can be obtained by setting a maximum level of risk, in which case the optimization is done aiming at a maximum level of return. Two mean-variance optimization strategies will be adopted in this experiment. In one case, short selling is allowed.
The implemented portfolio will be built using the sample forecast of expected returns and covariance matrix: here l is a vector in which all elements are equal to 1.
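As an illustration of the first formulation (lowest risk for a fixed return level, short selling allowed), the following Python sketch solves the textbook Markowitz problem through its first-order conditions. It is not the expression used in the study: the function name `mean_variance_weights` and the choice of solving the linear KKT system directly are assumptions made for this example.

```python
import numpy as np

def mean_variance_weights(mu, cov, mu0):
    """Minimise w' cov w subject to w' mu = mu0 and w' 1 = 1 (short selling allowed)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mu.size
    ones = np.ones(n)
    # First-order conditions of the Lagrangian:
    #   2 cov w + a mu + b 1 = 0,  mu' w = mu0,  1' w = 1
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov
    kkt[:n, n] = mu
    kkt[:n, n + 1] = ones
    kkt[n, :n] = mu
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [mu0, 1.0]])
    return np.linalg.solve(kkt, rhs)[:n]
```

The system is solvable when the covariance matrix is positive definite and the expected returns are not all equal. The variant without short selling additionally requires non-negative weights and generally needs a quadratic programming solver, which is not shown here.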

Minimum-variance
The minimum-variance strategy is a particular case of mean-variance portfolios. The reason for the elaboration of this strategy lies in the difficulty of obtaining good estimates for the expected returns, in addition to the fact that their poor specification can bring great harm to the optimization, as seen in Best and Grauer (1991). The vector of optimal weights of this approach is given by: where l is a vector of size n in which each element has value 1.
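A minimal Python sketch of the minimum-variance weights follows, assuming the standard closed form for the global minimum-variance portfolio with short selling allowed, $w = \hat{\Sigma}^{-1} l / (l' \hat{\Sigma}^{-1} l)$. The function name `minimum_variance_weights` is hypothetical.

```python
import numpy as np

def minimum_variance_weights(cov):
    """Global minimum-variance weights: cov^{-1} 1 / (1' cov^{-1} 1)."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)   # cov^{-1} 1 without forming the inverse
    return x / x.sum()
```

Solving the linear system instead of inverting the covariance matrix is a numerical choice made here; it gives the same weights.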

Volatility timing
The study carried out by Fleming et al. (2001) aimed to build alternative portfolios whose structuring does not depend on the estimation of expected returns, which is quite problematic. The authors then developed volatility timing strategies, in which portfolio rebalancing depends only on expected volatility. This methodology is highly attractive because of its ease of implementation. It does not require optimization or inversion of covariance matrices. In addition, it does not generate negative weights.
The methodology also has a parameter (η) that gives the speed of adjustment of the portfolio's rebalancing in response to changes in volatility. According to Kirby and Ostdiek (2012), if this parameter approaches zero, the portfolio takes a similar form to the naive portfolio. But if η → ∞ the portfolio will tend to be formed only by the asset with lower volatility.
In the present study, as in Kirby and Ostdiek (2012), the parameter η will assume the following values: {1,2,4}. In addition, volatility will be calculated using sample rolling window estimates, whose weights for each asset i are given by:
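To the best of our reading of Kirby and Ostdiek (2012), the volatility timing weights are proportional to the inverse rolling variance of each asset raised to the power η, $w_{i,t} = (1/\hat{\sigma}^2_{i,t})^{\eta} / \sum_{j=1}^{n} (1/\hat{\sigma}^2_{j,t})^{\eta}$. The Python sketch below implements this form; the function name `volatility_timing_weights` is hypothetical.

```python
import numpy as np

def volatility_timing_weights(variances, eta=1.0):
    """Weights proportional to (1 / variance_i) ** eta, normalised to sum to one."""
    variances = np.asarray(variances, dtype=float)
    score = (1.0 / variances) ** eta
    return score / score.sum()
```

With η = 0 the function returns the naive 1/n weights, and larger values of η concentrate the portfolio in the asset with the lowest variance, in line with the limiting cases described above.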

Reward-to-Risk timing
Considering that the previous set of strategies ignores the information that can be brought in through the expected returns, Kirby and Ostdiek (2012) create the reward-to-risk timing strategies. As in the previous case, this approach does not consider the elements off the main diagonal of the predicted covariance matrix. The authors argue that the construction of strategies disregarding such covariance matrix elements reduces the risk of extreme weights for the assets. The weights of this approach for any i asset when using sample rolling window estimation are given by: , which guarantees positive weights to the portfolio. The η parameters are defined similarly to the previous portfolios.
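To the best of our reading of Kirby and Ostdiek (2012), the reward-to-risk timing weights replace the inverse variance with the ratio $\hat{\mu}^{+}_{i,t} / \hat{\sigma}^2_{i,t}$, where $\hat{\mu}^{+}_{i,t} = \max(\hat{\mu}_{i,t}, 0)$ truncates negative expected returns at zero. The Python sketch below follows this form; the function name `reward_to_risk_timing_weights` and the error raised when no asset has a positive expected return are assumptions of this example.

```python
import numpy as np

def reward_to_risk_timing_weights(means, variances, eta=1.0):
    """Weights proportional to (max(mean_i, 0) / variance_i) ** eta."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mu_plus = np.maximum(means, 0.0)          # truncation keeps weights non-negative
    score = (mu_plus / variances) ** eta
    total = score.sum()
    if total == 0.0:
        raise ValueError("no asset has a positive estimated mean return")
    return score / total
```

Assets with a non-positive estimated mean receive zero weight, so the resulting weights are never negative.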

Ledoit and Wolf (2008) Sharpe ratio test
To test the statistical difference between two different portfolios or assets, Ledoit and Wolf (2008) consider i and n as different assets (or portfolios), whose returns at time t are denoted $r_{ti}$ and $r_{tn}$, respectively. A total of T return pairs $(r_{1i}, r_{1n}), \dots, (r_{Ti}, r_{Tn})$ are observed. We assume that these observations form a strictly stationary time series, so that the bivariate return distribution does not change over time. The mean vector and covariance matrix of this distribution are defined as: $$\mu = \begin{pmatrix} \mu_i \\ \mu_n \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_i^2 & \sigma_{in} \\ \sigma_{in} & \sigma_n^2 \end{pmatrix}.$$ The sample means and sample standard deviations of the observed returns are denoted $\hat{\mu}_i, \hat{\mu}_n$ and $\hat{\sigma}_i, \hat{\sigma}_n$, respectively.
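The difference between the two Sharpe ratios is: $$\Delta = \frac{\mu_i}{\sigma_i} - \frac{\mu_n}{\sigma_n}.$$ Therefore, its estimator is given by: $$\hat{\Delta} = \frac{\hat{\mu}_i}{\hat{\sigma}_i} - \frac{\hat{\mu}_n}{\hat{\sigma}_n}.$$ The null hypothesis is $H_0: \Delta = 0$; that is, we are interested in testing whether this difference is statistically significant. For a detailed explanation of the test, see Ledoit and Wolf (2008).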
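The estimator of the Sharpe ratio difference, and the general idea of testing it by resampling, can be illustrated with the Python sketch below. It is deliberately simplified and is not the Ledoit and Wolf (2008) procedure: it resamples return pairs independently (ignoring serial dependence) and uses a non-studentized bootstrap p-value. The function names, the number of replications, and the seed are assumptions of this example.

```python
import numpy as np

def sharpe_ratio(r):
    """Sample mean divided by sample standard deviation."""
    r = np.asarray(r, dtype=float)
    return r.mean() / r.std(ddof=1)

def sharpe_difference_bootstrap(r_i, r_n, n_boot=2000, seed=0):
    """Estimate Delta and a simplified two-sided bootstrap p-value for H0: Delta = 0."""
    r_i = np.asarray(r_i, dtype=float)
    r_n = np.asarray(r_n, dtype=float)
    T = r_i.size
    delta_hat = sharpe_ratio(r_i) - sharpe_ratio(r_n)
    rng = np.random.default_rng(seed)
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, T, size=T)      # resample the pairs jointly
        deltas[b] = sharpe_ratio(r_i[idx]) - sharpe_ratio(r_n[idx])
    # Share of centred bootstrap differences at least as extreme as the estimate.
    p_value = np.mean(np.abs(deltas - delta_hat) >= abs(delta_hat))
    return delta_hat, p_value
```

Resampling the pairs jointly preserves the correlation between the two return series, which is essential when comparing an index with its ESG version. For inference that is robust to serial correlation and heavy tails, the studentized block bootstrap or the HAC-based version described by Ledoit and Wolf (2008) should be used instead.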

Performance assessment of portfolios
Once the universe of strategies for building portfolios is defined, some performance measures are used to compare them. The first and simplest performance measure is the sample average of the returns, $\hat{\mu}$. Another important measure, which summarizes information about the risk of each portfolio, is the sample variance, $\hat{\sigma}^2$. The standard deviation $\hat{\sigma}$ is given by taking the square root of the sample variance.
Furthermore, the performance of the portfolios can be measured through the Sharpe ratio ($\hat{\lambda}$), which takes into account the risk-return ratio in its formulation and is defined as follows: $$\hat{\lambda} = \frac{\hat{\mu}}{\hat{\sigma}}.$$ The Sharpe ratio divides the return of an asset (or portfolio) by its volatility. Due to its simplicity and easy interpretation, the Sharpe ratio has become one of the most widely used risk-adjusted performance metrics in financial analysis. In spite of this, the Sharpe ratio presents some shortcomings (Lo, 2002). For example, because it measures risk by volatility, the Sharpe ratio implicitly assumes normally distributed returns, and it might lead to wrong investment decisions when returns deviate from the normal distribution. In addition, to compute the Sharpe ratio one needs the expected returns and the volatilities, which are unknown and need to be estimated statistically.
We also estimate the Sortino ratio (Sortino and Van Der Meer, 1991), which considers the standard deviation of the downside returns and is given by $(\hat{\mu} - \mathrm{MAR}) / \hat{\sigma}_d$, where $\hat{\sigma}_d$ denotes the standard deviation of the downside returns. Sortino and Van Der Meer (1991) argued that risk should be measured in terms of not meeting the investment target and defined the "Minimum Acceptable Return" or MAR.
The Sortino ratio is more sensitive to negative or extreme risks than measures that use volatility. It is a modification of the Sharpe ratio that replaces the standard deviation with the downside deviation, which only considers the negative deviations from the mean or from a minimum return threshold.
The last measure we consider is the Omega ratio (Keating and Shadwick, 2002), which is given by the ratio between the average gain and the average loss, defined as follows: $$\Omega(\tau) = \frac{\int_{\tau}^{\infty} \left[1 - F(r_t)\right] dr_t}{\int_{-\infty}^{\tau} F(r_t)\, dr_t},$$ where $F(r_t)$ represents the cumulative distribution function of the strategy return $r_t$ and τ is the minimum return threshold. We set the target return threshold τ to zero, as in Kajtazi and Moro (2019) and as is usual for this measure.
The Ω measure allows us to compare returns for different strategies and to rank them. The rankings depend on the interval of returns under consideration and incorporate all higher moment effects. The attractiveness of Ω, compared to the Sharpe ratio and other traditional risk ratios, is that it takes into account the entire return distribution and does not rely on the value, or even the existence, of any particular moments. Since Ω considers additional moments such as skewness and kurtosis along with the first two moments, its results can differ significantly from those of the other measures. Rankings based on the Ω ratio can be similar to those of traditional methods when the higher moments are not too significant.
The major advantage of measuring performance with Ω is that we do not need parametric assumptions (e.g. on mean and standard deviation) and there are no constraints on the form of the underlying distribution.
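The three ratios discussed in this subsection can be computed from a series of out-of-sample returns as in the Python sketch below. The function names are hypothetical, and two conventions are assumptions of this example: the downside deviation is the root mean square of the shortfalls below the MAR taken over all observations, and the Omega ratio is computed in its empirical form as the sum of gains above τ divided by the sum of losses below τ.

```python
import numpy as np

def sharpe_ratio(r):
    """Mean return divided by the sample standard deviation."""
    r = np.asarray(r, dtype=float)
    return r.mean() / r.std(ddof=1)

def sortino_ratio(r, mar=0.0):
    """Mean return above the MAR divided by the downside deviation."""
    r = np.asarray(r, dtype=float)
    shortfall = np.minimum(r - mar, 0.0)          # zero when the MAR is met
    downside_dev = np.sqrt(np.mean(shortfall ** 2))
    return (r.mean() - mar) / downside_dev

def omega_ratio(r, tau=0.0):
    """Empirical Omega: total gains above tau over total losses below tau."""
    r = np.asarray(r, dtype=float)
    gains = np.maximum(r - tau, 0.0).sum()
    losses = np.maximum(tau - r, 0.0).sum()
    return gains / losses
```

The Sortino and Omega functions are undefined for a series with no return below the threshold, since the denominator is then zero; a production implementation would need to handle that case explicitly.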

Results
In this section, we first describe and discuss the results of applying the Ledoit and Wolf (2008) test to the indexes in their standard and ESG versions. Then, we present and discuss the results of the investment portfolios, comparing them with the stock market indexes in their standard and ESG versions for the American and European cases. Table 4 presents descriptive statistics of the indexes in their standard version, while Table 5 presents the same results for the ESG versions of these indexes. From these tables, we can notice that the index with the greatest maximum observed monthly return was the S&P Mid-East and Africa Emerging LargeMidCap in both versions. The index with the lowest minimum observed monthly return was the Australian S&P/ASX 200 in both versions. Some indexes, such as the S&P North America LargeMidCap, have a larger mean monthly return in their ESG version, while others, such as the S&P/ASX 200, have a greater mean return in their standard version. In both formulations, the S&P Mid-East and Africa Emerging LargeMidCap is the index with the greatest volatility.

Comparing ESG versions with the standard ones
Analyzing the performance of the indexes in terms of Sharpe ratio, we can see from Table 6 that the differences between the Sharpe ratios of the indexes in their standard and ESG versions are not statistically significant at the 5% significance level, except for the Australian index. This shows that in almost all cases there is no real difference between investing in an index in its standard or ESG version: we cannot reject the null hypothesis ∆ = 0 and, therefore, we cannot say that there is any difference between the Sharpe ratios of the two formulations of the indexes. The Sharpe ratio summarizes the relevant information on the risk and return of assets and is a good and commonly used measure of asset performance. Thus, we can say that the investor at least does not lose in terms of performance by allocating their investment to ESG assets instead of their standard formulation. Moreover, in the case of the Australian market, the investor would even have a positive result in terms of performance by making this choice. This result is in line with Halbritter and Dorfleitner (2015) and Naffa and Fain (2022), who showed that the difference between portfolios and funds formed by high-ESG assets and the other non-high-ESG ones is not statistically significant. Nevertheless, note that we did not compare them with non-high-ESG indexes; we only compared them with their standard version.

Portfolio selection with SP 500 ESG and CARBON EFFICIENT-USA
In this subsection, the results of the portfolio selection with the strategies described in the previous section are presented. The results are presented for three different scenarios for the American case: in the first we use the standard version of the SP 500, in the second we replace the SP 500 with its ESG version, and in the last we replace it with the Carbon Efficient version of the index.
The rolling window estimators for variance and sample mean of returns are defined with L = 60. For the out-of-sample period, the estimations are made for each period taking into account the monthly observations of the index returns in the previous five years. Table 7 provides descriptive statistics of asset returns used in the portfolio selection problem for the American case. Table 8 describes the correlation between the assets used in the portfolios.
The performance measures for the portfolios in the out-of-sample period are presented in Table 9. We can see that, for all performance measures, the results of the portfolios estimated with the SP 500 in its ESG and Carbon Efficient versions are very close to those obtained with the standard version. The differences are minimal for all measures, i.e., mean returns, standard deviation, Sortino ratio, and Omega ratio.
In general, we note that, as described by DeMiguel et al. (2009), the naive portfolio performs well in all cases presented in Table 9. However, the portfolios proposed by Kirby and Ostdiek (2012) outperform the naive portfolio (or are very close to it) in almost all cases and measures, showing once again the efficiency of the Volatility Timing and Reward-to-Risk Timing portfolios. When we analyze only the average return of the portfolios, the 1/n portfolio only loses to the RWR(1) and the mean-variance portfolio with short selling. The same happens when the SP 500 is replaced by its ESG version. However, in the scenario where we replace the index with its Carbon Efficient version, the naive portfolio is not beaten in terms of average return in the out-of-sample period by any other, evidence in favor of DeMiguel et al. (2009) that again shows how difficult it is to outperform the 1/n portfolio in terms of average return. The Ledoit and Wolf (2008) test allows a more formal comparison between the portfolios' performance with the standard and alternative SP 500 versions. When analyzing the statistical significance of the difference in Sharpe ratio between the portfolios built using the American market index in its ESG version and in its standard version, we can see that this difference is generally not statistically significant, so we do not reject the null hypothesis that the difference between the Sharpe ratios is equal to zero. In cases where the difference is statistically significant, as for the Volatility Timing portfolio (4), the difference is positive; that is, the investor would be better off by replacing the asset in its standard version with the ESG version in the portfolio composition.
For the case where we replaced the standard version of the SP 500 with the Carbon Efficient version, the results differ from the previous case. The p-values are mostly very high, and the differences between the Sharpe ratios are negative, which would indicate a lower performance if these differences were statistically significant. Therefore, although the negative differences suggest that replacing the index with its Carbon Efficient version would decrease performance in terms of the Sharpe ratio, they are not statistically significant, so the hypothesis that the strategies perform equally in terms of Sharpe ratio under the two versions of the index cannot be rejected. The exception is the RWR(1) portfolio, for which the difference is statistically significant at the 10% level; in this single case, we can say that the replacement harmed the portfolio's performance in terms of Sharpe ratio. Figure 1 presents the average weights of each asset for the different portfolio selection strategies in the out-of-sample period when we use the SP 500 index in its standard version for the construction of portfolios; Figure 2 presents the same data for the case where we replaced the index with the ESG version; Figure 3 for the scenario where the replacement was made with the Carbon Efficient version. One can visually notice that the replacement of indexes has minimal impact on how assets are selected to compose the portfolios. We do not present the graph of the 1/n portfolio because it always assigns equal weights to the indexes.

Portfolio selection with EURO 350 ESG-EUROPE
Now the results for the European case are presented. Again, we compare the results obtained by the same strategies of portfolio selection from the previous subsection, but with indexes that represent some European and global markets. The analysis is made in the same way, comparing how the replacement of SP EURO 350 by the ESG version would impact the performance of investment portfolios. Unfortunately, there was no data available for the Carbon Efficient version of the index for the entire period of study, so we only compared the portfolios built using the standard version with the ESG version.
Again, the rolling window estimators for variance and sample mean of returns used as inputs in the portfolio selection problem are defined with L = 60. For the out-of-sample period, the estimations are made for each period taking into account the monthly observations of the index returns in the previous five years. Table 12 provides descriptive statistics of asset returns used in the portfolio selection problem for the European case. Table 13 describes the correlation between the assets used in the portfolios' strategies.
Analyzing the performance of the portfolios by all the measures presented in Table 14, we can see that, in general, only the Mean-VARD portfolios (mean-variance with short selling) and the Reward-to-Risk Timing portfolio (1) obtained a performance superior to the 1/n portfolio. It was expected that when short selling is allowed, the portfolio would outperform the market benchmark portfolio. However, we see that only the Kirby and Ostdiek (2012)
When we compare the results from the Ledoit and Wolf (2008) test for the portfolios built considering the two scenarios with SP EURO 350 in its standard version and its ESG version, they differ from the American case. For the European market, the difference is not significant for any portfolio, even at a significance level of 10%, and the differences in some cases are negative. However, as the p-values are very high, we cannot reject, for any portfolio, the hypothesis that the Sharpe ratios are equal. This means that the investor is not harmed by replacing the European market stock index with its ESG version: losses are not statistically significant. Moreover, one can notice from Table 14 that, considering all the different performance measures, the results are very close when selecting portfolios using the standard or ESG version of the index.
Observing Figures 4 and 5, we can see that the replacement of the indexes did not bring significant changes in how the weights are assigned to the assets. Especially concerning the SP EURO 350, one can see that the weights are assigned very similarly in the two scenarios.
Finally, the analysis of the European case complements the analysis made with the American market since it shows that such substitution does not harm the investor in either market.

Final considerations
In the first approach adopted in this study, we compared indexes in their traditional and ESG versions for different stock markets in different parts of the world. We noticed that the indexes in their standard and ESG versions have similar performance. Moreover, in terms of Sharpe ratio, an investor who replaced their stock market investments with ESG versions would not incur any losses, and in some cases might even benefit from this choice. The results show that, comparing the performance of the individual indexes by the Ledoit and Wolf (2008) Sharpe ratio test, the indexes are in general not statistically different in their ESG and standard versions. In the two cases where there is a statistically significant difference, the ESG indexes have a superior performance in terms of Sharpe ratio.
Regarding the portfolios built using the indexes in their standard and ESG versions, we notice that their performances are also not statistically different in terms of Sharpe ratio. Furthermore, the average weights assigned to the SP 500 and EURO 350 indexes in the portfolios do not change significantly when they are replaced by the ESG and Carbon Efficient versions in the American case and by the ESG version in the European case. Since the replacement of indexes in their standard version by ESG does not affect the performance, the investor concerned about investing in companies that follow the ESG guidelines can replace their standard investments with similar ESG investments without harming their investments' performance.
We have no evidence to say that the substitution would bring gains to the investor, as in Edmans (2012), Khan (2019) and Consolandi et al. (2020). However, our analysis complements other studies such as Halbritter and Dorfleitner (2015) and Naffa and Fain (2022), since it is a different approach that empirically shows that there is no harm to the investor in terms of financial performance when they choose assets taking into account the environmental, social, and governance spheres.