Manoel Francisco de Souza Pereira

Nonparametric Estimation of Risk-Neutral Distribution via the Empirical Esscher Transform

Pereira, Manoel Francisco de Souza; Veiga Filho, Álvaro de Lima (Advisor). Nonparametric Estimation of Risk-Neutral Distribution via the Empirical Esscher Transform. Rio de Janeiro, 2016. 127p. PhD Thesis – Departamento de Engenharia Elétrica, Pontifícia Universidade Católica do Rio de Janeiro.

This thesis comprises three studies on the use of an empirical version of the Esscher transform for nonparametric option pricing. The first introduces the empirical Esscher transform and compares its performance against some well-known parametric option pricing approaches. In our proposal, we make only mild assumptions on the pricing kernel and there is no need for a risk-neutral model. In the second study, we propose a method for nonparametric option pricing under a GARCH framework with non-Gaussian innovations. Several papers have extended nonparametric option pricing and provided evidence that this methodology performs adequately in the presence of realistic financial time series. To represent a realistic time series, we use a new class of observation-driven model, the dynamic conditional score model proposed by Harvey (2013), to model the volatility (and heavy tails) of the asset's price. Finally, in the third study, we introduce a new approach for indirect estimation of the state prices implicit in financial asset prices from the empirical Esscher transform. First, we generalize the discrete version of the Breeden and Litzenberger (1978) method to the case where states are not equally spaced. Second, we use the historical distribution of the underlying asset's price and the observed option prices to estimate the implicit Esscher parameter.


Acknowledgments
To God.
To CNPq for the financial support and to PUC-Rio for opening the doors of knowledge in my life.
To my advisor, Alvaro Veiga, who since my master's degree has shared with me all his experience and knowledge. Moreover, in the most difficult moments of my life, you helped me with words of courage. I do not have words to thank you enough.
To Professor Cristiano Fernandes for the corrections and for providing the information needed to complete this work.
To Professor Caio Almeida for all the information about bibliography that was essential to conclude this work.
To the secretarial staff for their assistance and kindness, especially to Alcina Portes and Adriane.
To Mrs. Ana Maria for offering me all the necessary infrastructure during my periods of studies. Without her help the burden would be heavier.

To my brothers and sisters for all the incentive.
To Flávia for the support and love.

Keywords
Nonparametric estimation; indirect estimation; state price; risk-neutral probability; empirical Esscher transform.

In both cases, these approaches require the formulation of an explicit risk-neutral model and are restricted to a few probability distributions for modeling the economy's uncertainty. However, empirical observations of asset returns reveal several stylized facts, which highlight the misspecification risk of any chosen parametric stochastic process. Hence, due to the poor empirical performance of parametric methods, nonparametric option pricing techniques have expanded rapidly in recent years, since they offer an alternative that avoids possibly biased parametric restrictions (Haley and Walker, 2010).
This thesis presents two methods to determine the measure Q and to price European call options with nonparametric methods.

Objectives
The main objective of this thesis is to verify whether simple assumptions on the empirical pricing kernel are able to generate a measure Q that produces theoretical prices closer to those observed in the market.
From our investigation, we derive three articles. The first introduces the empirical Esscher transform and studies nonparametric option pricing. In the second, we demonstrate that our proposal is flexible and performs very well in the presence of realistic financial time series, and in the third, we use the empirical Esscher transform to combine the underlying assets' and derivatives' data to obtain a risk-neutral probability.

Main Contributions
In general, in nonparametric option pricing methods, the historical distribution of the asset's prices is used to predict the distribution of future prices, and the maximum entropy principle is employed to transform the empirical distribution into its risk-neutral counterpart by minimizing some information criterion (Stutzer, 1996, Haley and Walker, 2010, Almeida and Azevedo, 2014).
As the change of measure does not involve the distribution of the model's innovations, this method of risk-neutralization is applicable even when there are restrictions on the innovations' probability distribution.
Roughly speaking, nonparametric option pricing pursues two main objectives, not necessarily mutually exclusive. The first is the adoption of other functions (for example, the Cressie-Read family) as alternative ways of measuring distance in the space of probabilities. The second is more focused on providing evidence that the methodology is flexible and performs adequately in the presence of realistic financial time series. Our first and second contributions are in this direction.
In our first contribution (Chapter 2), we assume that the empirical pricing kernel is known and given by an empirical version of the Esscher (1932) transform. This assumption is reasonable because it is well known in information theory that an entropy maximization problem has its solution in the form of the Esscher transform (Buchen and Kelly, 1996, Stutzer, 1996, Duan, 2002). In our second contribution (Chapter 3), we use a recently proposed dynamic model, due to Harvey (2013), which offers an alternative way to model the volatility (and heavy tails) of the observed underlying asset price within the GARCH framework, and we analyze the nonparametric option pricing method in that setting.
In our third contribution (Chapter 4), we introduce a new approach for indirect estimation of the state prices implicit in financial asset prices using the empirical Esscher transform. First, we generalize the discrete version of the Breeden and Litzenberger (1978) method to the case where states are not equally spaced. Second, we use the empirical Esscher transform to combine underlying assets' and derivatives' data: we use the historical distribution of the underlying asset prices and the observed option prices to estimate the implicit empirical Esscher parameter. Then, we fit polynomials relating the implied Esscher parameter to the strike price, as in Shimko (1993), to obtain our measure Q.
In order to evaluate the flexibility of the proposed method to adapt to different levels of maturity and moneyness, experiments (with synthetic and real data) are performed and compared against the main benchmarks.

Outline
The papers in this thesis share the common theme of asset pricing under a nonparametric structure. Each of these papers examines a distinct, well defined research problem related to the main topic and occupies a separate chapter.
Moreover, the thesis contains sections on literature review, methodology and case study.
The remainder of this thesis is organized as follows. Chapter 2 presents an empirical version of the Esscher (1932) transform. Chapter 3 uses a more realistic data-generating process to analyze the nonparametric option pricing method.
Chapter 4 introduces a new approach for indirect estimation of the state prices implicit in financial asset prices using the empirical Esscher transform. In Chapter 5 we conclude by summarizing the main results of this thesis in more detail. All works cited in the thesis and other relevant documents are listed in the Bibliography. The Appendix is included at the end of this document.

Introduction
In most option pricing models, the fair price is determined from the expected value of the option's cash flow, under a risk-neutral probability (measure Q), discounted by a risk-free rate. Under the assumption that the market is dynamically complete, it can be shown that every derivative security can be hedged and that the measure Q is unique (Bingham and Kiesel, 2004). However, incomplete markets exist for many reasons and, according to the second fundamental theorem of asset pricing, there is then an infinite number of measures Q under which one can price derivatives. How, then, should one choose a measure Q from an infinite set of possible measures?
According to Danthine and Donaldson (2015), the literature highlights two approaches to this problem: models based on general equilibrium (Arrow, 1964, Debreu, 1959, Lucas, 1978, Brennan, 1979) and models based on the absence of arbitrage (Black and Scholes, 1973, Cox and Ross, 1976, Harrison and Kreps, 1979, Harrison and Pliska, 1981). In general equilibrium, supply and demand interact in various markets, affecting the prices of many goods simultaneously. The valuation of assets occurs when the markets are balanced, that is, when supply equals demand. Thus, from a theoretical connection between macroeconomics (aggregate consumption) and financial markets, the marginal rate of substitution is used to determine a measure Q by solving a utility maximization problem.
In the absence of arbitrage, we appeal to the law of one price, which states that the equilibrium prices of two separate units of what is essentially the same good should be identical. If this were not the case, a riskless and costless arbitrage opportunity would open up: buy extremely large amounts at the low price and sell them at the high price, forcing the two prices to converge. The first fundamental theorem of asset pricing says that, if a market model has a measure Q, then it does not admit arbitrage. The conditions that the risk-neutral probability structure must satisfy are that the discounted price process has zero drift and that it is equivalent to the original structure (i.e., the same set of price paths has positive probability under both structures). Then, a class of pricing kernels, or Radon-Nikodym derivatives, can be specified to impose restrictions that ensure the existence of a risk-neutral measure. In this case, the measure Q can be obtained without completely characterizing equilibrium in the economy (Christoffersen, Elkamhi, Feunou, and Jacobs, 2010, Christoffersen, Jacobs and Ornthanalai, 2013).
In both cases, these approaches require the formulation of an explicit risk-neutral model and are restricted to a few probability distributions for the measure Q. First, it is difficult to characterize the general equilibrium setup underlying a Risk-Neutral Valuation Relationship (RNVR); see, for example, Duan (1995, 1999). Second, although it is possible to investigate option valuation for a large class of conditionally heteroskedastic processes (Gaussian or non-Gaussian), provided that the conditional moment generating function exists, Christoffersen, Jacobs and Wang (2004) note that such models help explain some stylized facts (the smile effect, volatility varying over time and the presence of clusters in certain periods) in a qualitative sense, but the magnitude of the effects is insufficient to completely resolve the biases. The resulting pricing errors have the same sign as the Black-Scholes (1973) pricing errors, but are smaller in magnitude.
Due to the poor empirical performance of parametric methods, according to Haley and Walker (2010), nonparametric option pricing techniques have expanded rapidly in recent years. In these methods, the historical distribution of prices is used to predict the distribution of future asset prices. According to Stutzer (1996), using past data to estimate the payoff distribution at expiration permits a more accurate assessment of the likely pricing impact caused by investors' data-based beliefs about the future value distribution. Moreover, it offers an alternative that avoids possibly biased parametric restrictions and reduces misspecification risk.
There are two ways to nonparametrically estimate the risk-neutral probabilities implicit in financial instruments: methods that infer the empirical risk-neutral probability from the option market (kernel, maximum entropy, and curve fitting) and methods that infer it from the asset price (with or without option prices), such as the canonical valuation developed by Stutzer (1996). In canonical valuation, the maximum entropy principle is employed to transform the empirical distribution into its risk-neutral counterpart by minimizing the Kullback-Leibler information criterion (KLIC).

Several papers have extended Stutzer's original work and demonstrated that the methodology is flexible and performs very well in the presence of realistic financial time series; see Gray and Newman (2005), Gray, Edwards, and Kalotay (2007), Alcock and Carmichael (2008), Haley and Walker (2010) and Almeida and Azevedo (2014). Other researchers, such as Haley and Walker (2010) and Almeida and Azevedo (2014), suggested the adoption of members of the Cressie-Read family of discrepancy functions as alternative ways of measuring distance in the space of probabilities.
This paper introduces an empirical version of the Esscher (1932) transform for nonparametric option pricing. We assume that the empirical pricing kernel is known and given by this empirical Esscher transform. This assumption is reasonable because it is well known in information theory that a maximum entropy problem has its solution in the form of the Esscher transform (Buchen and Kelly, 1996, Stutzer, 1996, Duan, 2002). Duan (2002) also develops a nonparametric option pricing theory based on the Esscher transform: he uses a binary search to find the Esscher parameter, and the measure Q is evaluated using the standard polynomial approximation formula. In our case, we use a consistent estimator for the moment generating function and avoid computationally intensive methods. As the change of measure does not involve the distribution of the model's innovations, this method of risk-neutralization is applicable even when the moment generating function of the innovations' probability distribution does not exist. Hence, we obtain a method that does not require a set of restrictive assumptions for the formulation of a specific model; that provides a clear and easy way to obtain a risk-neutral distribution; that is adaptable and flexible enough to respond to changes in the data-generating process; and that explores the whole cross-section of information contained in the underlying asset's price.
In many applications, the empirical pricing kernel is the object of interest because it describes the risk preferences of a representative agent in an economy, and the risk aversion function estimates the investors' expectations about future return probabilities (Hansen and Jagannathan, 1991, Aït-Sahalia and Lo, 2000, and Jackwerth, 2000). Our objective is to verify whether mild assumptions on the empirical pricing kernel are able to produce option prices closer to those observed in the market. In order to evaluate the flexibility of the proposed method to adapt to different contexts, three experiments (two with synthetic data and one with real data) are performed. We compare the proposed pricing method to the Black-Scholes (1973) and Heston (1993) models.
The paper is organized as follows. Section 2.2 discusses the Esscher transform. In Section 2.3, we introduce the empirical Esscher transform. Section 2.4 presents the methodology we use to compare the different pricing methods, and the results are discussed in Section 2.5. Finally, Section 2.6 concludes.

The Esscher transform
Let $X$ be a random variable with probability density function $f(x)$ and let $\theta$ be a real number. Then, the Esscher transform (ET) of $f(x)$ with Esscher parameter $\theta$ is given by $f(x;\theta)$, defined as:

$$f(x;\theta) = \frac{e^{\theta x} f(x)}{\int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx}. \qquad (2.1)$$

Note that $f(x;\theta)$ is also a probability density function, since it integrates to one. Furthermore, the ET can be interpreted as a reweighted version of $f(x)$, with reweighting function given by:

$$w(x;\theta) = \frac{e^{\theta x}}{M_X(\theta)}. \qquad (2.2)$$

The denominator of this expression is the moment generating function (mgf) of $f(x)$, denoted by:

$$M_X(\theta) = \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx = E\!\left[e^{\theta X}\right]. \qquad (2.3)$$

In this case, for the Esscher transform to exist, the mgf of $X$ must exist, which precludes some well-known density functions, such as the Student's t. Hence, the ET of $f(x)$ can be expressed as:

$$f(x;\theta) = \frac{e^{\theta x} f(x)}{M_X(\theta)}. \qquad (2.4)$$

Consider now the ET of the density $f(r_T)$ of $R_T$, the log-return of an asset over a period $T$:

$$f(r_T;\theta) = \frac{e^{\theta r_T} f(r_T)}{M_{R_T}(\theta)}. \qquad (2.5)$$

Gerber and Shiu (1994) proposed to use the ET of $R_T$ as the risk-neutral distribution (RND) for the log-return of this asset. They call it the Risk-Neutral Esscher Transform (RNET). In this context, $f(r_T)$ is referred to as the physical probability measure P, and $f(r_T;\theta)$, the ET of $f(r_T)$, is identified as the risk-neutral measure Q or, equivalently, the equivalent martingale measure.
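The reweighting in (2.1)–(2.4) is easy to check numerically. The sketch below (an illustration, not part of the thesis's experiments) discretizes a standard normal density, applies the ET with $\theta = 0.7$, and verifies that the result is again a density whose mean has shifted; for $X \sim N(0,1)$ the ET is known in closed form to be $N(\theta, 1)$.

```python
import numpy as np

# Discretize a standard normal density and apply the Esscher transform
# (2.1)-(2.4).  For X ~ N(0, 1) the transformed density is N(theta, 1).
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # physical density f(x)

theta = 0.7
mgf = np.sum(np.exp(theta * x) * f) * dx          # M_X(theta), eq. (2.3)
f_theta = np.exp(theta * x) * f / mgf             # Esscher transform, eq. (2.4)

print(np.sum(f_theta) * dx)        # ~1.0: still integrates to one
print(np.sum(x * f_theta) * dx)    # ~0.7: the mean shifts to theta
```

The same two checks (unit mass, shifted mean) hold for any density whose mgf exists, which is exactly the existence condition stated above.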
Let $S_t$ be the price of an asset at time $t$. According to the fundamental theorem of asset pricing (Bingham et al, 2004), the risk-neutral value $V_0$ of a derivative with payoff $g(S_T)$ and maturity $T$ is given by the expected value of the payoff under the measure Q, discounted by the risk-free rate of return $r$ for the period $T$:

$$V_0 = e^{-rT} E^{Q}\!\left[g(S_T)\right]. \qquad (2.6)$$

This is also true if the derivative is the asset itself, so that $g(S_T) = S_T$. This imposes the no-arbitrage constraint:

$$S_0 = e^{-rT} E^{Q}\!\left[S_T\right]. \qquad (2.7)$$

Now, defining the measure Q as the ET of $f(r_T)$ and writing $S_T = S_0 e^{R_T}$, we obtain the following condition for the value of $\theta$:

$$e^{rT} = E^{Q}\!\left[e^{R_T}\right] = \frac{M_{R_T}(\theta^* + 1)}{M_{R_T}(\theta^*)}. \qquad (2.8)$$

Hence, the measure Q is given by the ET of $f(r_T)$ evaluated at the parameter $\theta^*$ that solves (2.8). Gerber and Shiu (1994) explore several different distributional assumptions for the price dynamics and log-returns of an asset. They show that the RNET encompasses the classical option pricing formula of Black and Scholes (1973) for Wiener processes, and the binomial model (Cox, Ross and Rubinstein, 1979). Duan (2002) explored the empirical distribution of $R_T$ and developed a nonparametric option pricing theory based on the first equality in (2.8). In this case, he used a binary search to find $\theta^*$, the integral was evaluated numerically, and $f(r_T;\theta)$ was evaluated using the standard polynomial approximation formula.
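For normal log-returns the condition in (2.8) has a closed-form solution, which is the case that recovers Black-Scholes. The sketch below (an illustrative check, with the drift, volatility and rate values used later in the experiments) assumes $R_T \sim N(m, s^2)$ with $m = (\mu - \sigma^2/2)T$ and $s^2 = \sigma^2 T$, so that $M(\theta) = \exp(m\theta + s^2\theta^2/2)$ and $\theta^* = (rT - m - s^2/2)/s^2$.

```python
import math

# For R_T ~ N(m, s^2), M(theta) = exp(m*theta + 0.5*s^2*theta^2), and
# condition (2.8), M(theta+1)/M(theta) = exp(r*T), solves in closed form:
#   theta* = (r*T - m - s^2/2) / s^2
mu, sigma, r, T = 0.10, 0.20, 0.05, 1.0
m = (mu - 0.5 * sigma**2) * T        # mean of the log-return under a GBM
s2 = sigma**2 * T                    # variance of the log-return

theta_star = (r * T - m - 0.5 * s2) / s2

def M(theta):
    return math.exp(m * theta + 0.5 * s2 * theta * theta)

print(theta_star)                          # -1.25 for these inputs
print(M(theta_star + 1) / M(theta_star))   # equals exp(r*T)
```

Note that $\theta^*$ is negative whenever the drift exceeds the risk-free rate, which anticipates the sign discussion in the empirical results.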
The RNET can also be applied to incomplete markets, which admit infinitely many measures Q. There is an economic justification for selecting this particular transform, since it emerges as the solution to the problem of pricing a derivative under a power utility function (see Gerber and Shiu, 1996).
Moreover, using the relative entropy principle, the risk-neutral density can be obtained from the following problem:

$$\min_{q} \int q(x) \ln\frac{q(x)}{f(x)}\,dx \quad \text{subject to} \quad \int q(x)\,dx = 1 \ \text{ and } \ E^{q}\!\left[e^{R_T}\right] = e^{rT}, \qquad (2.9)$$

where $f(x)$ is the known model and the discrepancy between it and another model $q(x)$ is obtained by minimization of an information criterion. It is well known in information theory that the programming problem in (2.9) has its solution in the form of the ET (Buchen and Kelly, 1996, Stutzer, 1996, Duan, 2002).
Using the above expressions, it is easy to check that the sample version of the no-arbitrage condition is given by:

$$e^{rT} = \frac{\widehat{M}(\hat\theta^* + 1)}{\widehat{M}(\hat\theta^*)}, \qquad \text{where} \quad \widehat{M}(\theta) = \frac{1}{n}\sum_{i=1}^{n} e^{\theta r_i}. \qquad (2.16)$$

Then, the empirical risk-neutral measure Q is given by the weights

$$\hat q_i = \frac{e^{\hat\theta^* r_i}}{\sum_{j=1}^{n} e^{\hat\theta^* r_j}}, \quad i = 1, \dots, n.$$

Again, if $M(\theta)$ exists for all $\theta \in \mathbb{R}$, then $\widehat{M}(\theta + 1)$ and $\widehat{M}(\theta)$ will converge in probability to their respective population values and, by consequence, the solution of the no-arbitrage constraint will also converge, i.e., $\hat\theta^* \to \theta^*$.
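The consistency claim can be illustrated numerically. The sketch below (an illustration under assumed normal log-returns, not the thesis's own code) solves the sample no-arbitrage condition by bisection for growing sample sizes; for these inputs the population solution is $\theta^* = -1.25$, as in the closed-form normal case.

```python
import numpy as np

rng = np.random.default_rng(42)

# Estimate M_hat(theta) = (1/n) sum exp(theta * r_i) from simulated
# normal log-returns and solve M_hat(theta+1)/M_hat(theta) = exp(r*T)
# by bisection.  The population solution here is theta* = -1.25.
r_free, T, mu, sigma = 0.05, 1.0, 0.10, 0.20

def theta_hat(returns):
    def h(theta):   # monotone in theta, so bisection is safe
        return (np.mean(np.exp((theta + 1.0) * returns))
                / np.mean(np.exp(theta * returns)) - np.exp(r_free * T))
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(lo) * h(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

for n in (100, 10_000, 1_000_000):
    sample = rng.normal((mu - 0.5 * sigma**2) * T, sigma * np.sqrt(T), n)
    print(n, theta_hat(sample))   # estimates approach -1.25 as n grows
```

Bisection works because the ratio in (2.16) is increasing in $\theta$ (it is the exponential of a difference of a convex cumulant function), so the root is unique once bracketed.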
The price of a European call on a non-dividend-paying stock is obtained under the risk-neutral distribution $q(s_T)$, with the payoff discounted at the deterministic risk-free rate $r$:

$$C = e^{-rT} \int_{0}^{\infty} \max(s_T - K, 0)\, q(s_T)\,ds_T, \qquad (2.19)$$

or, alternatively,

$$C = e^{-rT} \int_{0}^{\infty} \max(s_T - K, 0)\, m(s_T)\, f(s_T)\,ds_T,$$

where $T$ is the time to maturity, $s_T$ is the underlying asset price, $K$ is the strike price, $f(s_T)$ is the physical distribution of the asset price at the option's expiration and $m(s_T) = q(s_T)/f(s_T)$ is the pricing kernel, characterizing the change of measure from $f(s_T)$ to $q(s_T)$.

Methodology
This section presents the methodology used to compare the proposed method on artificial and real data. To investigate its applicability in several settings, the empirical Esscher transform is applied to price European call options across a range of moneyness and maturities. The algorithm for our method is:
1. Simulate the physical distribution of the terminal price, i.e., draws $s_{T,i}$, $i = 1, \dots, n$;
2. Compute the empirical Esscher parameter $\hat\theta^*$ using equation (2.16);
3. Compute the option price with equation (2.19).
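The three steps above can be sketched end-to-end. The code below is an illustrative implementation under an assumed GBM world (the setting of experiment 1), not the thesis's own code: it simulates terminal prices, solves the sample no-arbitrage condition by bisection, and prices an at-the-money call with the empirical risk-neutral weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: physical distribution of S_T (exact GBM draws).
S0, K, mu, sigma, r, T, n = 100.0, 100.0, 0.10, 0.20, 0.05, 1.0, 200_000
log_ret = rng.normal((mu - 0.5 * sigma**2) * T, sigma * np.sqrt(T), n)
S_T = S0 * np.exp(log_ret)

# Step 2: empirical Esscher parameter from the sample no-arbitrage
# condition M_hat(theta+1)/M_hat(theta) = exp(r*T), solved by bisection.
def h(theta):
    return (np.mean(np.exp((theta + 1.0) * log_ret))
            / np.mean(np.exp(theta * log_ret)) - np.exp(r * T))

lo, hi = -20.0, 20.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if h(lo) * h(mid) > 0 else (lo, mid)
theta = 0.5 * (lo + hi)

# Step 3: risk-neutral weights q_i and the discounted expected payoff.
w = np.exp(theta * log_ret)
q = w / w.sum()
call = np.exp(-r * T) * np.sum(q * np.maximum(S_T - K, 0.0))
print(call)   # close to the Black-Scholes value for these inputs (~10.45)
```

In a GBM world the empirical price should converge to the Black-Scholes price as $n$ grows, which is exactly the comparison made in experiment 1.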
In experiment 1, the asset price (step 1) follows a geometric Brownian motion (GBM) under the physical measure:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t,$$

where $S_t$ is the underlying asset's price at time $t$, $\mu$ is the expected rate of return, $\sigma$ is the volatility and $W_t$ is a Wiener process. We compare the option prices simulated with the empirical Esscher transform to the true price given by the Black-Scholes model.
Under the assumptions that the future asset price follows a lognormal distribution and the returns follow a normal distribution, we obtain the Black-Scholes (1973) formula for the price, at time 0, of a European call option on a non-dividend-paying stock:

$$C = S_0 N(d_1) - K e^{-rT} N(d_2), \qquad d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T},$$

where $N(\cdot)$ is the cumulative distribution function of a standard normal variable (zero mean and unit variance), $C$ is the price of the call option, $S_0$ is the stock price at time 0, $r$ is the continuously compounded risk-free interest rate, and $\sigma$ is the asset volatility. Based on the works of Hutchinson and Poggio (1994), Stutzer (1996) and Gray and Newman (2005), we use an annualized volatility of 20%, a drift of 10% and a constant riskless interest rate of 5%.
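The benchmark formula is straightforward to implement; a minimal version with the parameter values stated above ($\sigma = 20\%$, $r = 5\%$) is:

```python
import math

# Black-Scholes (1973) price of a European call on a non-dividend stock.
def bs_call(S0, K, r, sigma, T):
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

print(round(bs_call(100.0, 100.0, 0.05, 0.20, 1.0), 4))   # 10.4506
```

This closed-form price serves as the "true" value against which the EET prices of experiment 1 are measured.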
In experiment 2, the asset price (step 1) follows the Heston (1993) model, which assumes a diffusion process for the asset price and another stochastic process for the volatility. The asset price follows the diffusion, under the physical measure:

$$dS_t = \mu S_t\,dt + \sqrt{v_t}\, S_t\,dW_{1,t},$$

where $W_{1,t}$ is a Wiener process. The variance $v_t$ follows the diffusion:

$$dv_t = \kappa(\bar v - v_t)\,dt + \sigma_v \sqrt{v_t}\,dW_{2,t},$$

where $W_{2,t}$ is a Wiener process with correlation $\rho$ with $W_{1,t}$, $\bar v$ is the long-run mean of the variance, $\kappa$ is a mean reversion parameter, and $\sigma_v$ is the volatility of variance. We compare the option prices simulated with the empirical Esscher transform to the true price given by the Heston model.
The price of a European call option with time to maturity $(T - t)$ is given by:

$$C = S_t P_1 - K e^{-r\tau} P_2. \qquad (2.27)$$

The quantities $P_1$ and $P_2$ are the probabilities that the call option expires in-the-money, conditional on the log of the asset price, $x_t = \ln(S_t)$, and on the volatility $v_t$, each at time $t$. The probabilities can be obtained by inverting the corresponding characteristic functions $f_j$:

$$P_j = \frac{1}{2} + \frac{1}{\pi}\int_{0}^{\infty} \mathrm{Re}\!\left[\frac{e^{-i\phi \ln K}\, f_j(x_t, v_t, \tau; \phi)}{i\phi}\right] d\phi, \quad j = 1, 2. \qquad (2.33)$$

In these expressions $\tau = T - t$ is the time to maturity, $i = \sqrt{-1}$ is the imaginary unit, $u_1 = 1/2$, $u_2 = -1/2$, $b_1 = \kappa + \lambda - \rho\sigma_v$, and $b_2 = \kappa + \lambda$. The parameter $\lambda$ represents the price of volatility risk as a function of the asset price, volatility, and time.
We use Euler discretizations for the stochastic processes of price and volatility. Based on the works of Lin, Strong and Xu (2001), Zhang and Shu (2003) and Gray and Newman (2005), we use the following values: $\kappa = 3.00$, $\bar v = 0.04$, $\sigma_v = 0.40$ and $\rho = -0.50$. The initial value of the variance equals its long-term average. For consistency with the Black-Scholes world simulations, the drift is 10% and the riskless rate of interest is assumed to be a constant 5%, continuously compounded.
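An Euler scheme for these dynamics can be sketched as follows (an illustration with the parameter values above; the full-truncation variant, which floors the variance at zero inside the square roots, is one common way to keep the discretized variance well defined and is an implementation choice here, not something specified in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler discretization of the Heston model with kappa=3, v_bar=0.04,
# sigma_v=0.40, rho=-0.50, drift 10%, using full truncation.
S0, mu = 100.0, 0.10
kappa, v_bar, sigma_v, rho = 3.00, 0.04, 0.40, -0.50
T, n_steps, n_paths = 1.0, 252, 50_000
dt = T / n_steps

S = np.full(n_paths, S0)
v = np.full(n_paths, v_bar)             # start at the long-run variance
for _ in range(n_steps):
    z1 = rng.standard_normal(n_paths)
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
    v_pos = np.maximum(v, 0.0)           # full-truncation floor
    S *= np.exp((mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
    v += kappa * (v_bar - v_pos) * dt + sigma_v * np.sqrt(v_pos * dt) * z2

print(S.mean())   # close to S0 * exp(mu * T) under the physical measure
```

The simulated terminal prices then feed step 1 of the algorithm exactly as the GBM draws do in experiment 1.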
In these artificial experiments, the moneyness ($S/K$) is equal to 0.90, 0.97, … In experiment 3, we evaluate nonparametric option pricing with real data.
We compare the prices of the proposed method (EET) to Stutzer prices (STZ) and Black-Scholes prices (BSM). For each time to maturity $T$, we perform a bootstrap with replacement on the historical returns of the underlying asset. We follow this sequence: (a) we construct a single trajectory for the asset price by drawing as many historical log returns as there are days to maturity (for example, if the option has 17 days to maturity, we draw 17 returns); (b) we accumulate the log returns of this trajectory and obtain one price; (c) we repeat steps (a) and (b) 252 times to construct the physical distribution of the price at maturity (step 1). We then obtain the risk-neutral measure (step 2) and calculate the option price (step 3).
We repeat this procedure 15,000 times and calculate the MAPE.
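Steps (a)–(c) of the bootstrap can be sketched as below. The historical sample here is synthetic, a placeholder for the real daily log returns used in the thesis; everything else follows the stated procedure (17 days to maturity, 252 bootstrap trajectories).

```python
import numpy as np

rng = np.random.default_rng(7)

# Placeholder for historical daily log returns (the real study uses
# Bovespa data; these values are hypothetical).
hist_log_returns = rng.normal(0.0004, 0.02, size=1000)
S0, days_to_maturity, n_traj = 100.0, 17, 252

terminal_prices = np.empty(n_traj)
for b in range(n_traj):
    # (a) draw, with replacement, as many daily log returns as days to maturity
    path = rng.choice(hist_log_returns, size=days_to_maturity, replace=True)
    # (b) accumulate them into one terminal price
    terminal_prices[b] = S0 * np.exp(path.sum())
# (c) the 252 terminal prices form the physical distribution at maturity
print(terminal_prices.mean())
```

These terminal prices replace the model-based draws of step 1, after which steps 2 and 3 proceed unchanged.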
The Stutzer (1996) method begins with the asset's historical distribution of $T$-year gross returns $R_i$, $i = 1, \dots, n$, expressed as price relatives. By the maximum entropy principle, the risk-neutral probabilities are:

$$\hat\pi^*_i = \frac{\exp(\gamma^* R_i / R_f)}{\sum_{j=1}^{n} \exp(\gamma^* R_j / R_f)},$$

where $R_f = e^{rT}$ is the gross risk-free return and $\gamma^*$ is the Lagrange multiplier, given by the following minimization:

$$\gamma^* = \arg\min_{\gamma} \frac{1}{n} \sum_{i=1}^{n} \exp\!\left[\gamma\left(\frac{R_i}{R_f} - 1\right)\right]. \qquad (2.35)$$

Stutzer (1996) reports that the performance of canonical valuation improves when a small amount of option data is used. The data in this study are specific to the Brazilian market; all required data are obtained from Bovespa (http://www.bmfbovespa.com.br).
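A minimal sketch of this benchmark, assuming gross returns are measured relative to the gross risk-free return (one common presentation of canonical valuation) and using hypothetical return data:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)

# Canonical valuation (Stutzer, 1996): gamma* minimizes the sample
# average of exp(gamma * (R_i/R_f - 1)), and the risk-neutral
# probabilities are proportional to exp(gamma* * R_i/R_f).
r, T, n = 0.05, 1.0, 50_000
R_f = np.exp(r * T)
gross = np.exp(rng.normal(0.08, 0.20, n))   # hypothetical gross returns
x = gross / R_f

obj = lambda g: np.mean(np.exp(g * (x - 1.0)))
gamma_star = minimize_scalar(obj, bounds=(-50.0, 50.0), method="bounded").x

w = np.exp(gamma_star * x)
pi = w / w.sum()
print(np.sum(pi * x))   # ~1.0: the martingale constraint holds
```

The first-order condition of the minimization is exactly the martingale constraint $\sum_i \pi_i R_i / R_f = 1$, which is why the reweighted mean lands on one.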
The annualized maturity (e.g., 121/252) is obtained by linear interpolation. Table 2.1 presents the main descriptive statistics of the log returns. We also analyze the behavior of the empirical Esscher parameter; the results are presented in Figure 2.3 and Tables 2.7, 2.8, 2.9 and 2.10. When we compare the empirical Esscher parameter obtained for synthetic data with the one obtained for real data, the most important change is the sign. The Esscher parameters obtained with synthetic data come from simulations with a drift ($\mu = 10.00\%$) greater than the risk-free rate ($r = 5.00\%$); thus, the negative parameter shifts the risk-neutral distribution to the left, which eliminates the risk premium and ensures an average yield equal to the risk-free rate.
With real data, the opposite happens: the positive parameter shifts the risk-neutral distribution to the right. This is contrary to financial theory. However, it does not constitute an arbitrage opportunity, because the daily risk-free rate lies between the worst and the best daily return (see Cox et al, 1979). As we can see in Figures 2.1 and 2.2, and in Table 2.2, the price time series are falling and, in this case, investments at the risk-free rate are paying more than these stocks.

Conclusions
In this paper, we propose an empirical version of the Esscher transform for nonparametric option pricing. We conduct artificial experiments in Black-Scholes and Heston worlds and real experiments to explore the potential usefulness of the proposed method.
Artificial results show that the EET prices improve as the sample size increases. The EET also provides higher prices for all maturities, and the MAPE decreases as the moneyness decreases.
Real data results show that, when the stochastic process of the underlying asset is unknown, the lowest pricing errors are obtained by the nonparametric methods. For a maturity equal to 17 days, the nonparametric methods present similar results. For other maturities, the proposed method presents the lowest MAPE for deep-out-of-the-money, out-of-the-money and at-the-money options.
We also analyze the behavior of the empirical Esscher parameter. We can highlight that its standard deviation decreases with the maturity and with the increase in sample size, and that the descriptive statistics begin to converge to a constant value in larger samples. When we compare the empirical Esscher parameter obtained for synthetic data with the one obtained for real data, the most important change is the sign. The Esscher parameters obtained with synthetic data come from simulations with a drift ($\mu = 10.00\%$) greater than the risk-free rate ($r = 5.00\%$); thus, the negative parameter shifts the risk-neutral distribution to the left, which eliminates the risk premium and ensures an average yield equal to the risk-free rate. With real data, the opposite happens: the positive parameter shifts the risk-neutral distribution to the right. This is contrary to financial theory.
However, it does not constitute an arbitrage opportunity, because the daily risk-free rate lies between the worst and the best daily return. The price time series have been falling and, in this case, investments at the risk-free rate are paying more than these stocks.
Further research could compare the proposed method to other nonparametric pricing methodologies, investigate Monte Carlo simulation techniques and obtain the Greeks.

Introduction
Volatility is perhaps the most widely used risk measure in finance and a key factor in option pricing and asset allocation. Although asset return volatility is well defined, it is not directly observable; what we observe are the prices of an asset and of its derivatives, which has several important implications for studying and modeling volatility. Empirical evidence also shows that asset return volatility is stochastic and mean reverting, that it responds asymmetrically to positive and negative returns, and that the systematic patterns in implied volatility indicate that only an asymmetric probability distribution with heavier tails would be able to reproduce market prices (Tsay, 2013).
In option pricing, volatility is modeled in two settings: continuous time and discrete time. Christoffersen, Heston and Jacobs (2011) argue that continuous time models have become the workhorse of modern option pricing theory, given that they offer closed-form solutions for European options and have the flexibility to incorporate stochastic volatility, stochastic jumps, leverage effects and various types of risk premia (Heston, 1993, Bakshi, Cao and Chen, 1997, and Broadie, Chernov and Johannes, 2007). In a discrete time setting, stochastic volatility is often modeled using extensions of the autoregressive conditional heteroscedasticity (ARCH) model proposed by Engle (1982) and generalized by Bollerslev (1986) in the GARCH framework. With regard to option pricing, these models offer at least five advantages over continuous time pricing models. First, they may be considered an accurate numerical approximation of continuous time models (Cao, 1992 and Nelson, 1996), avoiding any discretization bias.
Second, their predictions are exactly compatible with the filter used to extract the variance. Third, estimation is computationally fast. Fourth, the volatility is observable at each point in time. Fifth, they can incorporate multiple factors (Engle and Lee, 1999), long memory (Bollerslev and Mikkelsen, 1996), and non-Gaussian innovations (Bollerslev, 1987, and Nelson, 1991).
The theoretical price of an option is evaluated as the discounted expected value of the payoff function under a martingale measure. In continuous time models, the Girsanov theorem is generally used to change the physical measure to an artificial one. In discrete time models, according to Christoffersen, Elkamhi, Feunou and Jacobs (2010), a class of pricing kernels, or Radon-Nikodym derivatives, can be specified to impose restrictions that ensure the existence of an artificial measure.
Obviously, option pricing models with Gaussian innovations cannot capture the skewness and the excess kurtosis (leptokurtosis) of financial data. The use of stochastic jumps is the preferred approach to deal with this shortcoming in continuous time (Bates, 2000, Eraker, Johannes and Polson, 2003, Chernov, Gallant, Ghysels and Tauchen, 2003). In discrete time, the moment generating function is only able to risk-neutralize a few probability distributions beyond the Gaussian case: for example, the gamma distribution (Siu, Tong and Yang, 2004), inverse Gaussian innovations (Christoffersen, Jacobs and Heston, 2006), the smoothly truncated stable distribution (Menn and Rachev, 2005), generalized error distribution innovations (Christoffersen, Jacobs and Minouni, 2006) and generalized hyperbolic innovations (Chorro, Guégan and Ielpo, 2008, and Badescu, Elliott, Kulperger, Miettinen and Siu, 2011).
In situations where we need to model innovations with heavy-tailed distributions such as Student's t, we can use three methods, according to Liu, Li and Ng (2015). The first method is a Markov-switching GARCH model with Student's t innovations. To avoid changing the measure, Satoyoshi and Mitsui (2011) replaced the drift term in the conditional mean equation with the risk-free interest rate. They justified this replacement by assuming that investors in the real world are risk-neutral, requiring no compensation for risk.
The second method does not require any distributional assumption for the innovations. The asymmetric GJR-GARCH model, implemented by Barone-Adesi, Engle, and Mancini (2008), consists of two steps: first, they estimate the GARCH parameters using historical returns; second, they calibrate the model to observed market prices. The drift term in the conditional mean equation is determined in such a way that the expected asset return implied by the model is the risk-free interest rate.
The third method, proposed by Badescu and Kulperger (2007), is similar to that of Barone-Adesi et al (2008), but without resorting to option prices. The authors estimate the GARCH parameters by quasi-maximum likelihood and then approximate the unknown innovation distribution function using a kernel density estimator based on the standardized residuals. They compute option prices by simulating stock prices under the physical measure and evaluating Radon-Nikodym derivatives.
Liu et al (2015) propose one more method for heavy-tailed distributions under GARCH models with Hansen's skewed-t distributed innovations. They use the canonical valuation method, developed by Stutzer (1996), to identify a risk-neutral measure; that is, the risk-neutralization is applied to the empirical distribution of the sample paths generated from the assumed model. Since the change of measure does not involve the distribution of the model's innovations, this method of risk-neutralization is applicable even when the moment generating function of the innovations' probability distribution does not exist. The maximum entropy principle is then employed to transform the empirical distribution of the sample of future asset returns into its risk-neutral counterpart, by minimizing the Kullback-Leibler information criterion (KLIC).
In a spirit similar to the canonical valuation method of Stutzer (1996), Duan (2002) develops a nonparametric option pricing theory without resorting to option prices. He formalizes the risk-neutralization process so that one can infer directly from the price dynamics of the underlying asset the risk-neutral pricing dynamics under a GARCH framework. Transforming the one-period empirical distribution of asset returns to normality is a key step in constructing this nonparametric option pricing theory. Applying the relative entropy principle with the condition that the expected asset return equals the risk-free rate, one can derive the risk-neutral distribution for the normalized asset return.
In this work, we propose a method to obtain European option prices under a GARCH framework with non-Gaussian innovations. To avoid the formulation of a restrictive model, risk-neutralization is applied to the empirical distribution of the sample paths generated from the assumed model, as in Liu et al (2015) and Duan (2002). To identify a risk-neutral measure, we use the empirical Esscher transform. Hence, the sample paths are reweighted, giving rise to a risk-neutralized sample from which option prices can be obtained as a weighted sum of the option's payoffs in each path. We compare our approach empirically to two competing benchmarks: Black-Scholes (1973) and Heston and Nandi (2000).
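The reweighting step described above can be sketched in a few lines of code. The sketch below is illustrative, not the thesis implementation: the function names (`esscher_weights`, `implied_theta`, `esscher_call_price`) and the bisection bracket are our own assumptions; only the exponential-tilting formula and the martingale condition come from the text.

```python
import numpy as np
from scipy.optimize import brentq

def esscher_weights(log_returns, theta):
    # Reweight each simulated T-period log return R_i by exp(theta * R_i),
    # normalized to sum to one (stabilized against overflow).
    z = theta * np.asarray(log_returns, float)
    w = np.exp(z - z.max())
    return w / w.sum()

def implied_theta(log_returns, r, T):
    # Esscher parameter solving the martingale condition E_Q[e^R] = e^{rT}.
    gap = lambda th: esscher_weights(log_returns, th) @ np.exp(log_returns) \
        - np.exp(r * T)
    return brentq(gap, -200.0, 200.0)

def esscher_call_price(S0, K, r, T, log_returns):
    # Discounted Q-weighted average of the call payoff over the sample paths.
    theta = implied_theta(log_returns, r, T)
    q = esscher_weights(log_returns, theta)
    payoff = np.maximum(S0 * np.exp(log_returns) - K, 0.0)
    return np.exp(-r * T) * (q @ payoff)
```

A sanity check of the construction: a zero-strike call must price back to the spot, since the reweighted terminal prices satisfy the martingale condition by design.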
The paper is organized as follows. In section 3.2 we introduce the proposed method. Section 3.3 presents the methodology used to compare the different pricing methods, and the results are discussed in section 3.4. Finally, section 3.5 concludes.

Proposed Method
In this section, we develop a data-generating process that captures the most important features of the return process.

Asset Price Dynamic
Under the physical measure, the assumed GARCH model is

$$ r_t = \mu + \sqrt{h_t}\,\varepsilon_t \qquad (3.1) $$

where $r_t = \ln(S_t/S_{t-1})$, $S_t$ is the asset price at time $t$, $\mu$ is a constant, $h_t$ is the conditional variance, with its evolution captured by some GARCH-type model, and $\{\varepsilon_t\}$ is the sequence of standardized innovations, which are independent and identically distributed random variables with zero mean and unit variance. It follows that the conditional mean and variance of $r_t$ are $\mu$ and $h_t$, respectively.
According to Tsay (2013), for most asset return series the serial correlations are weak, if present at all. Thus, the mean equation, $\mu$, is built to remove the sample mean from the data when the sample mean is significantly different from zero. In some cases, a simple autoregressive model might be needed; in other cases, the mean equation may employ explanatory variables, such as indicator variables to capture possible daily effects.
In option pricing, the form of equation (3.1) depends on an underlying risk-neutral model. For example, in Duan (1995), $\mu$ is replaced by $r + \lambda\sqrt{h_t} - h_t/2$, so that the expected value under the risk-neutral measure is equal to the risk-free rate.

Return volatility
Time series models in which a parameter of the conditional distribution is a function of past observations are widely used in econometrics. Such models are termed 'observation driven', as opposed to 'parameter driven'. Examples of observation-driven models are the class of GARCH models. Stochastic volatility models are parameter driven: the volatility is determined by an unobserved stochastic process, as in the Heston (1993) model. In both cases, when the disturbance is Gaussian, the models are equivalent to the standard GARCH(1,1). If we assume that the disturbance follows a Student's t distribution, there is an important difference between the standard t-GARCH(1,1) model of Bollerslev (1987) and the Harvey (2013) dynamic conditional score model (3.2). The mechanism for updating the time-varying parameter $h_t$ is given below. The scale is defined as $\varphi_t = [(\nu-2)/\nu]^{1/2}\sqrt{h_t}$, and the dynamic equation is then set up for the logarithm of the scale, $\lambda_t = \ln\varphi_t$; appendix 6.2 presents the link between equations (3.1) and (3.2).
The updating mechanism is

$$ h_{t+1} = \delta + \phi h_t + \kappa\, u_t, \qquad u_t = S_t \nabla_t, \qquad (3.3) $$

where the scaling $S_t$ is a finite constant, which may be the inverse of the information matrix, or some other constant, including unity. Let the information matrix be $I_t = \mathrm{E}_{t-1}[\nabla_t^2]$, where $\nabla_t = \partial \ln f(y_t \mid h_t)/\partial h_t$, the score of the conditional distribution, is a random variable which has zero mean at the true parameter value.

The score is a martingale difference by construction, as shown below:

$$ \mathrm{E}_{t-1}[\nabla_t] = \int \frac{\partial \ln f}{\partial h_t}\, f \, dy_t = \frac{\partial}{\partial h_t}\int f\, dy_t = 0. \qquad (3.7) $$

In dynamic conditional score models, the idea is to replace the observations, or their squares, in the dynamic equation for the volatility by the score of the conditional distribution. For example, let the first-order Gaussian GARCH model be, according to equation (3.1) with $\varepsilon_t \sim N(0,1)$,

$$ h_{t+1} = \delta + \alpha y_t^2 + \beta h_t. \qquad (3.8) $$

The conditions $\delta > 0$, $\alpha \ge 0$ and $\beta \ge 0$ ensure that the variance remains positive, and the sum of $\alpha$ and $\beta$ is typically close to one. The logarithm of the pdf is

$$ \ln f = -\tfrac{1}{2}\ln(2\pi h_t) - \frac{y_t^2}{2 h_t}, \qquad (3.10) $$

its first derivative is

$$ \nabla_t = \frac{\partial \ln f}{\partial h_t} = \frac{1}{2h_t}\left(\frac{y_t^2}{h_t} - 1\right), \qquad (3.11) $$

and the information matrix is $I_t = 1/(2h_t^2)$. Replacing (3.10) and (3.11) in (3.3) with $S_t = I_t^{-1}$ gives the scaled score

$$ u_t = I_t^{-1}\nabla_t = y_t^2 - h_t, \qquad (3.12) $$

and finally, substituting (3.12) in (3.3),

$$ h_{t+1} = \delta + \phi h_t + \kappa\,(y_t^2 - h_t), $$

where $\phi = \alpha + \beta$ and $\kappa = \alpha$. In the normal case, therefore, the mechanism for updating $h_t$ given by the score of the conditional distribution coincides with the standard Gaussian GARCH(1,1).

Beta-t-GARCH Model
When the observations have a conditional t distribution with $\nu$ degrees of freedom, we can write $\varepsilon_t$ in (3.1) as

$$ \varepsilon_t = z_t \sqrt{(\nu-2)/\nu}, $$

where the serially independent zero-mean variable $z_t$ has a standard $t_\nu$ distribution. Therefore, $\varepsilon_t$ has a $t_\nu$ distribution standardized to have unit variance.

Proof:
Since $\varepsilon_t^2/\nu$ is the ratio of a squared standard normal to a $\chi^2_\nu$, it follows from Lemma 1 that

$$ b_t = \frac{y_t^2/\left((\nu-2)h_t\right)}{1 + y_t^2/\left((\nu-2)h_t\right)} \sim \mathrm{Beta}(1/2, \nu/2), $$

so its mean and variance are given by

$$ \mathrm{E}[b_t] = \frac{1}{\nu+1}, \qquad \mathrm{Var}[b_t] = \frac{2\nu}{(\nu+1)^2(\nu+3)}. \qquad (3.22) $$

The score in (3.20), $u_t = (\nu+1)b_t - 1$, according to (3.21) and (3.22), has mean zero and variance given by

$$ \mathrm{Var}[u_t] = \frac{2\nu}{\nu+3}. \qquad (3.23) $$

Rewriting (3.7), but with $u_{t-1}$, we obtain the model. This model is named Beta-t-GARCH because $u_{t-1}$ is a linear function of a variable that has a beta distribution. The principal feature of the Beta-t-GARCH class is that the linear combination of past values of the martingale difference is given by the conditional score. In the limit as $\nu \to \infty$, $(\nu+1)b_{t-1} \to \varepsilon_{t-1}^2$, which leads to the standard GARCH specification. A sufficient condition for the conditional variance to remain positive is $\delta > 0$, $\beta \ge 0$ and $\alpha_i \ge 0$, $i = 1, \dots, p$. The Beta-t-GARCH(1,1) model is

$$ h_t = \delta + \beta h_{t-1} + \alpha\, u_{t-1} h_{t-1}, $$

and it may be extended to include leverage effects by adding the indicator variable $I(y_{t-1} < 0)$:

$$ h_t = \delta + \beta h_{t-1} + \alpha\, u_{t-1} h_{t-1} + \alpha^{*}\, I(y_{t-1} < 0)\, u_{t-1} h_{t-1}. $$

Forecasts of the conditional variance more than one step ahead can be made for Beta-t-GARCH models, as in the standard GARCH case, by using the law of iterated expectations.
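To make the recursion concrete, a minimal simulation sketch of a Beta-t-GARCH(1,1) follows. It assumes the score form $u_t = (\nu+1)b_t - 1$ discussed above; the function name and the parameter values in the usage note are our own, and a serious application would estimate $(\delta, \beta, \alpha, \nu)$ by maximum likelihood rather than fix them.

```python
import numpy as np

def simulate_beta_t_garch(n, delta, beta, alpha, nu, seed=0):
    """Simulate h_t = delta + beta*h_{t-1} + alpha*u_{t-1}*h_{t-1},
    with u_t = (nu+1)*b_t - 1, where b_t has a Beta(1/2, nu/2) law
    under the model, so E[u_t] = 0 and h_t stays positive for beta >= alpha."""
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    y = np.empty(n)
    h[0] = delta / (1.0 - beta)  # rough unconditional variance level
    for t in range(n):
        # standardized Student-t innovation with unit variance
        eps = rng.standard_t(nu) * np.sqrt((nu - 2.0) / nu)
        y[t] = np.sqrt(h[t]) * eps
        ratio = y[t] ** 2 / ((nu - 2.0) * h[t])
        b = ratio / (1.0 + ratio)          # beta-distributed under the model
        u = (nu + 1.0) * b - 1.0           # conditional score, zero mean
        if t + 1 < n:
            h[t + 1] = delta + beta * h[t] + alpha * h[t] * u
    return y, h
```

For example, with $\delta = 0.05$, $\beta = 0.9$, $\alpha = 0.05$ and $\nu = 8$, the simulated variance path is strictly positive and the sample variance of $y_t$ fluctuates around the unconditional level $\delta/(1-\beta) = 0.5$.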

Risk-Neutral Measure
Consider a random sample of size $n$ of the asset's log-return over a period $T$, denoted by $\{R_{T,i}\}_{i=1}^{n}$. Then the empirical risk-neutral measure Q is given by the empirical Esscher transform

$$ \pi_i^{Q} = \frac{e^{\theta R_{T,i}}}{\sum_{j=1}^{n} e^{\theta R_{T,j}}}, \qquad i = 1, \dots, n, $$

where the Esscher parameter $\theta$ is chosen so that the reweighted sample satisfies the martingale condition $\sum_{i=1}^{n} \pi_i^{Q}\, e^{R_{T,i}} = e^{rT}$.

Methodology
This section presents the methodology used to compare the pricing methods on artificial and real data. To investigate its applicability in different settings, the empirical Esscher transform is applied to price European call options across a range of moneyness and maturities.
The real-world parameters are estimated using the log returns of the underlying asset. In the model of Heston and Nandi (2000), the logarithmic return $r_t = \ln(S_t/S_{t-1})$ is assumed to follow a GARCH(1,1)-in-mean process driven by the following pair of equations under the physical measure:

$$ r_t = r + \lambda \sigma_t^2 + \sigma_t z_t, $$
$$ \sigma_t^2 = \omega + \beta \sigma_{t-1}^2 + \alpha (z_{t-1} - \gamma \sigma_{t-1})^2, $$

where $r$ is the risk-free interest rate, $\lambda$ represents the risk premium, $\sigma_t^2$ is the conditional variance and $z_t$ is the error term, distributed as a standard normal variable, $z_t \sim N(0,1)$. The parameter $\alpha$ determines kurtosis, $\gamma$ determines skewness, and the variance persistence is $\beta + \alpha\gamma^2$; the process is mean-reverting if $\beta + \alpha\gamma^2 < 1$. The risk-neutral version of this model is obtained when $\lambda$ is replaced by $-1/2$, $z_t$ is replaced by $z_t^{*} = z_t + (\lambda + \tfrac{1}{2})\sigma_t$ and $\gamma$ is replaced by $\gamma^{*} = \gamma + \lambda + 1/2$. The price at time $t$ of a European call option with maturity at time $t+T$ is then given by

$$ C_t = e^{-rT}\, \mathrm{E}_t^{*}\!\left[\max(S_{t+T} - K, 0)\right] = S_t P_1 - K e^{-rT} P_2, $$

where $T$ is the time to maturity, $\mathrm{E}_t^{*}[\cdot]$ is the expectation at time $t$ under the risk-neutral distribution, $S_t$ is the price of the underlying asset at time $t$ and $P_1$, $P_2$ are the risk-neutral probabilities.
The quantities $P_1$ and $P_2$ are probabilities that can be obtained by inverting the characteristic function $f^{*}(i\varphi)$. Note that the coefficients $A$ and $B$ of the generating function are defined recursively, working backwards from the maturity time $t+T$ of the option. Note also that $A$ and $B$ are functions of $t$, $t+T$ and $\varphi$, so that $A \equiv A(t;\, t+T, \varphi)$ and $B \equiv B(t;\, t+T, \varphi)$. Both terms can be solved recursively from time $t+T$, working back through time and using the terminal conditions $A(t+T;\, t+T, \varphi) = 0$ and $B(t+T;\, t+T, \varphi) = 0$. As mentioned earlier, for pricing options the risk-neutral distribution must be used: to obtain the risk-neutral generating function $f^{*}(\varphi)$, in all terms $A$ and $B$ we replace $\lambda$ with $-1/2$ and $\gamma$ with $\gamma^{*}$.
For our proposal, the algorithm is as follows. For each time to maturity $T$, we perform a bootstrap with replacement on the underlying asset's historical returns, in the following sequence: (a) we construct a single trajectory for the asset price by drawing as many historical log returns as there are days to maturity (for example, if the option has 17 days to maturity, we draw 17 returns); (b) we accumulate the log returns of this trajectory and obtain one terminal price; (c) we repeat steps (a) and (b) 252 times (or $5 \times 10^4$ times) to construct the physical distribution of the price at maturity. We then obtain the risk-neutral measure (step 2) and calculate the option price (step 3). We repeat this procedure 15,000 times (or 200 times) and calculate the Mean Absolute Percentage Error (MAPE).
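Steps (a)-(c) amount to resampling historical log returns in blocks of the option's remaining life. The helper below is our own illustrative sketch (the function name and arguments are assumptions); its output would then be fed into the Esscher risk-neutralization of steps 2 and 3.

```python
import numpy as np

def bootstrap_terminal_returns(hist_log_returns, days_to_maturity, n_paths,
                               seed=0):
    """Resample historical daily log returns with replacement to build the
    physical distribution of the cumulative log return at maturity: one draw
    of `days_to_maturity` returns per path, summed along each path."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(np.asarray(hist_log_returns, float),
                       size=(n_paths, days_to_maturity), replace=True)
    return draws.sum(axis=1)  # one cumulative log return per simulated path
```

The terminal prices are then $S_0 e^{R_{T,i}}$ for each cumulative return $R_{T,i}$, and the mean of the bootstrap distribution is (up to sampling error) the historical mean return scaled by the number of days drawn.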

Testing for serial dependence in the data
The basic idea behind a volatility study is that the series of log returns is serially uncorrelated, or has only minor low-order serial correlations, but is nonetheless a dependent series: the log returns of stocks are serially uncorrelated, but non-linearly dependent.
If we rewrite (3.1) as $\hat{a}_t = r_t - \bar{r}$, where $\bar{r}$ is the mean of the observed market returns, then $\hat{a}_t$ is the centered return. The squared series $\hat{a}_t^2$ is then used to check for conditional heteroscedasticity, which is also known as testing for ARCH effects; the null hypothesis of no ARCH effect is rejected, with p-value 1.62e-05 for Petrobras.

Parameter estimation
Regarding the optimization of the parameters, we used the following heuristic. From the set of generated initial conditions, we start the optimization using the Nelder-Mead method. The resulting parameter values serve as initial conditions for optimization by the BFGS method, which are in turn passed back to the Nelder-Mead method, and so on, until the difference between successive solutions is less than a tolerance value of 0.10. The estimation also shows that a positive $y_{t-1}$ contributes $\alpha\, u_{t-1} h_{t-1}$ to $h_t$, whereas a negative $y_{t-1}$ has a larger impact, $(\alpha + \alpha^{*})\, u_{t-1} h_{t-1}$, with $\alpha^{*} \ge 0$. The test results show no significant serial correlations in the squared standardized residuals, suggesting that the models are sufficient to explain the heteroscedasticity in the log return series of both stocks.
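The alternating heuristic can be sketched with `scipy.optimize`; the wrapper below, its name and its stopping rule on the parameter vector are our own illustration of the scheme described above, not the thesis code.

```python
import numpy as np
from scipy.optimize import minimize

def alternate_optimize(objective, x0, tol=0.10, max_rounds=20):
    """Alternate Nelder-Mead and BFGS starts, each warm-started from the
    other's solution, until successive solutions differ by less than `tol`."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_rounds):
        x_nm = minimize(objective, x, method="Nelder-Mead").x
        x_bfgs = minimize(objective, x_nm, method="BFGS").x
        if np.max(np.abs(x_bfgs - x)) < tol:
            return x_bfgs
        x = x_bfgs
    return x
```

In practice the objective would be the negative log-likelihood of the Beta-t-GARCH model; any smooth test function exercises the same mechanics.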
We investigate whether the standardized residuals have a normal distribution (null hypothesis) through the Jarque-Bera test. In the Beta-t-GARCH(1,1), the p-values are 1.991e-13 for Petrobras and 2.997e-06 for Vale.
In the GARCH(1,1) of Heston-Nandi (2000), the p-values are equal to 2.2e-16. The results table contains the prices of a European call option from the EET-AM (empirical Esscher transform with assumed model), EET-B (bootstrap with replacement on historical returns), BS (Black-Scholes) and HN (Heston-Nandi) methods for different moneyness and maturities, compared to the true market price of the Petrobras data. The numbers reported for each combination are the mean absolute percentage error (MAPE). In the proposed method, we use 252 returns and the simulation is repeated 15,000 times.

Conclusions
In this work, we propose a method to obtain European option prices under a GARCH framework with non-Gaussian innovations. We used a new class of models, the dynamic conditional score models proposed by Harvey (2013), for modeling the volatility (and heavy tails) of observed underlying asset prices. These models replace the observations, or their squares, by the score of the conditional distribution. They are more robust to extreme events, allow modeling of the leverage effect, and can incorporate short- and long-term volatility components.
To avoid the formulation of a restrictive model, the risk-neutralization is applied to the empirical distribution of the sample paths generated from the assumed model, as Liu et al (2015) and Duan (2002) have done. To identify a risk-neutral measure, we use the empirical Esscher transform. The sample paths are reweighted, giving rise to a risk-neutralized sample from which option prices can be obtained as a weighted sum of the option's pay-offs in each path.
We empirically compared our approach to the competing benchmarks Black-Scholes (1973) and Heston and Nandi (2000). In general, the proposed method, with an assumed model describing the empirical distribution, presented the lowest MAPE. When we increased the size of the empirical distribution, the MAPE was reduced only for shorter maturities.
Future research would benefit from conducting an extensive empirical study on the performance of our proposed pricing method, considering asset returns of different frequencies, multiple cross-sections of market option prices and long-dated options.

4.1 Introduction
The importance of option-implied information was seen in Bates (1991).
He studied the behavior of S&P 500 future options' prices prior to the crash of October 1987, and found unusually negative skewness in the option-implied distribution leading to the conclusion that the crash was expected by the market.
This fact led to the development of methods that seek more information to explain the behavior of assets, markets and investors. The assumption is that financial asset prices are continually updated and thus incorporate more recent information than other economic indicators. Several studies have emerged with varying objectives:
• Pricing of illiquid derivatives and exotic options - Pérignon and Villa;
• Recovering the asset price's stochastic process (building an implied binomial tree) - Rubinstein (1994), Derman and Kani (1994), Jackwerth (1999);
• Estimating implied risk aversion (describing the risk preferences of a representative agent in an economy) - Figlewski and Malik (2014), Bollerslev and Todorov (2011), Duan and Zang (2013), Bondarenko (2003, 2009), Jackwerth (2000), Aït-Sahalia and Lo (2000), Rosenberg and Engle (2002), Bliss and Panigirtzoglou (2002);
• Estimating the empirical risk-neutral density - Almeida and Azevedo (2014).

We can observe that most studies are concerned with the implied moments of risk-neutral distributions. According to Jackwerth (2004), these methods can be segregated into two general groups: parametric and nonparametric. Parametric methods can be sub-divided into three groups: expansion, generalized distribution and mixture methods. The risk-neutral distribution is obtained by an optimization criterion that minimizes the sum of squared errors given by the difference between observed and predicted values.
Nonparametric methods allow greater flexibility when fitting the risk-neutral distribution. Rather than requiring a parametric form of the distribution, they allow general functions. These methods can be divided into three groups: kernel, maximum entropy and curve-fitting methods. Curve-fitting methods can be further divided into two sub-classes: fitting a function between implied volatilities and strike prices, or approximating the risk-neutral distribution by some general function.
The advantages of parametric methods, according to Bondarenko (2003), are analytical expressions and the possibility of extracting parameters to perform hedging. However, their effectiveness depends on correct modeling of the data-generating process. The main advantage of nonparametric methods is that they do not require a specific form for the probability distribution; they are more flexible and adaptable to any function class. Although these techniques reduce misspecification risk, they require larger sample sizes and are affected by irregularities such as data sparsity and difficulties in completing the tails. Jackwerth (2004) argues that both families of methods suffer from negative probabilities and failures to integrate to one. Another important constraint is the number of options traded in the market. Moreover, these methods are specific to a given maturity, because the statistical properties of observed prices cannot be used for other maturities (Duan, 2002).
The robustness of the option-implied risk-neutral distribution can be compared across methods by perturbing actual option prices. Bliss et al (2001) derived risk-neutral distributions based on many different perturbed sets of prices. The results indicate how much risk-neutral distributions can differ from each other and that the confidence intervals around the moments can be large. Santos and Guerra (2015), Jackwerth (2004) and Bliss et al (2001) note that the easiest and most stable methods tend to be in the group of curve-fitting methods. However, Jackwerth (2004) observes that a largely unresolved area of study is the development of statistical tests: much of the current work lacks statistical rigor and is merely descriptive.
This work introduces a new approach to the indirect estimation of implicit risk-neutral probabilities. We generalize the discrete version of the Breeden and Litzenberger (1978) method to the case where states are not equally spaced. We suppose that the risk-neutral probability mass function is given by the empirical Esscher transform and that it depends on the strike price. We use the historical distribution of the underlying asset price and the observed option prices to estimate the implicit Esscher parameter. Then, we fit a polynomial between the implied Esscher parameter and the strike price, in the same spirit as Shimko (1993).
Indirect estimation combines the historical information of the underlying asset with option-implied information. Option prices contain useful information that is not easily extracted using econometric models, and combining it with historical information may be effective and provide new insights into how information and risk preferences are incorporated into prices on financial markets.
The works of Aït-Sahalia et al (2000), Jackwerth (2000) and Rosenberg et al (2000) explore the idea of indirect estimation to obtain the empirical pricing kernel, the relationship between the risk-neutral and physical probabilities, and the implied risk aversion. Grith et al (2012) use indirect estimation to obtain the empirical pricing kernel and the risk-neutral probabilities. In our case, we suppose that the empirical pricing kernel is known and given by an empirical version of the Esscher (1932) transform. This assumption is reasonable, because it is well known in information theory that a maximum entropy problem has a solution in the form of the Esscher transform (Buchen and Kelly, 1996; Stutzer, 1996; Duan, 2002).
We ran simulation experiments under different situations that seek to highlight the differences and similarities between the methods. We compare our method to two alternative approaches: the double lognormal proposed by Bahra (1997) and the Shimko (1993) method.
The remainder of this work is organized as follows. In Section 4.2 we describe the theoretical framework of Breeden and Litzenberger (1978).

Breeden and Litzenberger method
The time-state preference approach to general equilibrium in an economy, as developed by Arrow (1964) and Debreu (1959), is one of the most general frameworks available for the theory of finance under uncertainty. An Arrow-Debreu security is a security associated with a particular state of the economy which pays $1 if that state occurs, and nothing otherwise. The price of an Arrow-Debreu security is referred to as a state-price. According to Cox, Ross and Rubinstein (1979), the state-price associated with a particular state is simply the risk-neutral probability of that state discounted at the risk-free rate. Breeden and Litzenberger (1978) implement the time-state preference model in a multiperiod economy, deriving the prices of Arrow-Debreu securities from the prices of call options on aggregate consumption. Given the prices of Arrow-Debreu securities, the value of any cash flow can be calculated; that is, these prices permit an equilibrium evaluation of assets with uncertain payoffs at many future dates.
Suppose that the value of the underlying asset in $T$ periods has a discrete probability distribution with possible values $S_T = \$1.00, \$2.00, \dots, \$N$. Let $c(K, T)$ denote the price of a European call option with strike price $K$ maturing in $T$ periods.

Note that, as the strike price of a call option is increased from $K$ to $K+1$, two changes occur in the payoff vector: (1) the payoff in the set of states with $S_T = K+1$ becomes zero, and (2) the payoffs in all states with $S_T \ge K+2$ are reduced by the change in the strike price. The difference $c(K, T) - c(K+1, T)$ gives a payoff of \$1.00 in every state with $S_T \ge K+1$, and $c(K+1, T) - c(K+2, T)$ gives a payoff of \$1.00 in every state with $S_T \ge K+2$. A payoff of \$1.00 for $S_T = 1$ may thus be constructed as $[c(0, T) - c(1, T)] - [c(1, T) - c(2, T)]$. Given the call prices $c(K, T)$, the prices of the elementary claims $P(K, T)$ can be computed from a replicating portfolio of calls that consists of one long call with strike $K-1$, one long call with strike $K+1$ and two short calls with strike $K$.
In general, if the step size between potential values of $S_T$ is $\Delta K$, then $[c(K, T) - c(K + \Delta K, T)]$ has a payoff vector with zeros for $S_T \le K$ and with payoff $\Delta K$ for all levels greater than or equal to $K + \Delta K$. Therefore, the portfolio of call options that produces a payment of \$1.00 if the market level $S_T$ equals $K$, and zero otherwise, is

$$ P(K, T; \Delta K) = \frac{[c(K - \Delta K, T) - c(K, T)] - [c(K, T) - c(K + \Delta K, T)]}{\Delta K}. \qquad (4.1) $$

The payoff of the portfolio $P(K, T; \Delta K)$ is the same as that of an Arrow-Debreu security that pays one unit of cash if, and only if, $S_T$ is equal to $K$. Thus, a complete set of options at all strike prices is equivalent to a complete set of Arrow-Debreu securities.
Consider $1/\Delta K$ shares of this portfolio:

$$ \frac{P(K, T; \Delta K)}{\Delta K} = \frac{c(K - \Delta K, T) - 2c(K, T) + c(K + \Delta K, T)}{(\Delta K)^2}. \qquad (4.2) $$

Assuming that the asset has a continuous payoff and taking the limit of expression (4.2) as $\Delta K$ goes to zero, we have

$$ \lim_{\Delta K \to 0} \frac{P(K, T; \Delta K)}{\Delta K} = \frac{\partial^2 c(K, T)}{\partial K^2}. \qquad (4.3) $$

The result (4.3) can also be obtained from the relationship proposed by Cox and Ross (1976). Under the risk-neutral distribution $q(S_T)$, the payoff is discounted at the deterministic risk-free rate $r$:

$$ c(K, T) = e^{-rT} \int_{K}^{\infty} (S_T - K)\, q(S_T)\, dS_T. \qquad (4.4) $$

Differentiating with respect to $K$ gives

$$ \frac{\partial c(K, T)}{\partial K} = -e^{-rT} \int_{K}^{\infty} q(S_T)\, dS_T, \qquad (4.7) $$

and the density $q(\cdot)$ can be obtained by taking the derivative of (4.7) with respect to $K$:

$$ q(K) = e^{rT}\, \frac{\partial^2 c(K, T)}{\partial K^2}. \qquad (4.10) $$

An approximation to the distribution function and the density can be made using finite differences. Following put-call parity, we can replace the call prices by put prices $p(K, T)$ in the formulas above.
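The finite-difference approximation of (4.10) is straightforward on an even strike grid. The helper below is our own illustration (function name and grid are hypothetical); applied to Black-Scholes call prices it recovers a density that is nonnegative and integrates to approximately one.

```python
import numpy as np

def bl_density(strikes, call_prices, r, T):
    """Risk-neutral density q(K) ~= e^{rT} * central second difference of
    call prices, on an evenly spaced strike grid (interior points only)."""
    K = np.asarray(strikes, float)
    C = np.asarray(call_prices, float)
    dK = K[1] - K[0]
    q = np.exp(r * T) * (C[2:] - 2.0 * C[1:-1] + C[:-2]) / dK**2
    return K[1:-1], q  # strikes and density values at interior grid points
```

The no-arbitrage properties of the next subsection (nonnegativity, integration to one) provide a direct sanity check on the output.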
According to Aparicio and Hodges (1998), one important characteristic of the Breeden and Litzenberger (1978) approach is that no assumptions are made about the underlying asset price dynamics, and market participants' preferences are not restricted, as they are reflected in the call option prices.
Moreover, it is assumed that there are no restrictions on short sales, that there are no transaction costs or taxes, and that investors may borrow at the riskless rate of interest. Their work is the starting point of a line of research addressing the recovery of relevant aspects of the underlying asset distribution from option market data.

No-arbitrage constraints
According to Carr (2001), there are three essential properties of a well-defined risk-neutral distribution (RND). The RND is nonnegative,

$$ q(S_T) \ge 0 \quad \text{for all } S_T, \qquad (4.11) $$

it integrates to one,

$$ \int_{0}^{\infty} q(S_T)\, dS_T = 1, \qquad (4.12) $$

and the RND reprices all calls (the martingale property):

$$ c(K, T) = e^{-rT} \int_{K}^{\infty} (S_T - K)\, q(S_T)\, dS_T \quad \text{for all } K \ge 0. \qquad (4.13) $$

The nonnegativity (4.11) and integrability (4.12) properties ensure that the risk-neutral distribution is a probability distribution. The martingale property (4.13) ensures that the mean of the distribution is the forward price, which includes the special case $K = 0$: that option is guaranteed to be in-the-money, and at maturity we exercise it and buy the underlying asset.
According to Brunner and Hafner (2003), these three properties, (4.11), (4.12) and (4.13), can be formulated in terms of the call price function (or implied volatilities). A set of equivalent conditions is, first, that the value of a call option never be greater than the asset price and never less than its intrinsic value; second, that the call price function be monotonically decreasing in the strike; and finally, that the call price function be convex (see appendix 6.6).
Aït-Sahalia and Duarte (2003) proposed a method of option-implied density estimation based on locally polynomial regressions that incorporate shape restrictions. The theory-imposed restrictions are that the price of a call option must be a decreasing function of the strike, $-e^{-rT} \le \partial c/\partial K \le 0$ (4.15), and a convex function of the option's strike price, $\partial^2 c/\partial K^2 \ge 0$ (4.16).
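These two restrictions are easy to verify numerically on a grid of quoted calls. The checker below is our own illustration of conditions (4.15)-(4.16) via finite differences, not Aït-Sahalia and Duarte's constrained estimator.

```python
import numpy as np

def check_no_arbitrage(strikes, calls, r, T, eps=1e-10):
    """Check shape restrictions on call prices across strikes:
    slope within [-e^{-rT}, 0] (monotone decreasing, bounded) and
    nondecreasing slope (convexity)."""
    K = np.asarray(strikes, float)
    C = np.asarray(calls, float)
    slope = np.diff(C) / np.diff(K)
    monotone = np.all(slope <= eps) and np.all(slope >= -np.exp(-r * T) - eps)
    convex = np.all(np.diff(slope) >= -eps)
    return bool(monotone and convex)
```

Black-Scholes prices pass the check by construction, while bumping a single quote upward typically breaks convexity and monotonicity of the slope.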

Proposed method
Consider the notation: $n$ is the number of possible states for the underlying at maturity $t = T$; $S_i$ are the possible prices at time $t = T$; $p_i$ are the physical probabilities of price $S_i$; $q_i$ are the risk-neutral probabilities of state $i$; $C_j$ is the fair value of derivative $j$; $S_0$ is the price of the underlying at $t = 0$ (the present day), where $i = 1, \dots, n$; and $K$ is the strike price.

Breeden and Litzenberger Method with Unevenly Spaced States
The price of a call option under a discrete risk-neutral distribution can be expressed as

$$ C_j = e^{-rT} \sum_{i:\, S_i > K_j} q_i\, (S_i - K_j). $$

One can assemble a portfolio formed by one long position on option $C_j$ and a short position on $C_{j+1}$. The possible payoffs are then given by a sequence of zeros followed by entries equal to the difference between the two payoffs. Consider now that we take $1/(S_i - S_{i-1})$, $i = 1, \dots, n$, shares of these portfolios.
Table 4.4: Arrow-Debreu Securities.
Now, it is easy to see that, denoting by $C(S_i)$ the call struck at $K = S_i$, the portfolio

$$ AD(S_i) = \frac{C(S_{i-1}) - C(S_i)}{S_i - S_{i-1}} - \frac{C(S_i) - C(S_{i+1})}{S_{i+1} - S_i} $$

is an Arrow-Debreu security giving a payoff of 1 when, taking $K_{n+1} = \infty$ (so that the second term vanishes for the last state), the final state is $S_i$. Note that $S_i$ represents the strike price around which the second difference above is calculated. Also, remember that the price of an Arrow-Debreu security is the 'risk-neutral probability' multiplied by $e^{-rT}$ (see equation 4.10).
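A small numerical check of this construction follows, with hypothetical discrete states and $r = 0$ so that state prices coincide with risk-neutral probabilities; the function name and argument convention (one extra strike below the lowest state) are our own.

```python
import numpy as np

def state_prices_uneven(strikes, calls):
    """Arrow-Debreu prices from calls struck at unevenly spaced states.
    strikes[1:] are the states; strikes[0] is any strike below the lowest
    state (e.g. 0). For the last state, K_{n+1} = infinity drops the second
    butterfly leg. Risk-neutral probabilities are these prices times e^{rT}."""
    K = np.asarray(strikes, float)
    C = np.asarray(calls, float)
    n = len(K) - 1
    ad = np.zeros(n)
    for i in range(1, n + 1):
        left = (C[i - 1] - C[i]) / (K[i] - K[i - 1])
        right = (C[i] - C[i + 1]) / (K[i + 1] - K[i]) if i < n else 0.0
        ad[i - 1] = left - right
    return ad
```

Pricing the calls directly from a known discrete distribution and feeding them back through the butterflies recovers exactly that distribution.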
The price of this portfolio is known, since options are available in the market.

Breeden and Litzenberger Method with Unevenly Spaced States and K-Dependent Probabilities
Consider now that the probability mass function depends on $K$. The price of a call option would then be written as

$$ C(K) = e^{-rT} \sum_{i:\, S_i > K} q_i(K)\, (S_i - K), \qquad K \in \{K_j,\ j = 1, \dots, m\}, \qquad (4.26) $$

where

$$ q_i(K) = \frac{p_i\, e^{h(K) S_i}}{\sum_{j=1}^{n} p_j\, e^{h(K) S_j}} $$

is the empirical Esscher transform. In our case, $p_i = 1/n$ for all $i$ and can be cancelled:

$$ q_i(K) = \frac{e^{h(K) S_i}}{\sum_{j=1}^{n} e^{h(K) S_j}}. \qquad (4.27) $$

As in (4.21), and following the preceding section in equation (4.22), the risk-neutral probabilities are then recovered from the second differences of call prices, now with $K$-dependent weights (4.30).

Nonnegativity property
This property depends on two factors in formula (4.30): the differences between prices, $(S_i - S_{i-1})$ and $(S_{i+1} - S_i)$, and the impact that $h(K)$ has on the terms $q_i(\cdot)$ in the second bracket. Suppose that the second ratio, $1/(S_{i+1} - S_i)$, differs from the first, $1/(S_i - S_{i-1})$, only in the tenth decimal, i.e., that its contribution is much smaller, and that the amount generated in the first bracket is approximately equal to the value generated in the second bracket. In this case the nonnegativity property can be violated by the quantity generated through the second ratio. But if we fit, in $h(K)$, a function that reduces the importance of the second ratio, so that the difference in the second bracket becomes the smaller one, then the nonnegativity property is not violated.

Integrability property
Suppose that the scenarios are equally spaced, $i = 1, \dots, n$. The resulting probabilities do not, in general, sum to one; they do sum to one only if $q_i(K_0) = q_i(K_1)$. Note that the formula is imperfect because the fitted $h(K)$ is imperfect.
According to Casella and Berger (2010), from a purely mathematical viewpoint, any nonnegative function with a finite positive integral (or sum) can be turned into a probability density function (or probability mass function). For example, if $g(x)$ is any nonnegative function that is positive on a set $A$ and 0 elsewhere, and

$$ c = \int_A g(x)\, dx < \infty $$

for some constant $c > 0$, then the function $f(x) = g(x)/c$ is a probability density function of a random variable $X$ taking values in $A$. Most methods use this standardization to make the risk-neutral distribution sum to one.
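In the discrete setting this standardization is a one-line operation; the trivial helper below (our own, purely illustrative) is the form most methods apply to the raw risk-neutral weights.

```python
import numpy as np

def normalize_to_pmf(values):
    """Casella-Berger standardization: divide a nonnegative vector by its
    positive, finite sum to obtain a probability mass function."""
    v = np.asarray(values, float)
    total = v.sum()
    if (v < 0).any() or not np.isfinite(total) or total <= 0:
        raise ValueError("need nonnegative values with a positive finite sum")
    return v / total
```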

Martingale property
Note that call option prices vary greatly across strike prices: deep-in-the-money (DITM) calls are valued almost as high as the underlying asset itself, whereas deep-out-of-the-money (DOTM) calls are valued close to zero. According to Jackwerth (2004), the exercise of fitting the risk-neutral distribution exactly is rarely undertaken because some of the requirements lead to numerical difficulties.
Thus, Jackwerth (2004) concludes that the choice of option prices, for each method, can be somewhat of an art.
In our case, we include the special case $K = 0$ by calculating the h-implied for a strike equal to zero. Moreover, as noted in the literature, whether the martingale condition is obtained depends on the set of options used for each maturity. For example, for short periods we can use two types of combinations: DITM, in-the-money (ITM) and at-the-money (ATM), or ATM and out-of-the-money (OTM); for long periods, just ATM, OTM and DOTM. Obviously, these combinations can yield a range of risk-neutral distributions.

Methodology
The algorithm for our method is:
1. Simulate the physical distribution for $S_T$;
2. Compute the h-implied for each observed option (including $K = 0$);
3. Fit a polynomial in $h(K)$;
4. Compute the risk-neutral probabilities $q_i(\cdot)$;
5. Compute the option prices $C$.
In step one, our analysis is performed with an unknown data-generating process for the underlying asset. First, we simulate 5,000 prices, for each maturity, using the data-generating process of the Heston (1993) model; afterwards, we perform 5,000 bootstraps with replacement on historical returns to simulate an unknown stochastic process for the underlying asset. In step two, we consider the market price given by the Heston (1993) model and substitute these prices into step two's equation to calculate the h-implied. In step three, we fit a polynomial of degree two in $h(K)$. We calculate $q_i(\cdot)$ in step four and, in step five, the option prices $C$.
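Steps 2 and 3 can be sketched as follows. The bisection bracket, the lognormal stand-in for step one's simulated states, and the function names are illustrative assumptions of ours; only the K-dependent Esscher pmf and the degree-two fit come from the text.

```python
import numpy as np
from scipy.optimize import brentq

def esscher_pmf(states, h):
    # K-dependent empirical Esscher pmf over equally likely simulated states
    z = h * np.asarray(states, float)
    w = np.exp(z - z.max())  # stabilized exponentials
    return w / w.sum()

def model_call(states, h, K, r, T):
    q = esscher_pmf(states, h)
    return np.exp(-r * T) * (q @ np.maximum(states - K, 0.0))

def h_implied(states, K, market_price, r, T, bracket=(-1.0, 1.0)):
    # Step 2: the Esscher parameter that reprices the observed option at K
    return brentq(lambda h: model_call(states, h, K, r, T) - market_price,
                  *bracket)
```

Solving for the parameter that generated a price recovers it exactly, and step 3 is an ordinary polynomial fit of the implied parameters against the strikes.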
The parameter values of the Heston (1993) model follow Almeida and Azevedo (2014). They use the objective and risk-neutral parameters estimated in Garcia, Lewis, Pastorello and Renault (2011) adopting S&P 500 daily data from January 1996 to December 2005.
Garcia et al. (2011) do not report the estimated drift or the risk-free rate. Almeida and Azevedo (2014) arbitrarily set the drift to 10% and the risk-free rate to the year-2000 average of the 1-year T-Bill, r = 5.93%. The same value for the drift has also been used in Gray and Newman (2005) and Haley and Walker (2010). According to the latter authors, these values represent typical estimates from market data.
We consider four scenarios for pricing in Table 4.4. We introduce three scenarios (scenarios 1, 2 and 3) beyond the one estimated in Garcia et al. (2011) (scenario 4). That is, we change the correlation ρ to a positive value (scenarios 1 and 2) and we reduce the volatility-of-volatility σ by 50% (scenarios 1 and 3). The initial value of the volatility was set equal to its long-term average. Finally, we compare our method to two alternative approaches: the mixture method (following Ait-Sahalia and Duarte, 2003; Bahra, 1997; Ornelas et al., 2011; Santos et al., 2015) and the Shimko (1993) method. The prices of the mixture method, or mixture of double lognormals (DLN), and the prices of the Shimko (1993) method (SHM) are calculated using expression (4.4). The strike prices were 70, 80, 90, 92.5, 95, 97.5, 100, 102.5, 105, 107.5, 110 and 120. The spot price was 100 and we use three maturities: 30, 90 and 180 days.

Lognormal Mixture
The mixture of lognormals was proposed by Bahra (1997) and Melick and Thomas (1997). Instead of specifying a dynamic for the underlying asset price, it is possible to make assumptions about the functional form of the risk-neutral distribution and then obtain the parameters of the distribution by minimizing the distance between observed option prices and theoretical prices. Consider the weighted sum of lognormal density functions:

q(S_T) = Σ_{i=1}^{k} w_i L(α_i, β_i; S_T),   (4.31)

where L(α_i, β_i; S_T) is the ith lognormal density with parameters α_i and β_i. The term k defines the number of mixture components describing the risk-neutral distribution. In order to guarantee that q(S_T) is a probability density, w_i ≥ 0 for i = 1, …, k, and Σ_i w_i = 1. Replacing (4.31) in (4.4), we have the theoretical prices of the European call option:

c(K, τ) = e^{-rτ} Σ_{i=1}^{k} w_i ∫_K^∞ L(α_i, β_i; S_T)(S_T − K) dS_T.   (4.35)

Applying the mixture of two lognormals used by Bahra (1997), named the double lognormal (DLN), we get the following formula for the theoretical prices of European call options:

c(K, τ) = e^{-rτ} { w [e^{α_1 + β_1²/2} N(d_1) − K N(d_2)] + (1 − w)[e^{α_2 + β_2²/2} N(d_3) − K N(d_4)] },

with d_1 = (−ln K + α_1 + β_1²)/β_1, d_2 = d_1 − β_1, d_3 = (−ln K + α_2 + β_2²)/β_2 and d_4 = d_3 − β_2. For the theoretical prices of European put options:

p(K, τ) = e^{-rτ} { w [−e^{α_1 + β_1²/2} N(−d_1) + K N(−d_2)] + (1 − w)[−e^{α_2 + β_2²/2} N(−d_3) + K N(−d_4)] }.   (4.41)

In order to find the parameters of the implied risk-neutral distribution, we have to solve the minimization problem:

min_{α_1, β_1, α_2, β_2, w} Σ_i [c(K_i, τ) − ĉ_i]² + Σ_i [p(K_i, τ) − p̂_i]² + [w e^{α_1 + β_1²/2} + (1 − w) e^{α_2 + β_2²/2} − e^{rτ} S_0]²,

where the first and second terms refer to the sums of squared deviations between theoretical prices and the observed market prices (ĉ_i and p̂_i). The third term states that the expected value of the risk-neutral distribution must be equal to the underlying asset's forward price.
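The DLN call formula can be sketched directly. The snippet below is our illustration of the standard double-lognormal pricer (the function names are ours); with w = 1 and (α, β) taken from Black-Scholes log-dynamics, it collapses to the Black-Scholes price, which is a convenient sanity check:

```python
import math

def norm_cdf(x):
    # Standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dln_call(K, tau, r, w, a1, b1, a2, b2):
    # Double-lognormal call price: component i is a lognormal density
    # for S_T with log-mean a_i and log-standard-deviation b_i
    def component(a, b):
        d1 = (-math.log(K) + a + b * b) / b
        d2 = d1 - b
        # E[(S_T - K)^+] under a single lognormal component
        return math.exp(a + 0.5 * b * b) * norm_cdf(d1) - K * norm_cdf(d2)
    return math.exp(-r * tau) * (w * component(a1, b1)
                                 + (1.0 - w) * component(a2, b2))
```

In practice the five parameters (α_1, β_1, α_2, β_2, w) would be found by the minimization problem above, e.g. with a nonlinear least-squares routine.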

Shimko Method
The Shimko (1993) method assumes that the implied volatility smile is a function of the strike price:

σ(K) = a_0 + a_1 K + a_2 K².   (4.43)

The annual implied volatility function σ(K) in (4.44) is the polynomial fitted across the observed strikes. Shimko (1993) applies the Breeden and Litzenberger (1978) result to the Black-Scholes (1973) model. Taking the first and second derivatives of the call price with respect to K, we obtain the probability distribution function:

q(S_T) = e^{rτ} ∂²c(K, τ)/∂K² |_{K = S_T},

where F is the forward price and N(·) is the standard normal distribution function:

N(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.   (4.52)

In the original formulation of Shimko (1993), the volatilities are interpolated only over the range containing the observed prices. Probabilities in the sections located above and below the range of interpolated values are estimated by lognormal distributions. In this work, we keep a constant volatility for these points, as in Campa, Chang and Reider (1998) and Malz (1997).
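The Shimko pipeline can be sketched numerically: fit the quadratic smile, plug it into Black-Scholes, and differentiate twice in the strike by finite differences. This is our illustration (function names and the finite-difference step are assumptions), not the thesis's code:

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, r, tau, sigma):
    # Black-Scholes (1973) European call price
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma * sigma) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S0 * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def shimko_density(S0, r, tau, strikes, vols, grid):
    # Fit the quadratic smile sigma(K) = a0 + a1*K + a2*K^2 (eq. 4.43)
    a2, a1, a0 = np.polyfit(strikes, vols, 2)
    sig = lambda K: a0 + a1 * K + a2 * K * K
    # Breeden-Litzenberger: q(K) = e^{r tau} d^2c/dK^2, by central differences
    h = 1e-2
    out = []
    for K in grid:
        c_m = bs_call(S0, K - h, r, tau, sig(K - h))
        c_0 = bs_call(S0, K, r, tau, sig(K))
        c_p = bs_call(S0, K + h, r, tau, sig(K + h))
        out.append(math.exp(r * tau) * (c_p - 2.0 * c_0 + c_m) / (h * h))
    return np.array(out)
```

With a flat smile the recovered density is the Black-Scholes lognormal; evaluating `sig` outside the observed strike range implements the constant-volatility extrapolation adopted in this work.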

Results
The following figures illustrate the h-implied behavior for the different proposed scenarios. In Figures 4.1 to 4.6, we obtain the h-implied based on the data generating process of Heston (1993) to analyze several effects on the empirical parameter. In Tables 4.6 to 4.9, we use bootstrap with replacement on historical prices from the Heston (1993) data generating process.

Effect of the asymmetry and kurtosis
In Figure 4.2, we analyze the impact of positive skewness on the h-implied. The correlation parameter ρ controls the skewness of the return distribution.
When ρ > 0, the probability distribution will be positively skewed. This has the effect of fattening the right tail of the distribution and thinning the left tail. In our case, to reproduce the market prices at higher strikes, the value of the h-implied grows and in consequence increases the probabilities of the underlying asset's price in the right tail. In Figure 4.3, we analyze the impact of negative skewness on the h-implied. When the correlation is negative, the distribution is negatively skewed: more weight goes to the left tail and less to the right tail. In our case, to reproduce the market prices at higher strikes, the value of the h-implied falls and in doing so decreases the probabilities of the underlying asset's price in the right tail. We can conclude that the behavior of the h-implied depends on the deformation required in the distribution to reproduce the market price.

Effect of the sample size
The sample size affects only the intensity of the h-implied value in the tails of the distribution. That is, if we have more mass in the tails, the h-implied value is small. In Table 4.5, we calculate the probability of exercising the option, for a given strike price, and we analyze the impact that the sample size (5,000 scenarios) has on the value of the h-implied. For maturity T = 30 and strike K = 120, the probability of exercising the option is very small and the value of the h-implied increases (see Figure 4.6). Therefore, to minimize the effect of the sample size on the h-implied, we can discard (D)OTM options in the short term. In general, smiles and smirks are more pronounced for short-term options and less pronounced for long-term options. This reflects long-term returns being closer to normally distributed than short-term returns.
Moneyness and maturity are the two factors thought to influence the most the shape of smiles and smirks.
When ρ > 0, the volatility smile is positively sloped, and when ρ < 0, the volatility smile is negatively sloped. An increase in σ also increases the curvature of the volatility smile. In this work, we can see that the proposed method follows market expectations and reproduces the volatility smile. In some situations, the smile of the proposed method alternates above and below the market volatility smile.
(Table caption) Percentage bias for different strike prices (K) and maturities (T) from EET (empirical Esscher transform), DLN (double lognormal) and SHM (Shimko method), under the assumptions of scenarios 1 and 2 in Table 4.4. Underlying asset current price = $100.00; risk-free rate = 5.93%.

Conclusions
This work proposes a new approach for the estimation of the risk-neutral distribution. We develop a discrete version of Breeden and Litzenberger (1978) where states are not equally spaced. Our method can be classified as an indirect way of estimation because we estimate the physical distribution from the underlying asset's historical prices and the empirical Esscher parameter from option market prices. Next, we fit a polynomial of degree two between the h-implied and the strike price in order to estimate the state prices.
The most straightforward application of the risk-neutral distribution is pricing any payoff with the same time to expiration (including illiquid and exotic options). We ran simulation experiments under different situations, which seek to highlight the differences and similarities between the methods. We compare our method to two alternative approaches: the double lognormal proposed by Bahra (1997) and the Shimko (1993) method. We calculate the European call option prices for each method under various scenarios and maturities. Our method shows better results in various scenarios and, when we analyze the volatility smile (or smirk), our method reproduces the market asymmetry.
Further research can be done comparing the proposed method to other methodologies, studying other option-implied information and its applications, verifying the results with parametric data generating processes, and using other functions instead of polynomials to improve pricing accuracy.

Conclusion
One of the central questions in quantitative finance is how to obtain a risk-neutral probability measure that provides theoretical prices close to those observed in the market. The literature highlights two approaches to this problem: models based on general equilibrium (Arrow, 1964; Debreu, 1959; Lucas, 1978) and models based on the absence of arbitrage (Black and Scholes, 1973; Cox and Ross, 1976; Harrison and Kreps, 1979; Harrison and Pliska, 1981). In both cases, these approaches require the formulation of an explicit risk-neutral model and are restricted to a few probability distributions for modeling the economy's uncertainty. However, empirical observations of asset returns reveal several stylized facts, which highlight the parametric misspecification risk of the chosen stochastic process. Hence, due to the poor empirical performance of parametric methods, nonparametric option pricing techniques have expanded rapidly in recent years, because they offer an alternative that avoids possibly biased parametric restrictions (Haley and Walker, 2010).
The main objective of this thesis is to verify if simple assumptions on empirical pricing kernel are able to generate a measure Q that produces theoretical prices closer to those observed in the market. From our investigation, we are able to derive three articles.
The first article (Chapter 2) introduces the empirical Esscher transform and studies nonparametric option pricing. In our proposal, we make only mild assumptions on the pricing kernel and there is no need for the formulation of a risk-neutral model. First, we simulate sample paths for the returns under the physical measure P. Then, based on the empirical Esscher transform, the sample is reweighted, giving rise to a risk-neutralized sample from which derivative prices can be obtained as a weighted sum of the option's payoffs along each path.
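The simulate-reweight-price step described above can be sketched as follows. This is a minimal illustration, not the thesis's exact estimator: the function names and the bisection bracket are assumptions, and the Esscher tilt is applied directly to the simulated terminal prices:

```python
import numpy as np

def esscher_weights(ST, h):
    # Empirical Esscher weights w_i proportional to exp(h * ST_i)
    y = h * ST
    w = np.exp(y - y.max())              # shifted for numerical stability
    return w / w.sum()

def risk_neutral_h(ST, S0, r, tau):
    # Choose h so the reweighted sample satisfies the martingale condition:
    # sum_i w_i(h) * ST_i = S0 * e^{r tau}
    target = S0 * np.exp(r * tau)
    lo, hi = -1.0, 1.0                   # weighted mean is increasing in h
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(esscher_weights(ST, mid) * ST) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eet_call(ST, S0, K, r, tau):
    # Price as the discounted, risk-neutralized weighted payoff sum
    h = risk_neutral_h(ST, S0, r, tau)
    w = esscher_weights(ST, h)
    return np.exp(-r * tau) * np.sum(w * np.maximum(ST - K, 0.0))
```

When the simulated drift exceeds the risk-free rate, the solved h is negative, shifting mass to the left, which mirrors the sign discussion below.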
We conduct artificial experiments in Black-Scholes and Heston worlds and real experiments to explore the potential usefulness of the proposed method.
Artificial results show that the EET prices improve along with the sample size.
Real data results show that, when the stochastic process of the underlying asset is unknown, the lowest pricing errors are found among the nonparametric methods. For a maturity equal to 17, the nonparametric methods present similar results. For other maturities, the proposed method presents the lowest MAPE for deep-out-of-the-money, out-of-the-money and at-the-money options.
We also analyze the behavior of the empirical Esscher parameter. We can highlight that its standard deviation decreases with the maturity and with the increase in the sample size, and the values of the descriptive statistics begin to converge to a constant value in larger samples. When we compare the empirical Esscher parameters obtained for synthetic and real data, the most important change is the sign. That is, the Esscher parameters obtained with synthetic data come from simulations with a drift (μ = 10.00%) greater than the risk-free rate (r = 5.00%). Thus, the negative parameter shifts the risk-neutral distribution to the left, which eliminates the risk premium and ensures an average yield equal to the risk-free rate. With real data, the opposite happens: the positive parameter shifts the risk-neutral distribution to the right. This is contrary to financial theory.
However, this does not constitute an arbitrage opportunity, because the daily risk-free rate lies between the worst and the best daily return. When the price series is falling, investments at the risk-free rate pay more than these stocks.
In the second article (Chapter 3), we demonstrate that our proposal is flexible and performs very well in the presence of realistic financial time series.
We use the recently proposed dynamic conditional score model, developed by Harvey (2013), which offers an alternative to GARCH models for modeling the volatility (and heavy tails) of the observed underlying asset price, and we analyze the nonparametric option pricing method in this setting.
We empirically compare our approach to competing benchmark approaches, like Black-Scholes (1973) and Heston and Nandi (2000), on real data. In general, the proposed method with an assumed model to describe the empirical distribution presents the lowest MAPE. When we increase the size of the empirical distribution, the MAPE is reduced only for the shortest maturity.
In our third contribution (Chapter 4), we introduce a new approach for indirect estimation of the state prices implicit in financial asset prices using the empirical Esscher transform. First, we generalize the discrete version of the Breeden and Litzenberger (1978) method for the case where states are not equally spaced. Second, we use the empirical Esscher transform to combine underlying asset and derivatives data: we use the historical distribution of the underlying asset prices and the observed option prices to estimate the implicit empirical Esscher parameter. Then, we fit polynomials between the implied Esscher parameter and the strike price, as in Shimko (1993), to obtain our measure Q.
We run simulation experiments under different situations, which seek to highlight the differences and similarities between the methods. We compare our method to two alternative approaches: the double lognormal proposed by Bahra (1997) and the Shimko (1993) method.

Jacobian Method
Let X be a continuous random variable with density f_X(x). We want to find the density of a new random variable Y = h(X), where h(·) is a monotone function. The density of Y, f_Y(y), can be found from the following procedure:

f_Y(y) = f_X(h^{-1}(y)) |dx/dy|,   (6.32)

where the absolute value |dx/dy|, called the Jacobian term, ensures that f_Y integrates to one. Taking the logarithm of (6.32):

ln f_Y(y) = ln f_X(h^{-1}(y)) + ln |dx/dy|.   (6.33)
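As a numerical check of the Jacobian method, take X standard normal and Y = exp(X); then (6.32) gives f_Y(y) = f_X(ln y)/y, and its integral over (0, 2] should match the Monte Carlo estimate of P(exp(X) ≤ 2). This is our illustration; the function names are assumptions:

```python
import math
import numpy as np

def f_X(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_Y(y):
    # Density of Y = h(X) = exp(X) via the Jacobian method (6.32):
    # f_Y(y) = f_X(h^{-1}(y)) |dx/dy| = f_X(ln y) * (1 / y)
    return f_X(math.log(y)) / y

# Trapezoidal integral of f_Y over (0, 2]
grid = np.linspace(1e-6, 2.0, 20001)
vals = np.array([f_Y(t) for t in grid])
integral = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

# Monte Carlo estimate of P(exp(X) <= 2) = Phi(ln 2)
rng = np.random.default_rng(0)
mc = float(np.mean(np.exp(rng.standard_normal(200_000)) <= 2.0))
```

Both quantities approximate Φ(ln 2) ≈ 0.756, confirming that the Jacobian term makes f_Y a proper density.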

Changing the Measure
Under the risk-neutral probability, the discounted price process must be a martingale. The first condition that the risk-neutral probability structure must satisfy is that the discounted price process has zero drift under this structure and it must also be equivalent to the original structure (i.e., the same set of price paths must have positive probability under both structures). This section presents the key to achieving both results.

Girsanov Theorem
Since under continuous-time stochastic processes the events can occur over a continuous range of values, we define the physical probability space as (Ω, ℱ, P). Though f(x) gives the time-0 probability density, we do not explicitly mention the subscript for time 0. Let a stochastic process λ(t) be adapted, which means that it is observable on ℱ_t. In other words, it is possible to deduce all possible values of λ(t) based on events in ℱ_t.
At least since the paper of Black and Scholes (1973), it has become commonplace in the study of derivative pricing to employ continuous-time models of asset price evolution. Most continuous-time models are based on the Wiener process. Assume the Ito process Y_t, with drift μ and variance σ²:

dY_t = μ dt + σ dW_t,   (6.34)

where W_t is a Wiener process. Suppose we wish to change the drift in (6.34) to another drift μ̂ in an equivalent probability structure. We use the following procedure: 1. First, we define a new Wiener process, Ŵ_t, using the old Wiener process, W_t; 2. Next, we redefine a new process, Ŷ_t, using step 1.

Step 1: Define λ and Ŵ_t:

λ = (μ − μ̂)/σ.   (6.35)

Note that λ need not be a constant, since μ, μ̂ and σ are not required to be constants. Now define the process Ŵ_t by:

dŴ_t = λ dt + dW_t.   (6.36)

Step 2: Now, let the process Ŷ_t be defined by:

dŶ_t = μ̂ dt + σ dŴ_t.   (6.37)

By definition, the Ŷ_t process has a drift of μ̂. A little bit of algebra shows that this process Ŷ_t is identical to the process Y_t defined in (6.34). Replacing (6.36) in (6.37):

dŶ_t = μ̂ dt + σ(λ dt + dW_t) = (μ̂ + σλ) dt + σ dW_t = μ dt + σ dW_t.

Consider the discounted price process obtained by discounting the growth in stock prices at the risk-free rate, i.e., by dividing the stock price, S_t, by the bond price, B_t:

Z_t = S_t / B_t.   (6.42)

The bond price evolves from its initial value B_0 = 1 according to the ordinary differential equation:

dB_t = r B_t dt,   (6.43)

where r (expressed in continuously compounded terms) is constant. The stock price evolves from its initial value S_0 according to the stochastic differential equation:

dS_t = μ S_t dt + σ S_t dW_t.   (6.44)

Using Ito's lemma in conjunction with (6.43) and (6.44), we can derive the instantaneous drift and variance of the Z_t process. Letting ν denote this drift and σ² the variance:

ν = μ − r.   (6.45)

To find a risk-neutral probability, we have to find an equivalent probability structure in which Z_t has zero drift. That is, setting the new drift to zero in (6.35), λ is given by:

λ = (μ − r)/σ.   (6.46)

Girsanov's theorem ensures that the process may be rewritten using Ŵ_t as a process with zero drift:

dZ_t = σ Z_t dŴ_t.   (6.47)

Since the new probability structure is equivalent to the original structure, and since Z_t is a martingale under the new structure, it is a risk-neutral probability structure.
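The measure change can be checked by Monte Carlo: weighting physical-measure draws by the Radon-Nikodym derivative exp(−λ W_T − λ²T/2), with λ = (μ − r)/σ, makes the discounted stock price a martingale. A minimal sketch under assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, r, sigma, T, S0 = 0.10, 0.05, 0.2, 1.0, 100.0
n = 400_000

# Terminal Wiener values under the physical measure P
W_T = np.sqrt(T) * rng.standard_normal(n)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)

# Girsanov: market price of risk and Radon-Nikodym derivative dQ/dP
lam = (mu - r) / sigma
dQ_dP = np.exp(-lam * W_T - 0.5 * lam**2 * T)

# Under Q the discounted price is a martingale: E^Q[e^{-rT} S_T] = S0
disc_q = np.exp(-r * T) * np.mean(dQ_dP * S_T)
# Under P it is not: E^P[e^{-rT} S_T] = S0 * e^{(mu - r) T}
disc_p = np.exp(-r * T) * np.mean(S_T)
```

The weighted discounted mean `disc_q` recovers S0 = 100 up to Monte Carlo error, while the unweighted mean `disc_p` retains the risk premium.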

Breeden and Litzenberger Method with Integral
According to the Leibniz rule, if we can differentiate f(x, θ), a(θ) and b(θ) with respect to θ, we have:

d/dθ ∫_{a(θ)}^{b(θ)} f(x, θ) dx = f(b(θ), θ) b′(θ) − f(a(θ), θ) a′(θ) + ∫_{a(θ)}^{b(θ)} ∂f(x, θ)/∂θ dx.   (6.67)
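The Leibniz rule (6.67) can be verified numerically on a simple example, taking a(θ) = 0, b(θ) = θ and f(x, θ) = e^{−θx}; the finite-difference derivative of the integral should match the right-hand side. The function names below are ours:

```python
import numpy as np

def trap(y, x):
    # Simple trapezoidal rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def F(theta, n=20001):
    # F(theta) = integral_0^theta exp(-theta * x) dx
    x = np.linspace(0.0, theta, n)
    return trap(np.exp(-theta * x), x)

def leibniz_rhs(theta, n=20001):
    # RHS of (6.67): f(b, theta) b'(theta) - f(a, theta) a'(theta)
    # + integral of df/dtheta, with b' = 1 and a' = 0 here
    x = np.linspace(0.0, theta, n)
    boundary = np.exp(-theta * theta)
    return boundary + trap(-x * np.exp(-theta * x), x)

eps = 1e-5
lhs = (F(1.0 + eps) - F(1.0 - eps)) / (2.0 * eps)  # numerical dF/dtheta
rhs = leibniz_rhs(1.0)
```

At θ = 1 both sides equal 3/e − 1 ≈ 0.1036, matching the closed form F(θ) = (1 − e^{−θ²})/θ differentiated directly.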

Boundary Conditions of Option Prices
In order to avoid arbitrage opportunities, there are several restrictions on the price of an option, which are discussed in the appendix. This section discusses boundary conditions of calls only. By the first restriction (4.14), the value of a call option is never greater than the asset price and never less than its intrinsic value. The price of a call cannot be negative, c(K, τ) ≥ 0, as it need not be exercised; its intrinsic value is max(S_0 − K e^{−rτ}, 0). If a call had a negative price, c(K, τ) < 0, then a riskless profit could be made by buying the call (receiving an instant positive profit equal to the value of the call) and holding it until expiration to collect a nonnegative income equal to the value of the call at expiration. The price also cannot exceed the stock price, c(K, τ) ≤ S_0, since ending up owning the stock is the best that can happen to the option holder. Comparing portfolio #1 (the call plus K e^{−rτ} invested at the risk-free rate) with portfolio #2 (the stock): at expiration, portfolio #1 has the same or a higher value than portfolio #2.
Hence, an arbitrage-free call has to satisfy:

c(K, τ) + K e^{−rτ} − S_0 ≥ 0  →  c(K, τ) ≥ max(0, S_0 − K e^{−rτ}).

By the second restriction (4.15), the value of a vertical call spread is nonpositive, or equivalently the call price function is monotonically decreasing in the strike. The monotonicity of call option prices establishes that the following inequality has to be satisfied:

(K_1 − K_2) e^{−rτ} ≥ c(K_2, τ) − c(K_1, τ),  where K_1 > K_2.   (6.72)

In case the inequality above is not fulfilled, buy c(K_1) and short sell c(K_2). This yields at least (K_1 − K_2) e^{−rτ}, which is invested at the risk-free rate. This portfolio has a nonnegative value at expiration. Moreover, it requires no net investment, and in case of strict inequality it even yields a positive amount of money at t = 0. A detailed analysis is shown in Table 6.2. Finally, by the third restriction (4.16), the value of a butterfly spread is nonnegative, or the call option price function is convex in the strike. The risk-neutral distribution function is given by the first derivative of the call option pricing formula:

F(S_T) = 1 + e^{rτ} ∂c(K, τ)/∂K |_{K = S_T}.   (6.73)

If c is twice differentiable in K, the risk-neutral density function is given by the following expression:

q(S_T) = e^{rτ} ∂²c(K, τ)/∂K² |_{K = S_T}.   (6.74)
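The three restrictions can be sketched as programmatic checks on a strike grid. This is our illustration; the function name and numerical tolerances are assumptions:

```python
import numpy as np

def check_call_no_arbitrage(strikes, prices, S0, r, tau):
    """Check the three restrictions on call prices across strikes:
    bounds (4.14), monotonicity (4.15) and convexity (4.16)."""
    K = np.asarray(strikes, dtype=float)
    c = np.asarray(prices, dtype=float)
    disc = np.exp(-r * tau)
    # Restriction 1: max(S0 - K e^{-r tau}, 0) <= c(K) <= S0
    ok_bounds = np.all((c >= np.maximum(S0 - K * disc, 0.0) - 1e-12)
                       & (c <= S0 + 1e-12))
    dK = np.diff(K)
    dc = np.diff(c)
    # Restriction 2 (vertical spread), eq. (6.72):
    # -disc * dK <= dc <= 0 between adjacent strikes
    ok_mono = np.all((dc <= 1e-12) & (dc >= -disc * dK - 1e-12))
    # Restriction 3 (butterfly spread): divided differences nondecreasing
    second = dc[1:] / dK[1:] - dc[:-1] / dK[:-1]
    ok_convex = np.all(second >= -1e-12)
    return bool(ok_bounds and ok_mono and ok_convex)
```

Such a filter is useful before applying (6.73)-(6.74), since a violated restriction would translate into a negative implied density.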