It follows that
$$
\begin{aligned}
V_{n+h|n} ={}& (F_2 \otimes F_1)\, V_{n+h-1|n}\, (F_2 \otimes F_1)' \\
&+ \sigma^2 \left[ (F_2 \otimes F_1)\, V_{n+h-1|n}\, (G_2 \otimes G_1)' + (G_2 \otimes G_1)\, V_{n+h-1|n}\, (F_2 \otimes F_1)' \right] \\
&+ \sigma^2 (G_2 \otimes F_1 + F_2 \otimes G_1)\left[ V_{n+h-1|n} + \vec{M}_{h-1}\,(\vec{M}_{h-1})' \right](G_2 \otimes F_1 + F_2 \otimes G_1)' \\
&+ \sigma^4 (G_2 \otimes G_1)\left[ V_{n+h-1|n} + 2\,\vec{M}_{h-1}\,(\vec{M}_{h-1})' \right](G_2 \otimes G_1)'.
\end{aligned}
$$
The forecast mean and variance are given by
$$
\mu_{n+h|n} = \mathrm{E}(y_{n+h} \mid x_n, z_n) = w_1' M_{h-1} w_2
$$
and
$$
\begin{aligned}
v_{n+h|n} &= \mathrm{V}(y_{n+h} \mid x_n, z_n) = \mathrm{V}\!\left[\operatorname{vec}\!\left(w_1' Q_{h-1} w_2 + w_1' Q_{h-1} w_2\,\varepsilon_{n+h}\right)\right] \\
&= \mathrm{V}\!\left[(w_2' \otimes w_1')\,\vec{Q}_{h-1} + (w_2' \otimes w_1')\,\vec{Q}_{h-1}\,\varepsilon_{n+h}\right] \\
&= (w_2' \otimes w_1')\left[ V_{n+h-1|n}(1+\sigma^2) + \sigma^2\,\vec{M}_{h-1}\,(\vec{M}_{h-1})' \right](w_2 \otimes w_1) \\
&= (1+\sigma^2)(w_2' \otimes w_1')\, V_{n+h-1|n}\,(w_2 \otimes w_1) + \sigma^2 \mu^2_{n+h|n}.
\end{aligned}
$$
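These recursions can be iterated numerically. The sketch below (not from the text: the function name, the argument layout, and the assumption that the states at time $n$ are known exactly are illustrative choices) uses NumPy to run the $M_h$ and $V_{n+h|n}$ recursions above and return the forecast means and variances for horizons $1, \ldots, H$.

```python
import numpy as np

def class3_forecast_moments(F1, G1, F2, G2, w1, w2, xn, zn, sigma2, H):
    """Iterate the Class 3 recursions for M_h and V_{n+h|n} given above.

    Returns arrays mu, v with mu[h-1] = mu_{n+h|n} and v[h-1] = v_{n+h|n}.
    sigma2 is the innovation variance sigma^2; the other inputs are NumPy arrays.
    """
    M = np.outer(xn, zn)                 # M_0 = x_n z_n'
    V = np.zeros((M.size, M.size))       # V_{n|n} = 0: states assumed known at time n
    w21 = np.kron(w2, w1)                # w_2' (kron) w_1'
    FF, GG = np.kron(F2, F1), np.kron(G2, G1)
    GF = np.kron(G2, F1) + np.kron(F2, G1)
    mu, v = np.empty(H), np.empty(H)
    for h in range(1, H + 1):
        vecM = M.reshape(-1, order="F")  # vec(M_{h-1}), column-stacked
        mu[h - 1] = w1 @ M @ w2          # mu_{n+h|n} = w_1' M_{h-1} w_2
        v[h - 1] = (1 + sigma2) * (w21 @ V @ w21) + sigma2 * mu[h - 1] ** 2
        # move the moments forward one step: M_{h-1} -> M_h, V_{n+h-1|n} -> V_{n+h|n}
        mm = np.outer(vecM, vecM)
        V = (FF @ V @ FF.T
             + sigma2 * (FF @ V @ GG.T + GG @ V @ FF.T)
             + sigma2 * GF @ (V + mm) @ GF.T
             + sigma2 ** 2 * GG @ (V + 2 * mm) @ GG.T)
        M = F1 @ M @ F2.T + sigma2 * G1 @ M @ G2.T
    return mu, v
```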
When σ is sufficiently small (much less than 1), it is possible to obtain some simpler but approximate expressions. The second term in (6.31) can be dropped to give $M_{h-1} = F_1^{h-1} M_0 (F_2^{h-1})'$, and so
$$
\mu_{n+h|n} \approx w_1' F_1^{h-1} x_n\, (w_2' F_2^{h-1} z_n)'.
$$
The order of this approximation can be obtained by noting that the observation equation may be written as $y_t = u_{1,t}\, u_{2,t}\, u_{3,t}$, where $u_{1,t} = w_1' x_{t-1}$, $u_{2,t} = w_2' z_{t-1}$ and $u_{3,t} = 1 + \varepsilon_t$. Then
$$
\mathrm{E}(y_t) = \mathrm{E}(u_{1,t} u_{2,t} u_{3,t}) = \mathrm{E}(u_{1,t} u_{2,t})\, \mathrm{E}(u_{3,t}),
$$
because $u_{3,t}$ is independent of $u_{1,t}$ and $u_{2,t}$. Therefore, because $\mathrm{E}(u_{1,t} u_{2,t}) = \mathrm{E}(u_{1,t})\,\mathrm{E}(u_{2,t}) + \operatorname{Cov}(u_{1,t}, u_{2,t})$, where the covariance term is of order $\sigma^2$, we have the approximation:
$$
\mu_{n+h|n} = \mathrm{E}(y_{n+h} \mid x_n, z_n) = \mathrm{E}(u_{1,n+h} \mid x_n)\,\mathrm{E}(u_{2,n+h} \mid z_n)\,\mathrm{E}(u_{3,n+h}) + O(\sigma^2).
$$
When $u_{2,n+h}$ is constant the result is exact. Now let
$$
\begin{aligned}
\mu_{1,h} &= \mathrm{E}(u_{1,n+h+1} \mid x_n) = \mathrm{E}(w_1' x_{n+h} \mid x_n) = w_1' F_1^h x_n, \\
\mu_{2,h} &= \mathrm{E}(u_{2,n+h+1} \mid z_n) = \mathrm{E}(w_2' z_{n+h} \mid z_n) = w_2' F_2^h z_n, \\
v_{1,h} &= \mathrm{V}(u_{1,n+h+1} \mid x_n) = \mathrm{V}(w_1' x_{n+h} \mid x_n), \\
v_{2,h} &= \mathrm{V}(u_{2,n+h+1} \mid z_n) = \mathrm{V}(w_2' z_{n+h} \mid z_n),
\end{aligned}
$$
and
$$
\begin{aligned}
v_{12,h} &= \operatorname{Cov}(u^2_{1,n+h+1},\, u^2_{2,n+h+1} \mid x_n, z_n) \\
&= \operatorname{Cov}\!\left([w_1' x_{n+h}]^2,\, [w_2' z_{n+h}]^2 \mid x_n, z_n\right).
\end{aligned}
$$
Then
$$
\mu_{n+h|n} = \mu_{1,h-1}\,\mu_{2,h-1} + O(\sigma^2) = w_1' F_1^{h-1} x_n\; w_2' F_2^{h-1} z_n + O(\sigma^2).
$$
By the same arguments, we have
$$
\mathrm{E}(y_t^2) = \mathrm{E}(u_{1,t}^2 u_{2,t}^2 u_{3,t}^2) = \mathrm{E}(u_{1,t}^2 u_{2,t}^2)\,\mathrm{E}(u_{3,t}^2),
$$
and
$$
\begin{aligned}
\mathrm{E}(y^2_{n+h} \mid x_n, z_n) &= \mathrm{E}(u^2_{1,n+h}\, u^2_{2,n+h} \mid x_n, z_n)\,\mathrm{E}(u^2_{3,n+h}) \\
&= \left[\operatorname{Cov}(u^2_{1,n+h},\, u^2_{2,n+h} \mid x_n, z_n) + \mathrm{E}(u^2_{1,n+h} \mid x_n)\,\mathrm{E}(u^2_{2,n+h} \mid z_n)\right]\mathrm{E}(u^2_{3,n+h}) \\
&= (1+\sigma^2)\left[v_{12,h-1} + (v_{1,h-1} + \mu^2_{1,h-1})(v_{2,h-1} + \mu^2_{2,h-1})\right].
\end{aligned}
$$
Assuming that the covariance $v_{12,h-1}$ is small compared to the other terms, we obtain
$$
v_{n+h|n} \approx (1+\sigma^2)(v_{1,h-1} + \mu^2_{1,h-1})(v_{2,h-1} + \mu^2_{2,h-1}) - \mu^2_{1,h-1}\,\mu^2_{2,h-1}.
$$
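This approximation is easy to code directly for experimentation; the helper below is an illustrative sketch (its name and argument order are not from the text).

```python
def approx_class3_variance(mu1, v1, mu2, v2, sigma2):
    """Approximate v_{n+h|n}, dropping the covariance term v_{12,h-1}.

    mu1, v1 stand for mu_{1,h-1}, v_{1,h-1}; mu2, v2 for mu_{2,h-1}, v_{2,h-1}.
    """
    return (1 + sigma2) * (v1 + mu1 ** 2) * (v2 + mu2 ** 2) - mu1 ** 2 * mu2 ** 2
```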
We now simplify these results for the ETS(M,Ad,M) case, where $x_t = (\ell_t, b_t)'$ and $z_t = (s_t, \ldots, s_{t-m+1})'$, and the matrix coefficients are $w_1' = [1 \;\; \phi]$, $w_2' = [0, \ldots, 0, 1]$,
$$
F_1 = \begin{bmatrix} 1 & \phi \\ 0 & \phi \end{bmatrix}, \quad
F_2 = \begin{bmatrix} 0'_{m-1} & 1 \\ I_{m-1} & 0_{m-1} \end{bmatrix}, \quad
G_1 = \begin{bmatrix} \alpha & \alpha\phi \\ \beta & \beta\phi \end{bmatrix}, \quad\text{and}\quad
G_2 = \begin{bmatrix} 0'_{m-1} & \gamma \\ O_{m-1} & 0_{m-1} \end{bmatrix}.
$$
Many terms will be zero in the formulae for the expected value and the variance because of the following relationships: $G_2^2 = O_m$, $w_2' G_2 = 0'_m$, and $(w_2' \otimes w_1')(G_2 \otimes X) = 0'_{2m}$, where $X$ is any $2 \times 2$ matrix. For the terms that remain, $w_2' \otimes w_1'$ and its transpose will only use the terms from the last two rows and the last two columns of the large matrices, because $w_2' \otimes w_1' = [0'_{2m-2},\, 1,\, \phi]$.
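These identities are easy to confirm numerically. The sketch below builds the ETS(M,Ad,M) matrices listed above (using $G_1 = g_1 w_1'$ with $g_1 = (\alpha, \beta)'$; the function name and the example parameter values are illustrative, not from the text) and checks the three zero relationships.

```python
import numpy as np

def madm_matrices(alpha, beta, gamma, phi, m):
    """ETS(M,Ad,M) coefficient matrices in the Class 3 form used above."""
    w1 = np.array([1.0, phi])
    w2 = np.zeros(m)
    w2[-1] = 1.0
    F1 = np.array([[1.0, phi], [0.0, phi]])
    F2 = np.zeros((m, m))
    F2[0, -1] = 1.0
    F2[1:, :-1] = np.eye(m - 1)
    G1 = np.outer([alpha, beta], w1)     # rows [alpha, alpha*phi] and [beta, beta*phi]
    G2 = np.zeros((m, m))
    G2[0, -1] = gamma
    return w1, w2, F1, F2, G1, G2

# Check the relationships that make many terms vanish (m = 4 as an example).
w1, w2, F1, F2, G1, G2 = madm_matrices(0.2, 0.05, 0.1, 0.95, 4)
X = np.random.default_rng(0).normal(size=(2, 2))
assert np.allclose(G2 @ G2, 0)                              # G_2^2 = O_m
assert np.allclose(w2 @ G2, 0)                              # w_2' G_2 = 0'_m
assert np.allclose(np.kron(w2, w1) @ np.kron(G2, X), 0)     # (w_2' kron w_1')(G_2 kron X) = 0'_{2m}
```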
Using the small σ approximations and exploiting the structure of the ETS(M,Ad,M) model, we can obtain simpler expressions that approximate $\mu_{n+h|n}$ and $v_{n+h|n}$.
Note that $w_2' F_2^j G_2 = \gamma\, d_{j+1,m}\, w_2'$. So, for $h < m$, we have
$$
w_2' z_{n+h} \mid z_n = w_2' \prod_{j=1}^{h} (F_2 + G_2\, \varepsilon_{n+h-j+1})\, z_n = w_2' F_2^h z_n = s_{n-m+h+1}.
$$
Furthermore,
$$
\mu_{2,h} = s_{n-m+h_m^+}
\qquad\text{and}\qquad
v_{2,h} = \left[(1+\gamma^2\sigma^2)^{h_m} - 1\right] s^2_{n-m+h_m^+}.
$$
Also note that $x_n$ has the same properties as for ETS(M,Ad,N) in Class 2. Thus
$$
\mu_{1,h} = \ell_n + \phi_h b_n
\qquad\text{and}\qquad
v_{1,h} = (1+\sigma^2)\,\theta_h - \mu^2_{1,h}.
$$
Combining all of the terms, we arrive at the approximations
$$
\mu_{n+h|n} = \tilde{\mu}_{n+h|n}\, s_{n-m+h_m^+} + O(\sigma^2)
$$
and
$$
v_{n+h|n} \approx s^2_{n-m+h_m^+}\left[\theta_h (1+\sigma^2)(1+\gamma^2\sigma^2)^{h_m} - \tilde{\mu}^2_{n+h|n}\right],
$$
where $\tilde{\mu}_{n+h|n} = \ell_n + \phi_h b_n$, $\theta_1 = \tilde{\mu}^2_{n+1|n}$, and
$$
\theta_h = \tilde{\mu}^2_{n+h|n} + \sigma^2 \sum_{j=1}^{h-1} (\alpha + \beta\phi_j)^2\, \theta_{h-j}, \qquad h \ge 2.
$$
These expressions are exact for h ≤ m. The other cases of Class 3 can be derived as special cases of ETS(M,Ad,M).
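The approximations above translate directly into code. The sketch below is illustrative rather than definitive: the function name is invented, and the definitions $h_m^+ = ((h-1) \bmod m) + 1$ and $h_m = \lfloor (h-1)/m \rfloor$ are assumptions consistent with the notation used earlier in the chapter.

```python
import numpy as np

def madm_approx_moments(level, trend, seas, alpha, beta, gamma, phi, sigma2, H):
    """Approximate mu_{n+h|n} and v_{n+h|n} for ETS(M,Ad,M), h = 1..H.

    `seas` holds (s_n, s_{n-1}, ..., s_{n-m+1}); sigma2 is sigma^2.
    Implements the theta_h recursion above; the results are exact for h <= m.
    """
    m = len(seas)
    phi_j = np.cumsum(phi ** np.arange(1, H + 1))   # phi_j = phi + phi^2 + ... + phi^j
    mu_tilde = level + phi_j * trend                # mu~_{n+h|n} for h = 1..H
    theta = np.empty(H)
    mu, v = np.empty(H), np.empty(H)
    for h in range(1, H + 1):
        theta[h - 1] = mu_tilde[h - 1] ** 2 + sigma2 * sum(
            (alpha + beta * phi_j[j - 1]) ** 2 * theta[h - j - 1] for j in range(1, h))
        hm_plus = (h - 1) % m + 1                   # assumed definition of h_m^+
        hm = (h - 1) // m                           # assumed definition of h_m
        s = seas[m - hm_plus]                       # s_{n-m+h_m^+}
        mu[h - 1] = mu_tilde[h - 1] * s
        v[h - 1] = s ** 2 * (theta[h - 1] * (1 + sigma2) * (1 + gamma ** 2 * sigma2) ** hm
                             - mu_tilde[h - 1] ** 2)
    return mu, v
```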
Derivation of Cj Values
We first demonstrate that for Class 1 models, lead-time demand can be resolved into a linear function of the uncorrelated level and error components. Back-solve the transition equation (6.20) from period $n + j$ to period $n$, to give
$$
x_{n+j} = F^j x_n + \sum_{i=1}^{j} F^{j-i} g\, \varepsilon_{n+i}.
$$
Now from (6.19) and (6.20) we have
$$
\begin{aligned}
y_{n+j} &= w' x_{n+j-1} + \varepsilon_{n+j} \\
&= w' F x_{n+j-2} + w' g\, \varepsilon_{n+j-1} + \varepsilon_{n+j} \\
&\;\;\vdots \\
&= w' F^{j-1} x_n + \sum_{i=1}^{j-1} w' F^{j-i-1} g\, \varepsilon_{n+i} + \varepsilon_{n+j} \\
&= \mu_{n+j|n} + \sum_{i=1}^{j-1} c_{j-i}\, \varepsilon_{n+i} + \varepsilon_{n+j},
\end{aligned}
$$
where $c_k = w' F^{k-1} g$. Substituting this into (6.11) gives (6.15).
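Because $c_k = w' F^{k-1} g$ involves nothing more than matrix powers, it can be evaluated for any linear innovations model in a few lines; the helper below is an illustrative sketch (the naming is not from the text).

```python
import numpy as np

def c_values(w, F, g, J):
    """c_k = w' F^(k-1) g for k = 1..J."""
    c = np.empty(J)
    Fk = np.eye(len(w))            # F^0
    for k in range(1, J + 1):
        c[k - 1] = w @ Fk @ g
        Fk = Fk @ F
    return c

# Example: a local level model (w = [1], F = [[1]], g = [alpha]) gives c_k = alpha for every k.
print(c_values(np.array([1.0]), np.array([[1.0]]), np.array([0.3]), 5))
```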
To derive the value of $C_j$ for the ETS(A,Ad,A) model, we plug the value of $c_i$ from Table 6.2 into (6.13) to obtain
$$
\begin{aligned}
C_j &= 1 + \sum_{i=1}^{j} (\alpha + \beta\phi_i + \gamma d_{i,m}) \\
&= 1 + \alpha j + \beta \sum_{i=1}^{j} \phi_i + \gamma \sum_{i=1}^{j} d_{i,m} \\
&= 1 + \alpha j + \frac{\beta\phi\left[(j+1)(1-\phi) - (1-\phi^{j+1})\right]}{(1-\phi)^2} + \gamma j_m,
\end{aligned}
$$
where $j_m = \lfloor j/m \rfloor$ is the number of complete seasonal cycles that occur within $j$ time periods.
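For completeness, the geometric-series step behind the $\beta\phi$ term uses $\phi_i = \phi(1-\phi^i)/(1-\phi)$:
$$
\sum_{i=1}^{j} \phi_i
= \frac{\phi}{1-\phi}\sum_{i=1}^{j}\left(1-\phi^{i}\right)
= \frac{\phi}{1-\phi}\left[j - \frac{\phi(1-\phi^{j})}{1-\phi}\right]
= \frac{\phi\left[(j+1)(1-\phi) - (1-\phi^{j+1})\right]}{(1-\phi)^{2}}.
$$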
A similar derivation for the ETS(A,A,A) model leads to
$$
C_j = 1 + \sum_{i=1}^{j} (\alpha + i\beta + \gamma d_{i,m}) = 1 + j\alpha + \tfrac{1}{2}\beta\, j(j+1) + \gamma j_m.
$$
The expressions for $C_j$ for the other linear models are obtained as special cases of either ETS(A,Ad,A) or ETS(A,A,A) and are given in Table 6.6.
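As a numerical sanity check (an illustrative sketch; the function names are not from the text), the closed form for ETS(A,Ad,A) can be compared against the direct sum of the $c_i$ values; replacing $\beta\phi_i$ by $i\beta$ gives the ETS(A,A,A) case.

```python
import numpy as np

def Cj_damped(alpha, beta, gamma, phi, m, j):
    """Closed-form C_j for ETS(A,Ad,A), as derived above."""
    jm = j // m
    return (1 + alpha * j
            + beta * phi * ((j + 1) * (1 - phi) - (1 - phi ** (j + 1))) / (1 - phi) ** 2
            + gamma * jm)

def Cj_direct(alpha, beta, gamma, phi, m, j):
    """Direct evaluation of C_j = 1 + sum_{i=1}^{j} (alpha + beta*phi_i + gamma*d_{i,m})."""
    phi_i = np.cumsum(phi ** np.arange(1, j + 1))          # phi_i = phi + ... + phi^i
    d = (np.arange(1, j + 1) % m == 0).astype(float)       # d_{i,m} = 1 iff i is a multiple of m
    return 1 + np.sum(alpha + beta * phi_i + gamma * d)

assert abs(Cj_damped(0.3, 0.1, 0.2, 0.9, 4, 10) - Cj_direct(0.3, 0.1, 0.2, 0.9, 4, 10)) < 1e-8
```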
Selection of Models
One important step in the forecasting process is the selection of a model that could have generated the time series and would, therefore, be a reasonable choice for producing forecasts and prediction intervals. As we have seen in Chaps. 2–4, there are many specific models within the general innovations state space model (2.12). There are also many approaches that one might implement in a model selection process. In Sect. 7.1, we will describe the use of information criteria for selecting among the innovations state space models. These information criteria have been developed specifically for time series data and are based on maximized likelihoods. We will consider four commonly recommended information criteria and one relatively new information criterion. Then, in Sect. 7.2, we will use the MASE from Chap. 2 to develop measures for comparing model selection procedures. These measures will be used in Sects. 7.2.2 and 7.2.3 to compare the five information criteria with each other and with the commonly applied prediction validation method for model selection, using the M3 competition data (Makridakis and Hibon 2000) and a hospital data set. We also compare the results with the application of damped trend models for all time series. Finally, some implications of these comparisons will be given in Sect. 7.3.
7.1 Information Criteria for Model Selection
The goal in model selection is to pick the model with the best predictive ability on average. Finding the model with the smallest within-sample one-step-ahead forecast errors, or even the one with the maximum likelihood, does not assure us that the model will be the best one for forecasting.
One approach is to use an information criterion which penalizes the likelihood to compensate for the potential overfitting of data. The general form of the information criteria for an innovations state space model is
$$
\mathrm{IC} = -2 \log L(\hat{\theta}, \hat{x}_0 \mid y) + q\,\zeta(n), \qquad (7.1)
$$
Table 7.1. Penalties in the information criteria.

| Criterion | ζ(n)            | Penalty           | Source                  |
|-----------|-----------------|-------------------|-------------------------|
| AIC       | 2               | 2q                | Akaike (1974)           |
| BIC       | log(n)          | q log(n)          | Schwarz (1978)          |
| HQIC      | 2 log(log(n))   | 2q log(log(n))    | Hannan and Quinn (1979) |
| AICc      | 2n/(n − q − 1)  | 2qn/(n − q − 1)   | Sugiura (1978)          |
| LEIC      | empirical c     | qc                | Billah et al. (2003)    |
where $L(\hat{\theta}, \hat{x}_0 \mid y)$ is the maximized likelihood function, $q$ is the number of parameters in $\hat{\theta}$ plus the number of free states in $\hat{x}_0$, and $\zeta(n)$ is a function of the sample size. Thus, $q\,\zeta(n)$ is the penalty assigned to a model for the number of parameters and states in the model. (We also require that the state space model has no redundant states; see Sect. 10.1, p. 149.) The information criteria that will be introduced in this chapter are summarized in Table 7.1.
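A direct coding of (7.1) with the penalties of Table 7.1 might look like the sketch below (the function names are illustrative; the LEIC constant $c$ must be supplied from an empirical calibration, as in Billah et al. 2003).

```python
import math

def penalty(criterion, q, n, c=None):
    """q * zeta(n) for the criteria of Table 7.1; `c` is the empirical LEIC constant."""
    if criterion == "AIC":
        zeta = 2.0
    elif criterion == "BIC":
        zeta = math.log(n)
    elif criterion == "HQIC":
        zeta = 2.0 * math.log(math.log(n))
    elif criterion == "AICc":
        zeta = 2.0 * n / (n - q - 1)
    elif criterion == "LEIC":
        zeta = c
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return q * zeta

def information_criterion(neg2loglik, criterion, q, n, c=None):
    """IC = -2 log L(theta_hat, x0_hat | y) + q * zeta(n), as in (7.1)."""
    return neg2loglik + penalty(criterion, q, n, c)
```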
For the Gaussian likelihood, we can drop the additive constants in $-2\log L(\hat{\theta}, \hat{x}_0 \mid y)$ and replace the expression by $L^*(\theta, x_0)$ from (5.3) to obtain
$$
\mathrm{IC} = n \log\!\left(\sum_{t=1}^{n} \varepsilon_t^2\right) + 2 \sum_{t=1}^{n} \log\bigl|r(x_{t-1})\bigr| + q\,\zeta(n). \qquad (7.2)
$$
Recall from Chap. 5 that $\varepsilon_t = [y_t - w(x_{t-1})]/r(x_{t-1})$. Also, the likelihood function is based on a fixed seed state $x_0$. Not only is the fixed seed state critical for this form of the Gaussian likelihood in the nonlinear version, it is essential in both the linear and nonlinear cases for comparing models that differ by a nonstationary state (see Chap. 12 for a discussion of this problem).
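In code, (7.2) needs only the relative one-step errors $\varepsilon_t$ and the scale terms $r(x_{t-1})$; the sketch below is illustrative (its function and argument names are not from the text).

```python
import numpy as np

def gaussian_ic(eps, r, q, zeta_n):
    """IC as in (7.2): n*log(sum eps_t^2) + 2*sum log|r(x_{t-1})| + q*zeta(n).

    `eps` holds eps_t = [y_t - w(x_{t-1})]/r(x_{t-1}) and `r` the matching
    r(x_{t-1}) values, both of length n; zeta_n is the chosen zeta(n).
    """
    eps, r = np.asarray(eps), np.asarray(r)
    n = len(eps)
    return n * np.log(np.sum(eps ** 2)) + 2 * np.sum(np.log(np.abs(r))) + q * zeta_n
```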