It follows that
$$
\begin{aligned}
V_{n+h|n} ={}& (F_2 \otimes F_1)\, V_{n+h-1|n}\, (F_2 \otimes F_1)' \\
&+ \sigma^2 \left[ (F_2 \otimes F_1)\, V_{n+h-1|n}\, (G_2 \otimes G_1)' + (G_2 \otimes G_1)\, V_{n+h-1|n}\, (F_2 \otimes F_1)' \right] \\
&+ \sigma^2 (G_2 \otimes F_1 + F_2 \otimes G_1)\left[ V_{n+h-1|n} + \vec{M}_{h-1}\,(\vec{M}_{h-1})' \right](G_2 \otimes F_1 + F_2 \otimes G_1)' \\
&+ \sigma^4 (G_2 \otimes G_1)\left[ V_{n+h-1|n} + 2\,\vec{M}_{h-1}\,(\vec{M}_{h-1})' \right](G_2 \otimes G_1)'.
\end{aligned}
$$
The forecast mean and variance are given by
$$
\mu_{n+h|n} = \mathrm{E}(y_{n+h} \mid x_n, z_n) = w_1' M_{h-1} w_2
$$
and
$$
\begin{aligned}
v_{n+h|n} &= \mathrm{V}(y_{n+h} \mid x_n, z_n) = \mathrm{V}\!\left[\operatorname{vec}\!\left(w_1' Q_{h-1} w_2 + w_1' Q_{h-1} w_2\,\varepsilon_{n+h}\right)\right] \\
&= \mathrm{V}\!\left[(w_2' \otimes w_1')\,\vec{Q}_{h-1} + (w_2' \otimes w_1')\,\vec{Q}_{h-1}\,\varepsilon_{n+h}\right] \\
&= (w_2' \otimes w_1')\left[ V_{n+h-1|n}(1+\sigma^2) + \sigma^2\,\vec{M}_{h-1}\,(\vec{M}_{h-1})' \right](w_2 \otimes w_1) \\
&= (1+\sigma^2)(w_2' \otimes w_1')\, V_{n+h-1|n}\,(w_2 \otimes w_1) + \sigma^2 \mu^2_{n+h|n}.
\end{aligned}
$$
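These recursions can be iterated numerically. The sketch below (not from the text: the function name, the argument layout, and the assumption that the states at time $n$ are known exactly are illustrative choices) uses NumPy to run the $M_h$ and $V_{n+h|n}$ recursions above and return the forecast means and variances for horizons $1, \ldots, H$.

```python
import numpy as np

def class3_forecast_moments(F1, G1, F2, G2, w1, w2, xn, zn, sigma2, H):
    """Iterate the Class 3 recursions for M_h and V_{n+h|n} given above.

    Returns arrays mu, v with mu[h-1] = mu_{n+h|n} and v[h-1] = v_{n+h|n}.
    sigma2 is the innovation variance sigma^2; the other inputs are NumPy arrays.
    """
    M = np.outer(xn, zn)                 # M_0 = x_n z_n'
    V = np.zeros((M.size, M.size))       # V_{n|n} = 0: states assumed known at time n
    w21 = np.kron(w2, w1)                # w_2' (kron) w_1'
    FF, GG = np.kron(F2, F1), np.kron(G2, G1)
    GF = np.kron(G2, F1) + np.kron(F2, G1)
    mu, v = np.empty(H), np.empty(H)
    for h in range(1, H + 1):
        vecM = M.reshape(-1, order="F")  # vec(M_{h-1}), column-stacked
        mu[h - 1] = w1 @ M @ w2          # mu_{n+h|n} = w_1' M_{h-1} w_2
        v[h - 1] = (1 + sigma2) * (w21 @ V @ w21) + sigma2 * mu[h - 1] ** 2
        # move the moments forward one step: M_{h-1} -> M_h, V_{n+h-1|n} -> V_{n+h|n}
        mm = np.outer(vecM, vecM)
        V = (FF @ V @ FF.T
             + sigma2 * (FF @ V @ GG.T + GG @ V @ FF.T)
             + sigma2 * GF @ (V + mm) @ GF.T
             + sigma2 ** 2 * GG @ (V + 2 * mm) @ GG.T)
        M = F1 @ M @ F2.T + sigma2 * G1 @ M @ G2.T
    return mu, v
```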
When σ is sufficiently small (much less than 1), it is possible to obtain some simpler but approximate expressions. The second term in (6.31) can be dropped to give $M_{h-1} = F_1^{h-1} M_0 (F_2^{h-1})'$, and so
$$
\mu_{n+h|n} \approx w_1' F_1^{h-1} x_n\, (w_2' F_2^{h-1} z_n)'.
$$
The order of this approximation can be obtained by noting that the observation equation may be written as $y_t = u_{1,t}\, u_{2,t}\, u_{3,t}$, where $u_{1,t} = w_1' x_{t-1}$, $u_{2,t} = w_2' z_{t-1}$ and $u_{3,t} = 1 + \varepsilon_t$. Then
$$
\mathrm{E}(y_t) = \mathrm{E}(u_{1,t} u_{2,t} u_{3,t}) = \mathrm{E}(u_{1,t} u_{2,t})\, \mathrm{E}(u_{3,t}),
$$
because $u_{3,t}$ is independent of $u_{1,t}$ and $u_{2,t}$. Therefore, because $\mathrm{E}(u_{1,t} u_{2,t}) = \mathrm{E}(u_{1,t})\,\mathrm{E}(u_{2,t}) + \operatorname{Cov}(u_{1,t}, u_{2,t})$, where the covariance term is of order $\sigma^2$, we have the approximation:
$$
\mu_{n+h|n} = \mathrm{E}(y_{n+h} \mid x_n, z_n) = \mathrm{E}(u_{1,n+h} \mid x_n)\,\mathrm{E}(u_{2,n+h} \mid z_n)\,\mathrm{E}(u_{3,n+h}) + O(\sigma^2).
$$
When $u_{2,n+h}$ is constant the result is exact. Now let
$$
\begin{aligned}
\mu_{1,h} &= \mathrm{E}(u_{1,n+h+1} \mid x_n) = \mathrm{E}(w_1' x_{n+h} \mid x_n) = w_1' F_1^h x_n, \\
\mu_{2,h} &= \mathrm{E}(u_{2,n+h+1} \mid z_n) = \mathrm{E}(w_2' z_{n+h} \mid z_n) = w_2' F_2^h z_n, \\
v_{1,h} &= \mathrm{V}(u_{1,n+h+1} \mid x_n) = \mathrm{V}(w_1' x_{n+h} \mid x_n), \\
v_{2,h} &= \mathrm{V}(u_{2,n+h+1} \mid z_n) = \mathrm{V}(w_2' z_{n+h} \mid z_n),
\end{aligned}
$$
and
$$
\begin{aligned}
v_{12,h} &= \operatorname{Cov}(u^2_{1,n+h+1},\, u^2_{2,n+h+1} \mid x_n, z_n) \\
&= \operatorname{Cov}\!\left([w_1' x_{n+h}]^2,\, [w_2' z_{n+h}]^2 \mid x_n, z_n\right).
\end{aligned}
$$
Then
$$
\mu_{n+h|n} = \mu_{1,h-1}\,\mu_{2,h-1} + O(\sigma^2) = w_1' F_1^{h-1} x_n\; w_2' F_2^{h-1} z_n + O(\sigma^2).
$$
By the same arguments, we have
$$
\mathrm{E}(y_t^2) = \mathrm{E}(u_{1,t}^2 u_{2,t}^2 u_{3,t}^2) = \mathrm{E}(u_{1,t}^2 u_{2,t}^2)\,\mathrm{E}(u_{3,t}^2),
$$
and
$$
\begin{aligned}
\mathrm{E}(y^2_{n+h} \mid x_n, z_n) &= \mathrm{E}(u^2_{1,n+h}\, u^2_{2,n+h} \mid x_n, z_n)\,\mathrm{E}(u^2_{3,n+h}) \\
&= \left[\operatorname{Cov}(u^2_{1,n+h},\, u^2_{2,n+h} \mid x_n, z_n) + \mathrm{E}(u^2_{1,n+h} \mid x_n)\,\mathrm{E}(u^2_{2,n+h} \mid z_n)\right]\mathrm{E}(u^2_{3,n+h}) \\
&= (1+\sigma^2)\left[v_{12,h-1} + (v_{1,h-1} + \mu^2_{1,h-1})(v_{2,h-1} + \mu^2_{2,h-1})\right].
\end{aligned}
$$
Assuming that the covariance $v_{12,h-1}$ is small compared to the other terms, we obtain
$$
v_{n+h|n} \approx (1+\sigma^2)(v_{1,h-1} + \mu^2_{1,h-1})(v_{2,h-1} + \mu^2_{2,h-1}) - \mu^2_{1,h-1}\,\mu^2_{2,h-1}.
$$
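This approximation is easy to code directly for experimentation; the helper below is an illustrative sketch (its name and argument order are not from the text).

```python
def approx_class3_variance(mu1, v1, mu2, v2, sigma2):
    """Approximate v_{n+h|n}, dropping the covariance term v_{12,h-1}.

    mu1, v1 stand for mu_{1,h-1}, v_{1,h-1}; mu2, v2 for mu_{2,h-1}, v_{2,h-1}.
    """
    return (1 + sigma2) * (v1 + mu1 ** 2) * (v2 + mu2 ** 2) - mu1 ** 2 * mu2 ** 2
```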
We now simplify these results for the ETS(M,Ad,M) case, where $x_t = (\ell_t, b_t)'$ and $z_t = (s_t, \ldots, s_{t-m+1})'$, and the matrix coefficients are $w_1' = [1 \;\; \phi]$, $w_2' = [0, \ldots, 0, 1]$,
$$
F_1 = \begin{bmatrix} 1 & \phi \\ 0 & \phi \end{bmatrix}, \quad
F_2 = \begin{bmatrix} 0'_{m-1} & 1 \\ I_{m-1} & 0_{m-1} \end{bmatrix}, \quad
G_1 = \begin{bmatrix} \alpha & \alpha\phi \\ \beta & \beta\phi \end{bmatrix}, \quad\text{and}\quad
G_2 = \begin{bmatrix} 0'_{m-1} & \gamma \\ O_{m-1} & 0_{m-1} \end{bmatrix}.
$$
Many terms will be zero in the formulae for the expected value and the variance because of the following relationships: $G_2^2 = O_m$, $w_2' G_2 = 0'_m$, and $(w_2' \otimes w_1')(G_2 \otimes X) = 0'_{2m}$, where $X$ is any $2 \times 2$ matrix. For the terms that remain, $w_2' \otimes w_1'$ and its transpose will only use the terms from the last two rows and the last two columns of the large matrices, because $w_2' \otimes w_1' = [0'_{2m-2},\, 1,\, \phi]$.
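These identities are easy to confirm numerically. The sketch below builds the ETS(M,Ad,M) matrices listed above (using $G_1 = g_1 w_1'$ with $g_1 = (\alpha, \beta)'$; the function name and the example parameter values are illustrative, not from the text) and checks the three zero relationships.

```python
import numpy as np

def madm_matrices(alpha, beta, gamma, phi, m):
    """ETS(M,Ad,M) coefficient matrices in the Class 3 form used above."""
    w1 = np.array([1.0, phi])
    w2 = np.zeros(m)
    w2[-1] = 1.0
    F1 = np.array([[1.0, phi], [0.0, phi]])
    F2 = np.zeros((m, m))
    F2[0, -1] = 1.0
    F2[1:, :-1] = np.eye(m - 1)
    G1 = np.outer([alpha, beta], w1)     # rows [alpha, alpha*phi] and [beta, beta*phi]
    G2 = np.zeros((m, m))
    G2[0, -1] = gamma
    return w1, w2, F1, F2, G1, G2

# Check the relationships that make many terms vanish (m = 4 as an example).
w1, w2, F1, F2, G1, G2 = madm_matrices(0.2, 0.05, 0.1, 0.95, 4)
X = np.random.default_rng(0).normal(size=(2, 2))
assert np.allclose(G2 @ G2, 0)                              # G_2^2 = O_m
assert np.allclose(w2 @ G2, 0)                              # w_2' G_2 = 0'_m
assert np.allclose(np.kron(w2, w1) @ np.kron(G2, X), 0)     # (w_2' kron w_1')(G_2 kron X) = 0'_{2m}
```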
Using the small σ approximations and exploiting the structure of the ETS(M,Ad,M) model, we can obtain simpler expressions that approximate $\mu_{n+h|n}$ and $v_{n+h|n}$.
Note that $w_2' F_2^j G_2 = \gamma\, d_{j+1,m}\, w_2'$. So, for $h < m$, we have
$$
w_2' z_{n+h} \mid z_n = w_2' \prod_{j=1}^{h} (F_2 + G_2\, \varepsilon_{n+h-j+1})\, z_n = w_2' F_2^h z_n = s_{n-m+h+1}.
$$
Furthermore,
$$
\mu_{2,h} = s_{n-m+h_m^+}
\qquad\text{and}\qquad
v_{2,h} = \left[(1+\gamma^2\sigma^2)^{h_m} - 1\right] s^2_{n-m+h_m^+}.
$$
Also note that $x_n$ has the same properties as for ETS(M,Ad,N) in Class 2. Thus
$$
\mu_{1,h} = \ell_n + \phi_h b_n
\qquad\text{and}\qquad
v_{1,h} = (1+\sigma^2)\,\theta_h - \mu^2_{1,h}.
$$
Combining all of the terms, we arrive at the approximations
$$
\mu_{n+h|n} = \tilde{\mu}_{n+h|n}\, s_{n-m+h_m^+} + O(\sigma^2)
$$
and
$$
v_{n+h|n} \approx s^2_{n-m+h_m^+}\left[\theta_h (1+\sigma^2)(1+\gamma^2\sigma^2)^{h_m} - \tilde{\mu}^2_{n+h|n}\right],
$$
where $\tilde{\mu}_{n+h|n} = \ell_n + \phi_h b_n$, $\theta_1 = \tilde{\mu}^2_{n+1|n}$, and
$$
\theta_h = \tilde{\mu}^2_{n+h|n} + \sigma^2 \sum_{j=1}^{h-1} (\alpha + \beta\phi_j)^2\, \theta_{h-j}, \qquad h \ge 2.
$$
These expressions are exact for h ≤ m. The other cases of Class 3 can be derived as special cases of ETS(M,Ad,M).
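The approximations above translate directly into code. The sketch below is illustrative rather than definitive: the function name is invented, and the definitions $h_m^+ = ((h-1) \bmod m) + 1$ and $h_m = \lfloor (h-1)/m \rfloor$ are assumptions consistent with the notation used earlier in the chapter.

```python
import numpy as np

def madm_approx_moments(level, trend, seas, alpha, beta, gamma, phi, sigma2, H):
    """Approximate mu_{n+h|n} and v_{n+h|n} for ETS(M,Ad,M), h = 1..H.

    `seas` holds (s_n, s_{n-1}, ..., s_{n-m+1}); sigma2 is sigma^2.
    Implements the theta_h recursion above; the results are exact for h <= m.
    """
    m = len(seas)
    phi_j = np.cumsum(phi ** np.arange(1, H + 1))   # phi_j = phi + phi^2 + ... + phi^j
    mu_tilde = level + phi_j * trend                # mu~_{n+h|n} for h = 1..H
    theta = np.empty(H)
    mu, v = np.empty(H), np.empty(H)
    for h in range(1, H + 1):
        theta[h - 1] = mu_tilde[h - 1] ** 2 + sigma2 * sum(
            (alpha + beta * phi_j[j - 1]) ** 2 * theta[h - j - 1] for j in range(1, h))
        hm_plus = (h - 1) % m + 1                   # assumed definition of h_m^+
        hm = (h - 1) // m                           # assumed definition of h_m
        s = seas[m - hm_plus]                       # s_{n-m+h_m^+}
        mu[h - 1] = mu_tilde[h - 1] * s
        v[h - 1] = s ** 2 * (theta[h - 1] * (1 + sigma2) * (1 + gamma ** 2 * sigma2) ** hm
                             - mu_tilde[h - 1] ** 2)
    return mu, v
```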
Derivation of Cj Values
We first demonstrate that for Class 1 models, lead-time demand can be resolved into a linear function of the uncorrelated level and error components. Back-solve the transition equation (6.20) from period $n + j$ to period $n$, to give
$$
x_{n+j} = F^j x_n + \sum_{i=1}^{j} F^{j-i} g\, \varepsilon_{n+i}.
$$
Now from (6.19) and (6.20) we have
$$
\begin{aligned}
y_{n+j} &= w' x_{n+j-1} + \varepsilon_{n+j} \\
&= w' F x_{n+j-2} + w' g\, \varepsilon_{n+j-1} + \varepsilon_{n+j} \\
&\;\;\vdots \\
&= w' F^{j-1} x_n + \sum_{i=1}^{j-1} w' F^{j-i-1} g\, \varepsilon_{n+i} + \varepsilon_{n+j} \\
&= \mu_{n+j|n} + \sum_{i=1}^{j-1} c_{j-i}\, \varepsilon_{n+i} + \varepsilon_{n+j},
\end{aligned}
$$
where $c_k = w' F^{k-1} g$. Substituting this into (6.11) gives (6.15).
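Because $c_k = w' F^{k-1} g$ involves nothing more than matrix powers, it can be evaluated for any linear innovations model in a few lines; the helper below is an illustrative sketch (the naming is not from the text).

```python
import numpy as np

def c_values(w, F, g, J):
    """c_k = w' F^(k-1) g for k = 1..J."""
    c = np.empty(J)
    Fk = np.eye(len(w))            # F^0
    for k in range(1, J + 1):
        c[k - 1] = w @ Fk @ g
        Fk = Fk @ F
    return c

# Example: a local level model (w = [1], F = [[1]], g = [alpha]) gives c_k = alpha for every k.
print(c_values(np.array([1.0]), np.array([[1.0]]), np.array([0.3]), 5))
```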
To derive the value of $C_j$ for the ETS(A,Ad,A) model, we plug the value of $c_i$ from Table 6.2 into (6.13) to obtain
$$
\begin{aligned}
C_j &= 1 + \sum_{i=1}^{j} (\alpha + \beta\phi_i + \gamma d_{i,m}) \\
&= 1 + \alpha j + \beta \sum_{i=1}^{j} \phi_i + \gamma \sum_{i=1}^{j} d_{i,m} \\
&= 1 + \alpha j + \frac{\beta\phi\left[(j+1)(1-\phi) - (1-\phi^{j+1})\right]}{(1-\phi)^2} + \gamma j_m,
\end{aligned}
$$
where $j_m = \lfloor j/m \rfloor$ is the number of complete seasonal cycles that occur within $j$ time periods.
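For completeness, the geometric-series step behind the $\beta\phi$ term uses $\phi_i = \phi(1-\phi^i)/(1-\phi)$:
$$
\sum_{i=1}^{j} \phi_i
= \frac{\phi}{1-\phi}\sum_{i=1}^{j}\left(1-\phi^{i}\right)
= \frac{\phi}{1-\phi}\left[j - \frac{\phi(1-\phi^{j})}{1-\phi}\right]
= \frac{\phi\left[(j+1)(1-\phi) - (1-\phi^{j+1})\right]}{(1-\phi)^{2}}.
$$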
A similar derivation for the ETS(A,A,A) model leads to
$$
C_j = 1 + \sum_{i=1}^{j} (\alpha + i\beta + \gamma d_{i,m}) = 1 + j\alpha + \tfrac{1}{2}\beta\, j(j+1) + \gamma j_m.
$$
The expressions for $C_j$ for the other linear models are obtained as special cases of either ETS(A,Ad,A) or ETS(A,A,A) and are given in Table 6.6.
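As a numerical sanity check (an illustrative sketch; the function names are not from the text), the closed form for ETS(A,Ad,A) can be compared against the direct sum of the $c_i$ values; replacing $\beta\phi_i$ by $i\beta$ gives the ETS(A,A,A) case.

```python
import numpy as np

def Cj_damped(alpha, beta, gamma, phi, m, j):
    """Closed-form C_j for ETS(A,Ad,A), as derived above."""
    jm = j // m
    return (1 + alpha * j
            + beta * phi * ((j + 1) * (1 - phi) - (1 - phi ** (j + 1))) / (1 - phi) ** 2
            + gamma * jm)

def Cj_direct(alpha, beta, gamma, phi, m, j):
    """Direct evaluation of C_j = 1 + sum_{i=1}^{j} (alpha + beta*phi_i + gamma*d_{i,m})."""
    phi_i = np.cumsum(phi ** np.arange(1, j + 1))          # phi_i = phi + ... + phi^i
    d = (np.arange(1, j + 1) % m == 0).astype(float)       # d_{i,m} = 1 iff i is a multiple of m
    return 1 + np.sum(alpha + beta * phi_i + gamma * d)

assert abs(Cj_damped(0.3, 0.1, 0.2, 0.9, 4, 10) - Cj_direct(0.3, 0.1, 0.2, 0.9, 4, 10)) < 1e-8
```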
Selection of Models
One important step in the forecasting process is the selection of a model that could have generated the time series and would, therefore, be a reasonable choice for producing forecasts and prediction intervals. As we have seen in Chaps. 2–4, there are many specific models within the general innovations state space model (2.12). There are also many approaches that one might implement in a model selection process. In Sect. 7.1, we will describe the use of information criteria for selecting among the innovations state space models. These information criteria have been developed specifically for time series data and are based on maximized likelihoods. We will consider four commonly recommended information criteria and one relatively new information criterion. Then, in Sect. 7.2, we will use the MASE from Chap. 2 to develop measures for comparing model selection procedures. These measures will be used in Sects. 7.2.2 and 7.2.3 to compare the five information criteria with each other and with the commonly applied prediction validation method for model selection, using the M3 competition data (Makridakis and Hibon 2000) and a hospital data set. We also compare the results with the application of damped trend models for all time series. Finally, some implications of these comparisons will be given in Sect. 7.3.
7.1 Information Criteria for Model Selection
The goal in model selection is to pick the model with the best predictive ability on average. Finding the model with the smallest within-sample one-step-ahead forecast errors, or even the one with the maximum likelihood, does not assure us that the model will be the best one for forecasting.
One approach is to use an information criterion which penalizes the likelihood to compensate for the potential overfitting of data. The general form of the information criteria for an innovations state space model is
$$
\mathrm{IC} = -2 \log L(\hat{\theta}, \hat{x}_0 \mid y) + q\,\zeta(n), \qquad (7.1)
$$
Table 7.1. Penalties in the information criteria.

| Criterion | ζ(n)            | Penalty           | Source                  |
|-----------|-----------------|-------------------|-------------------------|
| AIC       | 2               | 2q                | Akaike (1974)           |
| BIC       | log(n)          | q log(n)          | Schwarz (1978)          |
| HQIC      | 2 log(log(n))   | 2q log(log(n))    | Hannan and Quinn (1979) |
| AICc      | 2n/(n − q − 1)  | 2qn/(n − q − 1)   | Sugiura (1978)          |
| LEIC      | empirical c     | qc                | Billah et al. (2003)    |
where $L(\hat{\theta}, \hat{x}_0 \mid y)$ is the maximized likelihood function, $q$ is the number of parameters in $\hat{\theta}$ plus the number of free states in $\hat{x}_0$, and $\zeta(n)$ is a function of the sample size. Thus, $q\,\zeta(n)$ is the penalty assigned to a model for the number of parameters and states in the model. (We also require that the state space model has no redundant states; see Sect. 10.1, p. 149.) The information criteria that will be introduced in this chapter are summarized in Table 7.1.
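A direct coding of (7.1) with the penalties of Table 7.1 might look like the sketch below (the function names are illustrative; the LEIC constant $c$ must be supplied from an empirical calibration, as in Billah et al. 2003).

```python
import math

def penalty(criterion, q, n, c=None):
    """q * zeta(n) for the criteria of Table 7.1; `c` is the empirical LEIC constant."""
    if criterion == "AIC":
        zeta = 2.0
    elif criterion == "BIC":
        zeta = math.log(n)
    elif criterion == "HQIC":
        zeta = 2.0 * math.log(math.log(n))
    elif criterion == "AICc":
        zeta = 2.0 * n / (n - q - 1)
    elif criterion == "LEIC":
        zeta = c
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return q * zeta

def information_criterion(neg2loglik, criterion, q, n, c=None):
    """IC = -2 log L(theta_hat, x0_hat | y) + q * zeta(n), as in (7.1)."""
    return neg2loglik + penalty(criterion, q, n, c)
```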
For the Gaussian likelihood, we can drop the additive constants in $-2\log L(\hat{\theta}, \hat{x}_0 \mid y)$ and replace the expression by $L^*(\theta, x_0)$ from (5.3) to obtain
$$
\mathrm{IC} = n \log\!\left(\sum_{t=1}^{n} \varepsilon_t^2\right) + 2 \sum_{t=1}^{n} \log\bigl|r(x_{t-1})\bigr| + q\,\zeta(n). \qquad (7.2)
$$
Recall from Chap. 5 that $\varepsilon_t = [y_t - w(x_{t-1})]/r(x_{t-1})$. Also, the likelihood function is based on a fixed seed state $x_0$. Not only is the fixed seed state critical for this form of the Gaussian likelihood in the nonlinear version, it is essential in both the linear and nonlinear cases for comparing models that differ by a nonstationary state (see Chap. 12 for a discussion of this problem).
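In code, (7.2) needs only the relative one-step errors $\varepsilon_t$ and the scale terms $r(x_{t-1})$; the sketch below is illustrative (its function and argument names are not from the text).

```python
import numpy as np

def gaussian_ic(eps, r, q, zeta_n):
    """IC as in (7.2): n*log(sum eps_t^2) + 2*sum log|r(x_{t-1})| + q*zeta(n).

    `eps` holds eps_t = [y_t - w(x_{t-1})]/r(x_{t-1}) and `r` the matching
    r(x_{t-1}) values, both of length n; zeta_n is the chosen zeta(n).
    """
    eps, r = np.asarray(eps), np.asarray(r)
    n = len(eps)
    return n * np.log(np.sum(eps ** 2)) + 2 * np.sum(np.log(np.abs(r))) + q * zeta_n
```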