Appendix A: Additional analysis of the New Zealand SADEM Energy Model

Econometric Estimation of End-use Sectors

SADEM currently uses econometric estimation of the residential and commercial-other industry sectors, as well as for petrol and for diesel demand. An analysis of the econometric parameterization and results in the SADEM model should focus on:

  1. Understanding the current specification;
  2. Interpreting the results of the current model; and
  3. Thinking more broadly about whether this is the correct specification, and if not what improvements could be made.

Understanding the current specification

A multiple-regression Ordinary Least Squares (OLS) approach is used. For the usual inference to hold, this requires that the error terms (estimated by the residuals) are independent and normally distributed, with a mean of zero and a constant variance. It is often worth using scatter plots to check that the relationships are indeed linear, although in this case inspection suggests well-defined trends.

As in all econometrics, a detailed understanding of the residual pattern is required, in particular to test for autocorrelation, i.e., serial correlation between error terms. This could entail residual plots to check that the error terms have a constant spread, and normal probability plots to test whether the error terms are normally distributed. Alternatively, there is a range of econometric tests, including the Durbin-Watson statistic (given here) and the LM statistic. Particular care should be taken when dealing with time series data, as in this case. Additional analysis could be made using contingency tables.
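As a minimal sketch of the Durbin-Watson statistic mentioned above, the following computes it directly from a list of residuals. The residual values are illustrative only and are not SADEM output. Values near 2 suggest no first-order serial correlation, while values towards 0 or 4 suggest positive or negative serial correlation respectively.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Illustrative residuals only (not taken from the SADEM regressions).
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every period
persistent = [1.0, 1.0, -1.0, -1.0]    # errors tend to repeat their sign

print(durbin_watson(alternating))  # above 2: negative serial correlation
print(durbin_watson(persistent))   # below 2: positive serial correlation
```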

The number of data points varies from series to series, with 1961-2004 data for the residential and commercial-other industry sectors (i.e., n = 43), 1969-2004 data for petrol use (n = 35) and 1975-2004 data for diesel use (n = 29). These sample sizes are not huge, but generally n > 30 (or rather n-1-k, the degrees of freedom) is a benchmark for treating the estimates as approximately normally distributed (under the central limit theorem).
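The quoted sample sizes are each one less than the number of calendar years in the stated range, which would be consistent with annual data losing its first observation to the lagged demand term. That reading is an inference, not something the source states. Under that assumption, a short sketch of the observation counts and residual degrees of freedom (n-1-k, with k taken as the number of slope coefficients in Annex 1):

```python
# (first year, last year, number of slope coefficients k), from the text and Annex 1.
series = {
    "residential": (1961, 2004, 4),
    "commercial-other industry": (1961, 2004, 4),
    "petrol": (1969, 2004, 3),
    "diesel": (1975, 2004, 4),
}

def usable_obs(first, last, lags=1):
    # Assumption: annual data, with `lags` observations lost to the lagged term.
    return (last - first + 1) - lags

for name, (first, last, k) in series.items():
    n = usable_obs(first, last)
    print(name, "n =", n, "residual df =", n - 1 - k)
```

On this reading the diesel regression has the fewest residual degrees of freedom, so its estimates lean hardest on the normality assumption.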

The natural logs of the independent and dependent variables are used. This is fine, as none of the variables ever takes a zero or negative value. It has the added value of ensuring that the coefficients of the regression are elasticities (i.e., the % change in Y from a 1% change in X). Note that dummy variables are not logged, and hence care should be taken in explaining their magnitude and impact.
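To illustrate the elasticity reading of a log-log coefficient, the sketch below takes the petrol price coefficient from Annex 1 (-0.051) and computes the same-period change in demand implied by a price change, holding everything else fixed. The function name is introduced here for illustration only.

```python
import math

def pct_change_in_demand(elasticity, pct_change_in_driver):
    """Exact proportional change in Y implied by a log-log model
    when X changes by the given proportion (0.10 = +10%)."""
    return math.exp(elasticity * math.log(1.0 + pct_change_in_driver)) - 1.0

petrol_price_elasticity = -0.051  # beta_2 for petrol in Annex 1

small = pct_change_in_demand(petrol_price_elasticity, 0.01)
large = pct_change_in_demand(petrol_price_elasticity, 0.10)
print(f"1% price rise  -> {100 * small:.3f}% change in demand")
print(f"10% price rise -> {100 * large:.3f}% change in demand")
```

For small changes the result is very close to the coefficient multiplied by the percentage change, which is why the coefficient can be read directly as an elasticity.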

Heteroscedasticity occurs when the residuals do not have a common variance. It is generally addressed by dividing each observation through by its expected error standard deviation and re-running the regression (weighted least squares); it is less likely to be a problem in this case. What may well be a problem is multicollinearity, which arises when two or more of the independent variables are related. This could be the case for lagged energy demand and GDP, and is discussed in the next section.
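One common way to gauge multicollinearity is the variance inflation factor (VIF). The sketch below uses the two-regressor special case, where VIF = 1 / (1 - r^2) and r is the correlation between the two regressors. The series are synthetic stand-ins for log GDP and log lagged demand, not SADEM data; they are built only to show how two trending series produce a large VIF.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """With exactly two regressors, VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Synthetic series: both trend upwards, with a small deterministic wobble.
years = range(30)
log_gdp = [4.0 + 0.03 * t + 0.02 * math.sin(t) for t in years]
log_lagged_demand = [3.0 + 0.02 * t + 0.02 * math.cos(2 * t) for t in years]

print(round(vif_two_regressors(log_gdp, log_lagged_demand), 1))
```

A VIF above roughly 10 is often quoted as a rule-of-thumb warning sign; with more than two regressors each VIF comes from regressing one regressor on all the others.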

Interpreting the results of the current model

It should be emphasized that in forecasting through to only 2010, any reasonable model is unlikely to be very wrong. This is especially true for these relatively straightforward trends (the exception being a step change in diesel demand). Any differences and misspecifications will of course be magnified as we go out further in time. Annex 1 details the revised specifications of the econometric estimations.

Generally, in looking at the four series, it is striking that the Adjusted R2 values are so high (we are focussing on Adjusted R2 to correct for the inevitably better fit from adding parameters). Adjusted R2 values of 0.977, 0.994, 0.993 and 0.994 suggest that these estimations are being driven by the specification of lagged demand (period t-1). This is supported by inspection of the well-behaved time trends of these four energy demands. One consequence of this is that a similar result could probably be found by dropping all the other parameters and simply using a time trend on past demand (as in the aviation demand estimation). A superior approach would be to drop the use of lagged demand as a variable and analyse the true drivers of energy demand.
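As a check on how the adjustment works, the sketch below applies the standard formula, Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), to the R2 values, sample sizes and coefficient counts given in the text and Annex 1. It assumes k counts slope coefficients only. The results agree with the reported figures to within the rounding of the three-decimal R2 inputs.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# name: (R2, n, k, reported adjusted R2), all from the text and Annex 1.
reported = {
    "petrol": (0.979, 35, 3, 0.977),
    "diesel": (0.995, 29, 4, 0.994),
    "residential": (0.994, 43, 4, 0.993),
    "other industrial and commercial": (0.994, 43, 4, 0.994),
}

for name, (r2, n, k, adj_reported) in reported.items():
    print(name, round(adjusted_r2(r2, n, k), 4), "reported:", adj_reported)
```

With R2 this close to 1 the penalty for extra parameters is tiny, so Adjusted R2 on its own says little about which regressors are doing the work.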

The importance of the prior demand variable is further supported by the very high coefficients for this parameter, as well as its exceptionally low p-value, indicating its significance. In addition, the low AIC values support the parsimonious specification of the regression; indeed, the problem is not too many explanatory variables but the overriding influence of just one of them. In addition to dominating the regression, lagged demand can also be expected to exhibit multicollinearity with the other independent variables, especially GDP.
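The AIC figures in Annex 1 appear to be consistent with the convention AIC = p - log L, where p is the number of estimated coefficients including the intercept. This is half of the more common 2p - 2 log L. The convention is inferred here from the reported numbers, not stated in the source, so the sketch below should be read as a consistency check only.

```python
# name: (log likelihood, coefficients incl. intercept, reported AIC), from Annex 1.
annex = {
    "petrol": (86.6502, 4, -82.65),
    "diesel": (60.4324, 5, -55.43),
    "residential": (110.402, 5, -105.40),
    "other industrial and commercial": (95.6341, 5, -90.63),
}

def aic_half_scale(log_lik, p):
    # Assumed convention: AIC = p - logL (half of the usual 2p - 2 logL).
    return p - log_lik

for name, (ll, p, aic_reported) in annex.items():
    print(name, round(aic_half_scale(ll, p), 2), "reported:", aic_reported)
```

Whichever scaling is used, AIC values are only comparable between alternative specifications fitted to the same dependent variable and sample, not across the four series.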

Finally, dropping the lagged demand variable would eliminate some potential concerns over the robustness of the OLS model in terms of its bias, and correlation between error terms. Although the Durbin Watson statistic is around 2, which generally supports the case for independent error terms (although perhaps with less certainty in the residential case), removal of the lagged variable would give greater credence to the use of OLS in this case.
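One reason for caution is that the Durbin-Watson statistic tends to be biased towards 2 when a lagged dependent variable is among the regressors. Durbin's h statistic is a common alternative, and is approximately standard normal under the null of no serial correlation. The sketch below shows the calculation. The Durbin-Watson value and n are the petrol figures from Annex 1, but the standard error of the lagged-demand coefficient is hypothetical, because the source reports p-values only.

```python
import math

def durbin_h(dw, n, se_lag_coef):
    """Durbin's h from the Durbin-Watson statistic, the sample size and the
    standard error of the lagged-dependent-variable coefficient."""
    denom = 1.0 - n * se_lag_coef ** 2
    if denom <= 0:
        raise ValueError("h is undefined when n * var(beta_1) >= 1")
    rho_hat = 1.0 - dw / 2.0  # approximate first-order autocorrelation
    return rho_hat * math.sqrt(n / denom)

# dw and n are from Annex 1 (petrol); se_lag_coef is HYPOTHETICAL.
h = durbin_h(dw=2.183, n=35, se_lag_coef=0.08)
print(round(h, 2))
```

An absolute value of h above about 1.96 would point to serial correlation at the 5% level; the actual standard errors from the regression output would be needed to apply this to SADEM.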

Turning to an examination of the coefficients and p-values in the 4 series, it is worth noting the changes in these values between EO 03, NP 05 and those in the SADEM documentation - is this purely down to updated data? All the signs and magnitudes appear to be reasonable, although with lagged demand having by far the largest influence.
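Because lagged demand appears on the right-hand side, the price and GDP coefficients are short-run (same-year) elasticities. In a stable model of this form the implied long-run elasticity is the coefficient divided by (1 - β1). The sketch below applies that standard result to the Annex 1 estimates; it is an interpretation aid and not a calculation taken from the SADEM documentation.

```python
# (beta_1 lagged demand, beta_2 price, beta_3 GDP) from Annex 1.
coefs = {
    "petrol": (0.742, -0.051, 0.149),
    "diesel": (0.725, -0.054, 0.349),
    "residential": (0.534, -0.086, 0.365),
    "other industrial and commercial": (0.838, -0.067, 0.123),
}

def long_run(beta, beta_lag):
    """Long-run elasticity implied by a stable (|beta_lag| < 1) model."""
    return beta / (1.0 - beta_lag)

for name, (b_lag, b_price, b_gdp) in coefs.items():
    print(f"{name}: price {long_run(b_price, b_lag):.3f}, "
          f"GDP {long_run(b_gdp, b_lag):.3f}")
```

The closer β1 is to 1, the larger the gap between short-run and long-run responses, which is another way of seeing how strongly lagged demand shapes these results.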

For the residential and commercial-industrial series, it is recommended that the MED consider dropping the degree day variable. It is not significant at any reasonable level, and would be expected to add little insight in terms of the expected temperature changes over the next 20 years. In addition, the inclusion of degree days is problematic in terms of equal weighting being given to cooling and heating degree days (presumably the former would necessitate electricity-led air conditioning while the latter would require other fuels for space heating). Staying with the commercial-industrial series, GDP is not a significant parameter in this model, which is surprising. It is likely that a crude specification of GDP does not capture the structural change in economic patterns, and hence in energy use, and so fails to pick up a key driver.

For the diesel series, the use of the dummy variable gives some cause for concern. It is clearly significant, but imparts no information as to what is driving this post-1993 change. Dummy variables are of considerable use, as long as they clearly define two states of the world and can provide insight into why this is different. Hence this variable could be better defined or the model might be re-specified to shed actual insights into the underlying causes of trend changes.
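Since the dummy is not logged while demand is, its coefficient is not a percentage directly. A switch from 0 to 1 multiplies demand by exp(β4). The sketch below works this through for the diesel estimates in Annex 1, and also shows the shift implied once the lagged-demand term has fully worked through, using the standard long-run scaling 1/(1 - β1).

```python
import math

beta_dummy = 0.120   # beta_4 for diesel in Annex 1
beta_lag = 0.725     # beta_1 for diesel in Annex 1

# Same-period shift in the level of demand when the dummy switches from 0 to 1.
short_run_shift = math.exp(beta_dummy) - 1.0

# Shift after lagged demand has fully adjusted (assumes |beta_lag| < 1).
long_run_shift = math.exp(beta_dummy / (1.0 - beta_lag)) - 1.0

print(f"short-run: {100 * short_run_shift:.1f}%")
print(f"long-run:  {100 * long_run_shift:.1f}%")
```

The size of the implied long-run shift underlines why it matters to identify what actually changed after 1993.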

Finally, it is not clear whether this model can be used to robustly forecast beyond 2010. As the model is currently swamped by the inclusion of lagged demand it does not generate real insights into the actual drivers of this demand, and hence arguably requires respecification. Examples of difficulties include understanding the role of increased efficiency and potential saturation effects in energy use.

Is this the correct specification? What improvements could be made?

The previous discussion strongly suggested that these 4 series are not well specified, particularly in generating clear insights on underlying drivers of energy demand. Certainly removing all parameters bar lagged demand and rerunning the regression could illustrate the importance of this variable. Subsequently dropping the lagged variable for a respecification of the key drivers would be of great interest. If lagged demand is to be kept in, then testing the potential bias of OLS estimation using an error correction model (regression of first differences of the independent and dependent variables) would be of interest.
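As a minimal sketch of the regression in first differences described in the parenthesis above, the code below differences two log series and fits a bivariate OLS line. The data are synthetic, constructed with a known elasticity of 0.5 and 1% autonomous growth per year, so that the recovered slope and intercept can be checked. A full error correction model would also include a lagged levels (error correction) term, which is omitted here.

```python
def ols_slope_intercept(x, y):
    """Bivariate OLS of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def first_diff(series):
    """Year-on-year change; for log series this approximates growth rates."""
    return [b - a for a, b in zip(series, series[1:])]

# Synthetic data (not SADEM data): uneven GDP growth, demand elasticity 0.5.
growth = [0.02, 0.04, 0.01, 0.03, 0.05, 0.00, 0.02, 0.04, 0.03, 0.01]
log_gdp = [4.0]
for g in growth:
    log_gdp.append(log_gdp[-1] + g)
log_q = [1.0 + 0.5 * lg + 0.01 * t for t, lg in enumerate(log_gdp)]

slope, intercept = ols_slope_intercept(first_diff(log_gdp), first_diff(log_q))
print(round(slope, 3), round(intercept, 3))
```

Differencing removes the shared trend, so the slope reflects how changes in demand respond to changes in GDP instead of two series simply rising together.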

In any respecification of the independent variables, some potential drivers spring to mind for the various series. Personal transport (petrol) drivers include average fuel efficiencies, vehicle utilization, current fleet structure and penetration rates for new vehicles in the market. We have noted the intention of the MED modelling team to liaise with the Ministry of Transport concerning the tier II bottom-up vehicle fleet model and again strongly encourage this. Residential energy use drivers include average home sizes, use of insulation, and average comfort temperatures. Note that it would be useful to specify residential energy use by fuel, for instance electricity vs. natural gas, to model changes in appliance use, air conditioning, and water and space heating. For the commercial-other industrial estimation, a disaggregation not by fuel but by sector would be superior. This would only need to be done for 6 or 10 sectors (depending on data availability). Finally, for freight transport (diesel), a clear understanding of step changes via specific dummy variables would be of great interest for actual insights. As an alternative to one or more dummy variables, piecewise linear regression for diesel demand could be employed.
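The piecewise linear alternative for diesel can be sketched as an OLS fit with a hinge term, max(t - breakpoint, 0), which lets the slope change at a chosen year while keeping the fitted line continuous. The series below is synthetic, with a slope change placed at 1993 to mirror the dummy definition in Annex 1; the function name and data are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def fit_piecewise(t, y, breakpoint):
    """OLS fit of y = a + b*t + c*max(t - breakpoint, 0):
    a continuous line whose slope changes by c at the breakpoint."""
    hinge = np.maximum(t - breakpoint, 0.0)
    X = np.column_stack([np.ones_like(t), t, hinge])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic log-demand series with a slope change at 1993 (not SADEM data).
years = np.arange(1975, 2005, dtype=float)
log_q = 3.0 + 0.01 * (years - 1975) + 0.03 * np.maximum(years - 1993, 0.0)

a, b, c = fit_piecewise(years - 1975, log_q, breakpoint=1993 - 1975)
print(round(a, 3), round(b, 3), round(c, 3))
```

Unlike a 0/1 dummy, which shifts the level, the hinge term changes the growth rate, so the two approaches embody different views of what happened at the break.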

Finally, in any respecification of the independent variables, consideration should be given to the use of squared terms, to capture the impact of different base levels of a parameter, or interaction terms, to capture the impact of the levels of other variables on the parameter in question.
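To show what squared and interaction terms buy, the sketch below evaluates the price elasticity in a log-log model extended with a squared log-price term and a price-GDP interaction. The elasticity then depends on the price level and on GDP instead of being a single constant. All coefficients are hypothetical and chosen only to show the mechanics.

```python
import math

def price_elasticity(log_p, log_gdp, b_p, b_pp=0.0, b_pg=0.0):
    """d log(Q) / d log(P) for
    log Q = ... + b_p*log P + b_pp*(log P)^2 + b_pg*log P*log GDP."""
    return b_p + 2.0 * b_pp * log_p + b_pg * log_gdp

# All coefficients below are HYPOTHETICAL, chosen only to show the mechanics.
b_p, b_pp, b_pg = -0.05, -0.01, 0.002

for price in (1.0, 2.0):
    e = price_elasticity(math.log(price), math.log(100.0), b_p, b_pp, b_pg)
    print(f"price {price}: elasticity {e:.4f}")
```

With both extra coefficients set to zero the function returns the constant elasticity of the current specification.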

Annex 1: Econometric results

Petrol: log(Q_t) = α + β_1 log(Q_{t-1}) + β_2 log(P_t) + β_3 log(GDP_t) + ε_t

where

α - is the intercept term

Q_t - is the demand for petrol (PJ) at time t

P_t - is the petrol price at time t

GDP_t - is the GDP figure at time t

ε_t - is the error term and is assumed to be Normally distributed with mean zero and constant variance.

Parameter:   α        β_1      β_2      β_3
Estimate:    0.334    0.742    -0.051   0.149
p-value:     0.068    <0.001   0.008    0.038
EO'03:       0.340    0.653    -0.065   0.216

Statistic:   R2       Adjusted R2   AIC      Durbin-Watson   Log Likelihood
Estimate:    0.979    0.977         -82.65   2.183           86.6502

Diesel: log(Q_t) = α + β_1 log(Q_{t-1}) + β_2 log(P_t) + β_3 log(GDP_t) + β_4 Dummy_t + ε_t

where

α - is the intercept term

Q_t - is the demand for diesel (PJ) at time t

P_t - is the diesel price at time t

GDP_t - is the GDP figure at time t

Dummy_t - is the dummy variable, defined as taking a value of 1 from 1993 onwards and zero otherwise.

ε_t - is the error term and is assumed to be Normally distributed with mean zero and constant variance.

Parameter:   α        β_1      β_2      β_3      β_4
Estimate:    -1.270   0.725    -0.054   0.349    0.120
p-value:     0.084    <0.001   0.064    0.012    0.001
EO'03:       -0.837   0.675    -0.078   0.325    0.130

Statistic:   R2       Adjusted R2   AIC      Durbin-Watson   Log Likelihood
Estimate:    0.995    0.994         -55.43   2.249           60.4324

Residential: log(Q_t) = α + β_1 log(Q_{t-1}) + β_2 log(P_t) + β_3 log(GDP_{t-1}) + β_4 log(DD_t) + ε_t

where

α - is the intercept term

Q_t - is the demand for energy (PJ) at time t

P_t - is the weighted average energy price at time t

GDP_t - is the GDP figure at time t

DD_t - is the population-weighted heating and cooling degree days (HDD and CDD) at time t

ε_t - is the error term and is assumed to be Normally distributed with mean zero and constant variance.

Parameter:   α        β_1      β_2      β_3      β_4
Estimate:    -0.929   0.534    -0.086   0.365    0.052
p-value:     0.021    <0.001   0.003    <0.001   0.119
EO'03:       -1.393   0.599    -0.083   0.319    0.118

Statistic:   R2       Adjusted R2   AIC       Durbin-Watson   Log Likelihood
Estimate:    0.994    0.993         -105.40   1.678           110.402

Other Industrial and Commercial: log(Q_t) = α + β_1 log(Q_{t-1}) + β_2 log(P_t) + β_3 log(GDP_t) + β_4 log(DD_t) + ε_t

where

α - is the intercept term

Q_t - is the demand for energy (PJ) at time t

P_t - is the weighted average energy price at time t

GDP_t - is the GDP figure at time t

DD_t - is the population-weighted heating and cooling degree days (HDD and CDD) at time t

ε_t - is the error term and is assumed to be Normally distributed with mean zero and constant variance.

Parameter:   α        β_1      β_2      β_3      β_4
Estimate:    -0.186   0.838    -0.067   0.123    0.030
p-value:     0.755    0.000    0.073    0.125    0.493
EO'03:       -0.823   0.776    -0.062   0.220    0.054

Statistic:   R2       Adjusted R2   AIC      Durbin-Watson   Log Likelihood
Estimate:    0.994    0.994         -90.63   2.086           95.6341