Saturday, March 05, 2016

Statistical temperature forecasts

Here is a link to a paper by Terence Mills, "Statistical Forecasting: How Fast Will Future Warming Be?"  It puts the current climate forecasts in statistical perspective.

Terence Mills is Professor of Applied Statistics and Econometrics at Loughborough University.

Terence provides good reasons for doubting the precision and certainty about global warming that you hear so often from politicians, the media, and many "climate scientists".

Here are the "Introduction" and "Discussion" sections of the paper.

Introduction


The analysis and interpretation of temperature data is clearly of central importance to debates about anthropogenic global warming (AGW) and climate change in general. For the purpose of projecting future climate change, scientists and policymakers rely heavily on large-scale ocean–atmosphere general circulation models, which have grown in size and complexity over recent decades without necessarily becoming more reliable at forecasting. The field of economics spent the post-war decades developing computerised models of the economy that also grew to considerable size and complexity, but by the late 1970s two uncomfortable truths had been realised. First, these models produced generally poor forecasts, and adding more equations or numerical detail did not seem to fix this. Second, relatively simple statistical models that had no obvious basis in economic theory were proving much more reliable at forecasting. It took many years for economists to rationalise statistical forecasting by working out its structural connections to this theory. But before this had happened, economic practitioners were already relying on these models simply because of their relative success.

Is there a parallel with climatology? In recent years, statisticians and econometricians have begun applying the tools of statistical forecasting to climate datasets. As these exercises have become more and more successful, there is a corresponding concern that such models either have no basis in climatological theory, or may even seem to contradict it. In this report we focus on forecasting models in general, and their application to climate data in particular, while leaving aside the potentially interesting question of how such models might or might not be reconciled with the physical theory underpinning climate models.

Data organised as evenly-spaced observations over time are called ‘time series’. The analysis of time series has a long and distinguished history, beginning with descriptive examinations but with major technical advances occurring in the early years of the 20th century, following quickly on from the development of the concept of correlation.[1] The publication of the first edition of George Box and Gwilym Jenkins’ famous book Time Series Analysis: Forecasting and Control in 1970 brought the techniques for modelling and forecasting time series to a wide audience. Their methods have since been extended, refined and applied to many disciplines, notably economics and finance, where they provide the foundations for time series econometrics.[2] Although some of the fundamental developments in time series were made using meteorological data, it is notable that many contributors to the debates concerning AGW and climate change seem unaware of this corpus of theory and practice, although contributions by time series econometricians have now begun to appear, albeit with rather limited influence on such debate.[3,4]

The main purpose of this report is to set out a framework that encompasses a wide range of models for describing the evolution of an individual time series. All such models decompose the data into random and deterministic components, and then use these components to generate forecasts of future observations, accompanied by measures of uncertainty. A central theme of the report is that the choice of model has an important impact on the form of the forecasts and on the behaviour of forecast uncertainty, particularly as the forecast horizon increases. But since some models fit the data better than others, we are able to provide some guidance about which sets of forecasts are more likely to be accurate. The framework is illustrated using three readily available and widely used temperature series. These are:

 • the HADCRUT4 global land and sea surface anomaly series, available monthly from January 1850

 • the Remote Sensing System (RSS) lower troposphere series, available monthly from January 1979

 • Central England temperatures (CET), available monthly from January 1659.

In each case the series are examined up to December 2014.[5] All computations are performed using commercially available software, so that the analyses should be easily replicable, and hence could be refined and extended by anyone familiar with such software. Indeed, it is taken to be the very essence of statistical modelling that these models, and hence the forecasts computed using them, should be subjected to ‘severe testing’ and subsequently replaced by superior models if found wanting in any aspect.

Discussion


The central aim of this report is to emphasise that, while statistical forecasting appears highly applicable to climate data, the choice of which stochastic model to fit to an observed time series largely determines the properties of forecasts of future observations and of measures of the associated forecast uncertainty, particularly as the forecast horizon increases. The importance of this result is emphasised when, as in the examples presented above, alternative well-specified models appear to fit the observed data equally well – the ‘skinning the cat’ phenomenon of modelling temperature time series.[11]
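As a rough illustration of how the choice of model drives forecast uncertainty (this sketch is not taken from the paper), the Python below compares the h-step-ahead forecast standard error under two textbook models: a stationary AR(1), where it levels off, and a driftless random walk, where it keeps growing with the horizon. The values phi = 0.9 and sigma = 0.1 are arbitrary assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math


def ar1_forecast_se(h, phi, sigma):
    """Standard error of the h-step-ahead forecast for a stationary AR(1).

    Forecast error variance: sigma^2 * (1 + phi^2 + ... + phi^(2(h-1))),
    which converges to sigma^2 / (1 - phi^2) as h grows (bounded uncertainty).
    """
    variance = sigma ** 2 * sum(phi ** (2 * j) for j in range(h))
    return math.sqrt(variance)


def random_walk_forecast_se(h, sigma):
    """Standard error of the h-step-ahead forecast for a driftless random walk.

    Every future shock is carried forward in full, so the variance is
    sigma^2 * h and grows without bound.
    """
    return sigma * math.sqrt(h)


# Assumed, purely illustrative parameter values.
PHI, SIGMA = 0.9, 0.1

for horizon in (1, 12, 60, 120):
    print(
        horizon,
        round(ar1_forecast_se(horizon, PHI, SIGMA), 4),
        round(random_walk_forecast_se(horizon, SIGMA), 4),
    )
```

Both models give the same one-step standard error, so short-horizon forecasts cannot easily tell them apart, yet their long-horizon intervals differ greatly.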

In terms of the series analysed throughout the paper, a clear finding presents itself for the two global temperature series. Irrespective of the model fitted, forecasts do not contain any trend, with long-horizon forecasts being flat, albeit with rather large measures of imprecision even from models in which uncertainty is bounded. This is a consequence of two interacting features of the fitted models: the inability to isolate a significant drift or trend parameter and the large amount of overall noise in the observations themselves compared to the fitted ‘signals’. Both of these features make forecasting global temperature series a necessarily uncertain exercise, but stochastic models are at least able to accurately measure such uncertainty.

The regional CET series does contain a modest warming signal, the extent of which has been shown to be dependent on the season: winters have tended to become warmer, spring and autumn less so, and summers have shown hardly any trend increase at all. The monthly pattern of temperatures through the year has remained stable throughout the entire 355 years of the CET record.

The models considered in the report also have the ability to be updated as new observations become available. At the time of writing, the HADCRUT4 observations for the first four months of 2015 were 0.690, 0.660, 0.680 and 0.655. Forecasts from the ARIMA (0, 1, 3) model made at April 2015 are now 0.642 for May, 0.635 for June and 0.633 thereafter, up from the forecast of 0.582 made at December 2014. This uplift is a consequence of the forecasts for the first four months of 2015, these being 0.588, 0.593, 0.582 and 0.582, underestimating the actual outturns, although the latter are well inside the calculated forecast intervals.
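The pattern in these numbers, where forecasts move for the first few steps and are constant thereafter, is what a driftless ARIMA(0, 1, 3) model implies. The Python sketch below illustrates this under stated assumptions: the moving-average coefficients and recent residuals are hypothetical values, not the paper's estimates, so the output does not reproduce the published forecasts. The second part only restates the arithmetic of the forecast errors quoted above.

```python
def arima013_forecasts(last_level, last_residuals, thetas, horizon):
    """Point forecasts from a driftless ARIMA(0, 1, 3) model.

    last_residuals = (e_T, e_{T-1}, e_{T-2}), the three latest one-step errors.
    thetas = (theta1, theta2, theta3), the moving-average coefficients.
    """
    level = last_level
    forecasts = []
    for h in range(1, horizon + 1):
        # Expected change at step h uses only residuals already observed:
        # sum of theta_j * e_{T+h-j} for j = h..3 (empty once h > 3).
        change = sum(thetas[j - 1] * last_residuals[j - h] for j in range(h, 4))
        level += change
        forecasts.append(level)
    return forecasts


# Hypothetical coefficients and residuals, for illustration only.
example = arima013_forecasts(
    last_level=0.655,               # April 2015 observation quoted in the text
    last_residuals=(0.1, 0.0, 0.0),
    thetas=(-0.5, -0.2, -0.1),
    horizon=6,
)
print([round(f, 3) for f in example])

# Forecast errors for January to April 2015, using the figures in the text.
actual = [0.690, 0.660, 0.680, 0.655]
forecast = [0.588, 0.593, 0.582, 0.582]
errors = [a - f for a, f in zip(actual, forecast)]
mean_error = sum(errors) / len(errors)
print([round(e, 3) for e in errors], round(mean_error, 3))
```

With the quoted figures, all four outturns exceed their forecasts, by 0.085 on average, which is why the updated forecasts shift upwards.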

What the analysis also demonstrates is that fitting a linear trend, say, to a preselected portion of a temperature record, a familiar ploy in the literature, cannot ever be justified.[12] At best such trends can only be descriptive exercises, but if the series is generated by a stochastic process then they are likely to be highly misleading, will have incorrect measures of uncertainty attached to them and will be completely useless for forecasting. There is simply no substitute for analysing the entire temperature record using a variety of well-specified models.
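A small simulation can make this concrete. The Python sketch below, which is an illustration and not the paper's analysis, generates driftless random walks (series with no trend at all by construction), fits an ordinary least squares line to each, and counts how often the naive t-statistic on the slope exceeds 2 in absolute value. The window of 120 observations (ten years of monthly data), the shock size and the seed are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random


def ols_slope_t(y):
    """Regress y on a time index 0..n-1; return (slope, naive t-statistic).

    The t-statistic assumes independent errors, which is exactly the
    assumption that fails when the data follow a stochastic trend.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def random_walk(n_obs, sigma, rng):
    """Driftless random walk: each value is the previous one plus noise."""
    level, path = 0.0, []
    for _ in range(n_obs):
        level += rng.gauss(0.0, sigma)
        path.append(level)
    return path


def share_with_apparent_trend(n_series=500, n_obs=120, sigma=0.1, seed=1):
    """Fraction of trendless series whose fitted slope looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_series):
        _, t_stat = ols_slope_t(random_walk(n_obs, sigma, rng))
        if abs(t_stat) > 2.0:
            hits += 1
    return hits / n_series


print(share_with_apparent_trend())
```

If the errors were independent, roughly 5% of the fitted slopes would pass the |t| > 2 rule. For random walks the share is far higher, so an apparently significant trend over a chosen window says little about whether any deterministic trend exists.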

It may be thought that including ‘predictor’ variables in the stochastic models will improve both forecasts and forecast uncertainty. Long experience of forecasting nonstationary data in economics and finance tells us that this is by no means a given, even though a detailed theory of such forecasting is available.[13] Models in which ‘forcing’ variables have been included in this framework have been considered, with some success, when used to explain observed behaviour of temperatures.[14] Their use in forecasting, where forecasts of the forcing variables are also required, has been much less investigated, however: indeed, the difficulty in identifying stable relationships between temperatures and other forcing variables suggests that analogous problems to those found in economics and finance may well present themselves here as well.
