Saturday, March 02, 2019

Yet another example of climate science that isn’t

Ross McKitrick destroys a new paper by Ben Santer et al. in Nature Climate Change.

Here is the link.

After reading the article, you will appreciate that those who are quick to call others "climate deniers" are, perhaps, the ones most likely to lack an understanding of what might be wrong with the "alarmist" view.

Here are some excerpts.
-----------------------------------------------------
Ben Santer et al. have a new paper out in Nature Climate Change arguing that with 40 years of satellite data available they can detect the anthropogenic influence in the mid-troposphere at a 5-sigma level of confidence. This, they point out, is the “gold standard” of proof in particle physics, even invoking for comparison the Higgs boson discovery in their Supplementary information.

----------


Their results are shown in their Figure 1. It is not a graph of temperature, but of an estimated “signal-to-noise” ratio. The horizontal lines represent sigma units which, if the underlying statistical model is correct, can be interpreted as points where the tail of the distribution gets very small. So when a line crosses a sigma level, the “signal” of anthropogenic warming has emerged from the “noise” of natural variability by the corresponding threshold. They report that the 3-sigma boundary corresponds to a p-value of about 1 in 741, while the 5-sigma boundary corresponds to about 1 in 3.5 million. Since all signal lines cross the 5-sigma level by 2015, they conclude that the anthropogenic effect on the climate is definitively detected.
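As a quick check on those thresholds, the one-sided Normal tail probabilities at 3 and 5 sigma can be reproduced directly; a minimal sketch (the figures are only meaningful if the underlying statistic really is standard Normal, which is the point at issue later):

```python
# Sketch: reproduce the one-sided Normal tail probabilities quoted for the
# 3-sigma and 5-sigma thresholds.
from scipy.stats import norm

for k in (3, 5):
    p = norm.sf(k)  # P(Z > k) for a standard Normal variate
    print(f"{k}-sigma: p = {p:.3g} (about 1 in {1/p:,.0f})")
# Prints roughly: 1 in 741 at 3 sigma, 1 in 3.5 million at 5 sigma.
```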

I will discuss four aspects of this study which I think weaken the conclusions considerably: (a) the difference between the existence of a signal and the magnitude of the effect; (b) the confounded nature of their experimental design; (c) the invalid design of the natural-only comparator; and (d) problems relating “sigma” boundaries to probabilities.
---------
Confounded signal design
So they haven’t identified a distinct anthropogenic fingerprint. What they have detected is that observations exhibit a better fit to models that have the Figure 2 warming pattern in them, regardless of cause, than those that do not.
--------
Invalid natural-only comparator
The above argument would matter less if the “nature-only” comparator controlled for all known warming from natural forcings. But it doesn’t, by construction.

Everything depends on how valid the natural variability comparator is. We are given no explanation of why the authors believe it is a credible analogue to the natural temperature patterns associated with post-1979 non-anthropogenic forcings. It almost certainly isn’t.
----------
t-statistics and p values
The probabilities associated with the sigma lines in Figure 1 are based on the standard Normal tables. People are so accustomed to the Gaussian (Normal) critical values that they sometimes forget that they are only valid for t-type statistics under certain assumptions that need to be tested. I could find no indication in the Santer et al. paper that such tests were undertaken.

I will present a simple example of a signal detection model to illustrate how t-statistics and Gaussian critical values can be very misleading when misused.

A simple way of investigating causal patterns in time series data is using an autoregression. Simply regress the variable you are interested in on itself lagged once, plus lagged values of the possible explanatory variables. Inclusion of the lagged dependent variable controls for momentum effects, while the use of lagged explanatory variables constrains the correlations to a single direction: today’s changes in the dependent variable cannot cause changes in yesterday’s values of the explanatory variables. This is useful for identifying what econometricians call Granger causality: knowing the past values of one variable significantly reduces the forecast error for another variable.
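As a rough illustration of the Granger-causality idea (on simulated data only, not the climate series used in the article), statsmodels provides a ready-made test; a minimal sketch:

```python
# Sketch of the Granger-causality idea on simulated data (hypothetical
# series, not the balloon/forcing data discussed in the article).
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)          # candidate "cause"
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on yesterday's x, plus noise
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

# Column order is [effect, candidate cause]; test with one lag.
res = grangercausalitytests(np.column_stack([y, x]), maxlag=1)
print(res[1][0]["ssr_ftest"])       # (F statistic, p-value, df_denom, df_num)
```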

My temperature measure (“Temp”) is the average MT temperature anomaly in the weather balloon records. I add up the forcings into “anthro” (ghg + o3 + aero + land) and “natural” (tsi + volc + ESOI).

I ran the regression Temp = a1 + a2*l.Temp + a3*l.anthro + a4*l.natural, where a lagged value is denoted by an “l.” prefix. The results over the whole sample length are:

The coefficient on “anthro” is more than twice as large as that on “natural” and has a larger t-statistic. Also, its p-value implies that the probability of observing such a value, if there were no effect, is about 1 in 2.4 billion. So I could conclude based on this regression that anthropogenic forcing is the dominant effect on temperatures in the observed record.
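The balloon and forcing data are not reproduced here, but a lagged regression of this form might be set up along the following lines (the data frame, file name and column names are hypothetical placeholders):

```python
# Sketch of the level regression. The file "forcings_and_temp.csv" and its
# columns are hypothetical placeholders for the balloon temperatures and the
# aggregated anthro/natural forcings.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("forcings_and_temp.csv", index_col="year")  # hypothetical file

lagged = pd.DataFrame({
    "Temp": df["Temp"],
    "l_Temp": df["Temp"].shift(1),        # l.Temp
    "l_anthro": df["anthro"].shift(1),    # l.anthro
    "l_natural": df["natural"].shift(1),  # l.natural
}).dropna()

level_model = smf.ols("Temp ~ l_Temp + l_anthro + l_natural", data=lagged).fit()
print(level_model.summary())  # coefficients, t-statistics and p-values
```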

The t-statistic on anthro provides a measure much like what the Santer et al. paper shows. It represents the marginal improvement in model fit based on adding anthropogenic forcing to the time series model, relative to a null hypothesis in which temperatures are affected only by natural forcings and internal dynamics. Running the model iteratively while allowing the end date to increase from 1988 to 2017 yields the results shown below in blue (Line #1):


It looks remarkably like Figure 1 from Santer et al., with the blue line crossing the 3-sigma level in the late 90s and hitting about 8 sigma at the peak.
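Continuing the sketch above (and assuming the hypothetical "lagged" frame is indexed by year), the expanding-window exercise might look something like this:

```python
# Sketch of the expanding-window exercise: refit the level regression with
# the sample end year running from 1988 to 2017 and record the t-statistic
# on lagged anthro. Assumes the "lagged" frame from the previous sketch.
import statsmodels.formula.api as smf

t_stats = {}
for end_year in range(1988, 2018):
    sub = lagged.loc[:end_year]
    fit = smf.ols("Temp ~ l_Temp + l_anthro + l_natural", data=sub).fit()
    t_stats[end_year] = fit.tvalues["l_anthro"]

print(t_stats)  # analogue of the blue Line #1 series
```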

But there is a problem. This would not be publishable in an econometrics journal because, among many other things, I haven’t tested for unit roots. I won’t go into detail about what they are; I’ll just point out that if time series data have unit roots they are nonstationary, and you can’t use them in an autoregression because the t-statistics follow a nonstandard distribution, so Gaussian (or even Student’s t) tables will give seriously biased probability values.
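A classic way to see the danger is the Granger-Newbold spurious-regression experiment: regress one independent random walk on another, and the conventional t-test rejects far more often than its nominal size. A minimal Monte Carlo sketch (simulated data only):

```python
# Monte Carlo sketch of the unit-root problem (spurious regression): regress
# one independent random walk on another and count how often the usual
# t-test "detects" a relationship at the nominal 5% level.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, reps, hits = 100, 1000, 0
for _ in range(reps):
    y = np.cumsum(rng.standard_normal(n))  # random walk (unit root)
    x = np.cumsum(rng.standard_normal(n))  # independent random walk
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    if abs(fit.tvalues[1]) > 1.96:         # nominal 5% two-sided cutoff
        hits += 1

# The rejection rate comes out far above 5%, because the t-ratio does not
# follow the Student's t distribution when the regressors are nonstationary.
print(f"Nominal 5% test rejects in {hits / reps:.0%} of samples")
```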

I ran Phillips-Perron unit root tests and found that anthro is nonstationary, while Temp and natural are stationary. This problem has already been discussed and grappled with in some econometrics papers (see for instance here and the discussions accompanying it, including here).
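For readers who want to replicate this kind of check, the arch package offers a Phillips-Perron test; a sketch, assuming the hypothetical df frame from the earlier regression sketch:

```python
# Sketch of the unit-root check using the Phillips-Perron test from the
# "arch" package, applied to the hypothetical df frame from the earlier
# sketch. The null hypothesis is that the series contains a unit root.
from arch.unitroot import PhillipsPerron

for col in ("Temp", "anthro", "natural"):
    pp = PhillipsPerron(df[col].dropna())
    print(f"{col}: PP stat = {pp.stat:.2f}, p-value = {pp.pvalue:.3f}")
# A large p-value means the unit-root (nonstationarity) null is not rejected.
```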

A possible remedy is to construct the model in first differences. If you write out the regression equation at time t and also at time (t-1) and subtract the two, you get d.Temp = a2*l.d.Temp + a3*l.d.anthro + a4*l.d.natural, where the “d.” means first difference and “l.d.” means lagged first difference. First differencing removes the unit root in anthro (almost – probably close enough for this example) so the regression model is now properly specified and the t-statistics can be checked against conventional t-tables.
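In the notation of the earlier sketch, the first-difference version might be fitted as follows (again assuming the hypothetical df frame; the intercept is dropped because it cancels when the equation is differenced):

```python
# Sketch of the first-difference specification, reusing the hypothetical
# df frame. The "- 1" in the formula omits the intercept, which cancels
# when the level equation is differenced.
import pandas as pd
import statsmodels.formula.api as smf

d = df[["Temp", "anthro", "natural"]].diff()
dd = pd.DataFrame({
    "d_Temp": d["Temp"],
    "l_d_Temp": d["Temp"].shift(1),        # l.d.Temp
    "l_d_anthro": d["anthro"].shift(1),    # l.d.anthro
    "l_d_natural": d["natural"].shift(1),  # l.d.natural
}).dropna()

diff_model = smf.ols("d_Temp ~ l_d_Temp + l_d_anthro + l_d_natural - 1",
                     data=dd).fit()
print(diff_model.summary())  # compare the t-statistic on l_d_anthro
```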

The coefficient magnitudes remain comparable but—oh dear—the t-statistic on anthro has collapsed from 8.56 to 1.32, while those on natural and lagged temperature are now larger. The problem is that the t-ratio on anthro in the first regression was not a t-statistic; instead it followed a nonstandard distribution with much larger critical values. When compared against t-tables it gave the wrong significance score for the anthropogenic influence. The t-ratio in the revised model is more likely to be properly specified, so using t-tables is appropriate.

The corresponding graph of t-statistics on anthro from the second model over varying sample lengths is shown in Figure 4 as the green line (Line #2) at the bottom of the graph. Signal detection clearly fails.

What this illustrates is that we don’t actually know the correct probability values to attach to the sigma values in Figure 1. If Santer et al. want to use Gaussian probabilities, they need to test that their regression models are correctly specified for doing so. But none of the usual specification tests were provided in the paper, and since it is easy to generate a vivid counterexample, we can’t simply assume that Gaussian critical values are valid.
