This post was written on Aug 12, 2014, but not published until Mar 2, 2020 (today).
One of the signature findings of IPCC AR5 WG2 has been that climate change has already had a negative impact on crop yields, especially wheat and maize. These findings are prominent in the WG2 Summary for Policy Makers and were featured in WG2 press coverage. The topic of crop yields is a specialty of WG2 Co-Chair Christopher Field. Field’s frequent co-author, David Lobell, was a Lead Author of the chapter on Food (chapter 7), which in turn cited and relied on a series of Lobell articles, in particular Lobell et al (Science 2011, Climate Trends and Global Crop Production Since 1980). That paper was a statistical analysis of crop yields from 1980 to 2008 (or to 2002 in some analyses) for four major crops (wheat, maize, rice, soy) in 185 countries.
In the period 1980-2008, both crop yields and temperatures have positive trends (notwithstanding the pause/hiatus in the 21st century). Because both series trend upward, there is a positive correlation between crop yields and temperatures for the vast majority of crop-country combinations.
Given that both series are going up, it is an entirely valid question to ask how Lobell and coauthors arrived at their signature negative impact merely by applying elementary statistical methods to annual data on yields, temperature and precipitation. I’ll look at this question in today’s post.
Data
In 2011, I obtained the data for Lobell et al 2011 from lead author Lobell (who undertook at the time to place both data and code online, neither of which appears to have been done). I had asked Lobell to archive code, because it wasn’t entirely clear what he had done. Lobell collated temperature and precipitation data from both UDel and CRU. (For the latter, Lobell used the CRU TS data made famous by the Harry Readme.) In the figure below, I’ve plotted Lobell’s yield and temperature data for the China-wheat combination (both standardised to SD units), as an example of both series going up.
Lobell regressed Yield (actually log Yield) against time, temperature and precipitation variables, describing the procedure as follows:
Translating these climate trends into potential yield impacts required models of yield response. We used regression analysis of historical data to relate past yield outcomes to weather realizations. All of the resulting models include T and P, their squares, country-specific intercepts to account for spatial variations in crop management and soil quality, and country-specific time trends to account for yield growth due to technology gains (6).
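The model described in the quote can be sketched as a panel regression. The following is a minimal illustration under stated assumptions, not Lobell’s code (which, as noted above, does not appear to have been archived): the function name `build_design`, the use of centred years for the country trends, the weather series expressed as SD-unit anomalies, and the noise-free synthetic data are all invented for the example.

```python
import numpy as np

def build_design(country, year, temp, precip):
    """Hypothetical design matrix for a Lobell-style panel regression.

    Columns: one intercept per country, one linear time trend per country,
    then T, T^2, P, P^2 with coefficients shared across countries.
    """
    countries = np.unique(country)
    # Country-specific intercepts: one 0/1 indicator column per country.
    intercepts = (country[:, None] == countries[None, :]).astype(float)
    # Country-specific time trends: each indicator times the centred year.
    trends = intercepts * (year - year.mean())[:, None]
    # Weather terms and their squares, shared across countries.
    weather = np.column_stack([temp, temp**2, precip, precip**2])
    return np.hstack([intercepts, trends, weather])


# Synthetic panel: 3 countries x 29 years (1980-2008); weather as SD-unit anomalies.
rng = np.random.default_rng(0)
years = np.arange(1980, 2009)
country = np.repeat(np.array(["A", "B", "C"]), years.size)
year = np.tile(years, 3).astype(float)
temp = rng.normal(0.0, 1.0, size=year.size)
precip = rng.normal(0.0, 1.0, size=year.size)

X = build_design(country, year, temp, precip)
# Made-up coefficients: 3 intercepts, 3 trends, then T, T^2, P, P^2.
beta_true = np.array([1.0, 1.2, 0.8, 0.02, 0.03, 0.01, -0.05, -0.01, 0.03, -0.02])
log_yield = X @ beta_true  # noise-free, so OLS should recover beta_true

beta_hat, *_ = np.linalg.lstsq(X, log_yield, rcond=None)
print(np.round(beta_hat, 4))
```

Because the weather coefficients are shared, each country’s own intercept and trend absorb level and technology differences, which is the role the quote assigns to them.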
The precipitation and quadratic terms don’t appear to affect the regression very much; the main effects are delivered by the model in which Yield is regressed against time and temperature alone:
$$\text{Yield} \sim \text{Year} + \text{Temperature} \tag{1}$$
Using conventional regression nomenclature, the regression coefficient b is given by the formula
$$b = (X^\top X)^{-1} X^\top y \tag{2}$$
where X is the matrix of independent variables {Year; Temperature} and y is the Yield vector.
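Equation (2) can be checked numerically. The sketch below solves the normal equations for model (1) on made-up data (the trend and noise values are invented, not Lobell’s) and compares the result with NumPy’s least-squares solver. It uses `np.linalg.solve` rather than forming the inverse explicitly, which gives the same coefficients with better numerical behaviour. An intercept column is included because these data are not yet normalized, and Year is measured from 1980 to keep the matrix well conditioned.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Return b = (X^T X)^{-1} X^T y, computed by solving (X^T X) b = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Made-up annual series for 1980-2008 (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(29, dtype=float)  # Year measured from 1980
temp = 0.02 * t + rng.normal(0.0, 0.3, size=t.size)
yield_ = 2.0 + 0.05 * t + rng.normal(0.0, 0.2, size=t.size)

# Columns: intercept, Year, Temperature.
X = np.column_stack([np.ones_like(t), t, temp])
b = ols_normal_equations(X, yield_)
b_lstsq, *_ = np.linalg.lstsq(X, yield_, rcond=None)
print(b)
```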
For convenience (it does not affect the point I’m working towards), normalize the data, i.e. standardize each series to mean 0 and SD 1.
With normalized data, X^T y is (up to a constant factor that cancels in the formula for b) simply the vector of correlations of Yield with Time (the normalized trend) and with Temperature.
Likewise, X^T X (up to a constant factor) is nothing more than the correlation matrix of Year and Temperature; its off-diagonal element r is the temperature trend in normalized units:
$$\begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$$
The OLS regression coefficients are calculated using the inverse of this matrix:
$$\frac{1}{1-r^2}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix}$$
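The normalization claims and the inverse can be verified numerically. The sketch below, on invented data, standardizes each series with the sample SD (`ddof=1`; with that convention the scaling factor is n-1), checks that X^T X/(n-1) and X^T y/(n-1) match the correlations from `np.corrcoef`, and checks the closed-form 2x2 inverse against `np.linalg.inv`.

```python
import numpy as np

def standardize(v):
    """Rescale a series to mean 0 and sample SD 1."""
    return (v - v.mean()) / v.std(ddof=1)


# Invented series for 1980-2008 (not Lobell's data).
rng = np.random.default_rng(2)
year = np.arange(1980, 2009, dtype=float)
temp = 0.02 * (year - 1980) + rng.normal(0.0, 0.3, size=year.size)
yield_ = 0.05 * (year - 1980) + rng.normal(0.0, 0.2, size=year.size)
n = year.size

X = np.column_stack([standardize(year), standardize(temp)])
y = standardize(yield_)

XtX = X.T @ X / (n - 1)  # expected: Year/Temperature correlation matrix
Xty = X.T @ y / (n - 1)  # expected: correlations of Yield with Year and Temperature

# Rows/columns of corr: 0 = year, 1 = temp, 2 = yield.
corr = np.corrcoef(np.vstack([year, temp, yield_]))
r = corr[0, 1]  # normalized temperature trend

# Closed-form inverse of [[1, r], [r, 1]].
inv_closed = np.array([[1.0, -r], [-r, 1.0]]) / (1.0 - r**2)
print(np.round(inv_closed, 4))
```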
The negative off-diagonal terms mean that the OLS coefficient on Temperature, in the regression of Yield onto Time and Temperature, is a function of the correlation between yield and temperature, the trend in yield and the trend in temperature, as follows:
$$b_{\text{temperature}} = \frac{1}{1-r^2}\left(\text{cor}_{\text{yield,temp}} - r \cdot \text{trend}_{\text{yield}}\right)$$
In other words, since 1/(1-r^2) is positive whenever |r| < 1, the regression coefficient is negative if the correlation between yield and temperature is less than the product of the trend in yields and the trend in temperature (both normalized). This has nothing to do with yields or temperatures; it is a trivial property of the matrix algebra.
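The formula and the sign condition can be illustrated with a deliberately simple, noise-free construction (invented, not Lobell’s data): temperature is a trend plus a deterministic wiggle, and yield is built from the year and temperature with a temperature coefficient of -0.5. In this sketch the simple correlation of yield with temperature comes out positive, yet the OLS coefficient on Temperature is negative, and the closed-form expression reproduces the OLS coefficient. The function name `b_temperature` is a hypothetical label for the formula above.

```python
import numpy as np

def standardize(v):
    """Rescale a series to mean 0 and sample SD 1."""
    return (v - v.mean()) / v.std(ddof=1)


def b_temperature(trend_yield, cor_yield_temp, r):
    """Standardized OLS coefficient on Temperature in Yield ~ Year + Temperature."""
    return (cor_yield_temp - r * trend_yield) / (1.0 - r**2)


# Invented, noise-free series: 29 "years" (0 = 1980).
t = np.arange(29, dtype=float)
temp = 0.05 * t + 0.3 * np.sin(1.7 * t)  # upward trend plus a wiggle
yield_ = 1.0 * t - 0.5 * temp            # temperature coefficient is -0.5

trend_yield = np.corrcoef(t, yield_)[0, 1]
cor_yield_temp = np.corrcoef(yield_, temp)[0, 1]
r = np.corrcoef(t, temp)[0, 1]

# OLS on standardized series (no intercept needed once everything has mean 0).
X = np.column_stack([standardize(t), standardize(temp)])
coef, *_ = np.linalg.lstsq(X, standardize(yield_), rcond=None)

print(f"cor(yield, temp) = {cor_yield_temp:.3f}")
print(f"closed form = {b_temperature(trend_yield, cor_yield_temp, r):.3f}, OLS = {coef[1]:.3f}")
```

The positive correlation here comes almost entirely from the shared upward trend; once Year is in the model, the sign of the Temperature coefficient is set by comparing the correlation with r times the yield trend.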
As an example, for the Chinese wheat series shown above, although there is a positive correlation between yield and temperature (0.5096), the OLS coefficient on Temperature in a regression of Yield against Year and Temperature is negative. Applying the above formula, the normalized trends (correlations with year) for yield and temperature are 0.984 and 0.548, giving 0.5096 - 0.984*0.548 < 0.
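The arithmetic for the China-wheat example can be reproduced from the three normalized quantities quoted in the text (correlation 0.5096, yield trend 0.984, temperature trend 0.548). This sketch only evaluates the formula; it does not re-run the regression on Lobell’s data, and `b_temperature` is again a hypothetical name.

```python
def b_temperature(trend_yield, cor_yield_temp, r):
    """Standardized OLS coefficient on Temperature in Yield ~ Year + Temperature."""
    return (cor_yield_temp - r * trend_yield) / (1.0 - r**2)


# Values quoted in the text for the China-wheat combination.
cor_yield_temp = 0.5096  # correlation of yield with temperature
trend_yield = 0.984      # normalized yield trend (correlation with year)
r = 0.548                # normalized temperature trend (correlation with year)

numerator = cor_yield_temp - r * trend_yield
print(f"numerator = {numerator:.4f}")
print(f"b_temperature = {b_temperature(trend_yield, cor_yield_temp, r):.4f}")
```

The numerator is about -0.03, so the coefficient is negative even though the simple correlation is positive.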